Module gamslib.objectcsv
Tools for managing object and datastream metadata in CSV files for GAMS projects.
This package provides utilities to read, write, validate, and manipulate metadata stored in
object.csv and datastreams.csv files, which accompany GAMS bags but are not part of the bag itself.
Main components:
- ObjectCSVManager: Manages metadata for a single object and its datastreams. Handles reading, writing, validating, merging, and updating CSV files.
- ObjectCollection: Aggregates metadata from multiple objects into a single CSV file and distributes updates back to individual object directories. Useful for batch editing and synchronization.
- dublincore_csv: Functions for accessing and processing Dublin Core metadata from 'DC.xml' files, including language preference utilities.
- create_csv: Initializes CSV files for all objects in a project.
- manage_csv: Collects metadata from all objects into a single CSV for efficient editing, and updates individual object directories from the aggregated data.
- xlsx: Converts CSV files to XLSX format and vice versa, enabling spreadsheet-based editing and avoiding encoding issues common with CSV imports/exports.
Public API:
DSData
ObjectCSVManager
ObjectCollection
ObjectData
collect_csv_data()
create_csv_files()
csv_to_xlsx()
split_from_csv()
split_from_xlsx()
xlsx_to_csv()
These classes and functions are imported into the package namespace for direct use.
Sub-modules
gamslib.objectcsv.create_csv
    Create object.csv and datastreams.csv files for GAMS objects …
gamslib.objectcsv.defaultvalues
    Default values and namespaces for datastream metadata in GAMS projects …
gamslib.objectcsv.dsdata
    Datastream metadata model for GAMS object CSV files …
gamslib.objectcsv.dublincore
    Dublin Core metadata access for GAMS objects …
gamslib.objectcsv.exceptions
    Custom exceptions for the GAMSlib object CSV module …
gamslib.objectcsv.manage_csv
    Functions to collect and update object and datastream CSV files for GAMS projects …
gamslib.objectcsv.objectcollection
    Aggregate and manage CSV/XLSX metadata for multiple GAMS objects …
gamslib.objectcsv.objectcsvmanager
    Manage CSV metadata for GAMS objects and their datastreams …
gamslib.objectcsv.objectdata
    CSV data model for a single GAMS object …
gamslib.objectcsv.utils
    Utility functions for the objectcsv module …
gamslib.objectcsv.xlsx
    Utilities to convert object and datastream CSV files to XLSX format and back …
Functions
def collect_csv_data(object_root_dir: pathlib.Path,
                     object_csv_path: pathlib.Path | None = None,
                     datastream_csv_path: pathlib.Path | None = None) -> ObjectCollection

```python
def collect_csv_data(
    object_root_dir: Path,
    object_csv_path: Path | None = None,
    datastream_csv_path: Path | None = None,
) -> ObjectCollection:
    """
    Collect metadata from all object folders below object_root_dir and save
    to combined CSV files.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        object_csv_path (Path | None): Path to save combined object metadata CSV.
            Defaults to 'object.csv' in CWD.
        datastream_csv_path (Path | None): Path to save combined datastream
            metadata CSV. Defaults to 'datastreams.csv' in CWD.

    Returns:
        ObjectCollection: Collection containing all object and datastream metadata.

    Notes:
        - Reads all object.csv and datastreams.csv files below object_root_dir.
        - Saves aggregated metadata to the specified CSV files.
    """
    object_csv_path = object_csv_path or Path.cwd() / objectcollection.ALL_OBJECTS_CSV
    datastream_csv_path = (
        datastream_csv_path or Path.cwd() / objectcollection.ALL_DATASTREAMS_CSV
    )
    collector = ObjectCollection()
    collector.collect_from_objects(object_root_dir)
    collector.save_to_csv(object_csv_path, datastream_csv_path)
    return collector
```

Collect metadata from all object folders below object_root_dir and save to combined CSV files.

Args
    object_root_dir (Path): Root directory containing all object folders.
    object_csv_path (Path | None): Path to save combined object metadata CSV. Defaults to 'object.csv' in CWD.
    datastream_csv_path (Path | None): Path to save combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

Returns
    ObjectCollection: Collection containing all object and datastream metadata.

Notes
    - Reads all object.csv and datastreams.csv files below object_root_dir.
    - Saves aggregated metadata to the specified CSV files.
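The aggregation step can be pictured with a small stdlib-only sketch. Everything here is illustrative: the helper name `collect_object_csvs` and the `recid`/`title` columns are assumptions for the demo, not part of the package; the real logic lives in ObjectCollection.

```python
import csv
import tempfile
from pathlib import Path

def collect_object_csvs(root: Path, combined: Path) -> int:
    """Concatenate every object.csv below root into one CSV (illustrative sketch)."""
    rows, header = [], None
    for obj_csv in sorted(root.glob("*/object.csv")):
        with obj_csv.open(encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)      # keep the first header, skip repeats
            header = header or file_header
            rows.extend(list(reader))
    with combined.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)

# Demo: two object folders with one metadata row each.
root = Path(tempfile.mkdtemp())
for recid in ("o:1", "o:2"):
    obj_dir = root / recid.replace(":", "_")
    obj_dir.mkdir()
    (obj_dir / "object.csv").write_text(
        f"recid,title\n{recid},Example\n", encoding="utf-8"
    )
combined = root / "all_objects.csv"
print(collect_object_csvs(root, combined))  # → 2
```

split_from_csv() later reverses this step, writing each row back to its object folder.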
def create_csv_files(root_folder: pathlib.Path,
                     config: Configuration,
                     force_overwrite: bool = False,
                     update: bool = False) -> list[ObjectCSVManager]

```python
def create_csv_files(
    root_folder: Path,
    config: Configuration,
    force_overwrite: bool = False,
    update: bool = False,
) -> list[ObjectCSVManager]:
    """
    Create or update CSV files for all objects under the given root folder.

    Iterates through all object directories found below root_folder and creates
    or updates their object.csv and datastreams.csv files.

    Args:
        root_folder (Path): Root directory containing object folders.
        config (Configuration): Project configuration.
        force_overwrite (bool): If True, overwrite existing CSV files.
        update (bool): If True, update existing CSV files instead of creating
            new ones.

    Returns:
        list[ObjectCSVManager]: List of managers for the processed object
            directories.
    """
    extended_objects: list[ObjectCSVManager] = []
    for path in find_object_folders(root_folder):
        if update:
            extended_obj = update_csv(path, config)
        else:
            extended_obj = create_csv(path, config, force_overwrite)
        if extended_obj is not None:
            extended_objects.append(extended_obj)
    return extended_objects
```

Create or update CSV files for all objects under the given root folder.

Iterates through all object directories found below root_folder and creates or updates their object.csv and datastreams.csv files.

Args
    root_folder (Path): Root directory containing object folders.
    config (Configuration): Project configuration.
    force_overwrite (bool): If True, overwrite existing CSV files.
    update (bool): If True, update existing CSV files instead of creating new ones.

Returns
    list[ObjectCSVManager]: List of managers for the processed object directories.
def csv_to_xlsx(object_csv: pathlib.Path,
                ds_csv: pathlib.Path,
                output_file: pathlib.Path) -> pathlib.Path

```python
def csv_to_xlsx(object_csv: Path, ds_csv: Path, output_file: Path) -> Path:
    """
    Convert object and datastream CSV files to a single XLSX file.

    Args:
        object_csv (Path): Path to the object metadata CSV file.
        ds_csv (Path): Path to the datastream metadata CSV file.
        output_file (Path): Path for the output XLSX file.

    Returns:
        Path: Path to the created XLSX file.

    Notes:
        - Object metadata is written to the "Object Metadata" sheet.
        - Datastream metadata is written to the "Datastream Metadata" sheet.
    """
    object_data = read_csv(object_csv, skip_header=False)
    ds_data = read_csv(ds_csv, skip_header=False)
    db = xl.Database()
    db.add_ws("Object Metadata")
    for row_id, row in enumerate(object_data, start=1):
        for col_id, value in enumerate(row, start=1):
            db.ws(ws="Object Metadata").update_index(row=row_id, col=col_id, val=value)
    db.add_ws("Datastream Metadata")
    for row_id, row_data in enumerate(ds_data, start=1):
        for col_id, value in enumerate(row_data, start=1):
            db.ws(ws="Datastream Metadata").update_index(
                row=row_id, col=col_id, val=value
            )
    xl.writexl(fn=output_file, db=db)
    return output_file
```

Convert object and datastream CSV files to a single XLSX file.

Args
    object_csv (Path): Path to the object metadata CSV file.
    ds_csv (Path): Path to the datastream metadata CSV file.
    output_file (Path): Path for the output XLSX file.

Returns
    Path: Path to the created XLSX file.

Notes
    - Object metadata is written to the "Object Metadata" sheet.
    - Datastream metadata is written to the "Datastream Metadata" sheet.
def split_from_csv(object_root_dir: pathlib.Path,
                   object_csv_path: pathlib.Path | None = None,
                   ds_csv_path: pathlib.Path | None = None) -> tuple[int, int]

```python
def split_from_csv(
    object_root_dir: Path,
    object_csv_path: Path | None = None,
    ds_csv_path: Path | None = None,
) -> tuple[int, int]:
    """
    Update object folder CSV metadata from combined CSV files.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        object_csv_path (Path | None): Path to combined object metadata CSV.
            Defaults to 'object.csv' in CWD.
        ds_csv_path (Path | None): Path to combined datastream metadata CSV.
            Defaults to 'datastreams.csv' in CWD.

    Returns:
        tuple[int, int]: Number of updated objects and number of updated
            datastreams.

    Raises:
        UserWarning: If an object directory does not exist.

    Notes:
        - Reads the CSV files created by collect_csv_data().
        - Updates object.csv and datastreams.csv files in all object folders
          below object_root_dir.
    """
    collector = ObjectCollection()
    collector.load_from_csv(object_csv_path, ds_csv_path)
    return collector.distribute_to_objects(object_root_dir)
```

Update object folder CSV metadata from combined CSV files.

Args
    object_root_dir (Path): Root directory containing all object folders.
    object_csv_path (Path | None): Path to combined object metadata CSV. Defaults to 'object.csv' in CWD.
    ds_csv_path (Path | None): Path to combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

Returns
    tuple[int, int]: Number of updated objects and number of updated datastreams.

Raises
    UserWarning: If an object directory does not exist.

Notes
    - Reads the CSV files created by collect_csv_data().
    - Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
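The distribution direction can be sketched the same way: read the combined CSV, then write each row back to its object folder, skipping rows whose folder is missing. `distribute_rows` and the `recid` column are hypothetical names for the demo; the real work happens in ObjectCollection.distribute_to_objects().

```python
import csv
import tempfile
from pathlib import Path

def distribute_rows(combined: Path, root: Path) -> int:
    """Write each combined-CSV row back to <root>/<recid>/object.csv (sketch)."""
    updated = 0
    with combined.open(encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            obj_dir = root / row["recid"]
            if not obj_dir.is_dir():
                # The real implementation raises a UserWarning here.
                continue
            with (obj_dir / "object.csv").open("w", encoding="utf-8", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=row.keys())
                writer.writeheader()
                writer.writerow(row)
            updated += 1
    return updated

root = Path(tempfile.mkdtemp())
(root / "obj1").mkdir()
combined = root / "all_objects.csv"
combined.write_text("recid,title\nobj1,First\nmissing,Skipped\n", encoding="utf-8")
print(distribute_rows(combined, root))  # → 1
```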
def split_from_xlsx(object_root_dir: pathlib.Path,
                    xlsx_file: pathlib.Path | None = None) -> tuple[int, int]

```python
def split_from_xlsx(
    object_root_dir: Path, xlsx_file: Path | None = None
) -> tuple[int, int]:
    """
    Update object folder CSV metadata from a combined XLSX file.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        xlsx_file (Path | None): Path to the XLSX file. Defaults to
            'all_objects.xlsx' in CWD.

    Returns:
        tuple[int, int]: Number of updated objects and number of updated
            datastreams.

    Raises:
        UserWarning: If an object directory does not exist.

    Notes:
        - Reads the XLSX file created by collect_csv_data().
        - Updates object.csv and datastreams.csv files in all object folders
          below object_root_dir.
    """
    collector = ObjectCollection()
    collector.load_from_xlsx(xlsx_file)
    return collector.distribute_to_objects(object_root_dir)
```

Update object folder CSV metadata from a combined XLSX file.

Args
    object_root_dir (Path): Root directory containing all object folders.
    xlsx_file (Path | None): Path to the XLSX file. Defaults to 'all_objects.xlsx' in CWD.

Returns
    tuple[int, int]: Number of updated objects and number of updated datastreams.

Raises
    UserWarning: If an object directory does not exist.

Notes
    - Reads the XLSX file created by collect_csv_data().
    - Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
def xlsx_to_csv(xlsx_path: pathlib.Path,
                obj_csv_path: pathlib.Path,
                ds_csv_path: pathlib.Path) -> tuple[pathlib.Path, pathlib.Path]

```python
def xlsx_to_csv(
    xlsx_path: Path, obj_csv_path: Path, ds_csv_path: Path
) -> tuple[Path, Path]:
    """
    Convert an XLSX metadata file to two CSV files: object.csv and datastreams.csv.

    Args:
        xlsx_path (Path): Path to the XLSX file containing metadata.
        obj_csv_path (Path): Path for the output object metadata CSV file.
        ds_csv_path (Path): Path for the output datastream metadata CSV file.

    Returns:
        tuple[Path, Path]: Paths to the created object and datastream CSV files.

    Notes:
        - Reads "Object Metadata" and "Datastream Metadata" sheets from the
          XLSX file.
        - Writes each sheet to its respective CSV file.
    """
    db = xl.readxl(xlsx_path)
    object_data = list(db.ws(ws="Object Metadata").rows)
    ds_data = list(db.ws(ws="Datastream Metadata").rows)
    with open(obj_csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(object_data)
    with open(ds_csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(ds_data)
    return obj_csv_path, ds_csv_path
```

Convert an XLSX metadata file to two CSV files: object.csv and datastreams.csv.

Args
    xlsx_path (Path): Path to the XLSX file containing metadata.
    obj_csv_path (Path): Path for the output object metadata CSV file.
    ds_csv_path (Path): Path for the output datastream metadata CSV file.

Returns
    tuple[Path, Path]: Paths to the created object and datastream CSV files.

Notes
    - Reads "Object Metadata" and "Datastream Metadata" sheets from the XLSX file.
    - Writes each sheet to its respective CSV file.
Classes
class DSData (dspath: str,
              dsid: str = '',
              title: str = '',
              description: str = '',
              mimetype: str = '',
              creator: str = '',
              rights: str = '',
              lang: str = '',
              tags: str = '')

```python
@dataclasses.dataclass
class DSData:
    """
    Represents metadata for a single datastream of a GAMS object.

    Fields:
        - dspath (str): Relative path to the datastream file.
        - dsid (str): Datastream identifier.
        - title (str): Title of the datastream.
        - description (str): Description of the datastream.
        - mimetype (str): MIME type of the datastream.
        - creator (str): Creator of the datastream.
        - rights (str): Rights statement for the datastream.
        - lang (str): Language(s) of the datastream.
        - tags (str): Additional tags for the datastream.
    """

    dspath: str
    dsid: str = ""
    title: str = ""
    description: str = ""
    mimetype: str = ""
    creator: str = ""
    rights: str = ""
    lang: str = ""
    tags: str = ""

    @property
    def object_id(self):
        """
        Return the object ID for the datastream.

        The object ID is inferred from the first part of the datastream path.
        """
        return Path(self.dspath).parts[0]

    @classmethod
    def fieldnames(cls) -> list[str]:
        """
        Return the list of field names for DSData.

        Returns:
            list[str]: Names of all fields in the DSData dataclass.
        """
        return [field.name for field in dataclasses.fields(cls)]

    def merge(self, other_dsdata: "DSData"):
        """
        Merge metadata from another DSData instance.

        Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights')
        with non-empty values from the other instance. Both datastreams must
        have the same dspath and dsid.

        Args:
            other_dsdata (DSData): Another DSData instance to merge from.

        Raises:
            ValueError: If dspath or dsid do not match.
        """
        if self.dspath != other_dsdata.dspath:
            raise ValueError("Cannot merge datastreams with different dspath values")
        if self.dsid != other_dsdata.dsid:
            raise ValueError("Cannot merge datastreams with different dsid values")
        fields_to_replace = ["title", "mimetype", "creator", "rights"]
        for field in fields_to_replace:
            if getattr(other_dsdata, field).strip():
                setattr(self, field, getattr(other_dsdata, field))

    def validate(self):
        """
        Validate required metadata fields.

        Raises:
            ValueError: If any required field (dspath, dsid, mimetype, rights)
                is empty.
        """
        if not self.dspath.strip():
            raise ValueError(f"{self.dsid}: dspath must not be empty")
        if not self.dsid.strip():
            raise ValueError(f"{self.dspath}: dsid must not be empty")
        if not self.mimetype.strip():
            raise ValueError(f"{self.dspath}: mimetype must not be empty")
        if not self.rights.strip():
            raise ValueError(f"{self.dspath}: rights must not be empty")

    def guess_missing_values(self, object_path: Path):
        """
        Infer missing metadata values by analyzing the datastream file.

        Uses format detection and default values to fill in missing fields.

        Args:
            object_path (Path): Path to the object directory containing the
                datastream.
        """
        ds_file = object_path / Path(self.dspath).name
        format_info = formatdetect.detect_format(ds_file)
        self._guess_mimetype(format_info)
        self._guess_missing_values(ds_file, format_info)

    def _guess_mimetype(self, format_info=None):
        """
        Guess and set the MIME type if it is missing.

        Args:
            format_info (FormatInfo, optional): Format information for the
                datastream.
        """
        if not self.mimetype and format_info is not None:
            self.mimetype = format_info.mimetype

    def _guess_missing_values(self, file_path: Path, format_info=None):
        """
        Infer and set missing metadata fields using file and format info.

        Args:
            file_path (Path): Path to the datastream file.
            format_info (FormatInfo, optional): Format information for the
                datastream.
        """
        if not self.title and format_info is not None:
            self.title = f"{format_info.description}: {self.dsid}"
        if not self.description and file_path.name in defaultvalues.FILENAME_MAP:
            # Look up by file name, matching the membership test above.
            self.description = defaultvalues.FILENAME_MAP[file_path.name]["description"]
        if not self.rights:
            self.rights = defaultvalues.DEFAULT_RIGHTS
        if not self.creator:
            self.creator = defaultvalues.DEFAULT_CREATOR
```

Represents metadata for a single datastream of a GAMS object.
Fields
- dspath (str): Relative path to the datastream file.
- dsid (str): Datastream identifier.
- title (str): Title of the datastream.
- description (str): Description of the datastream.
- mimetype (str): MIME type of the datastream.
- creator (str): Creator of the datastream.
- rights (str): Rights statement for the datastream.
- lang (str): Language(s) of the datastream.
- tags (str): Additional tags for the datastream.
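A minimal stand-in dataclass illustrates how the fields, the object_id property, and fieldnames() fit together. MiniDSData is a trimmed, hypothetical stand-in for the demo, not the real DSData class:

```python
import dataclasses
from pathlib import Path

@dataclasses.dataclass
class MiniDSData:
    # Trimmed to three fields for the demo; DSData has nine.
    dspath: str
    dsid: str = ""
    mimetype: str = ""

    @property
    def object_id(self) -> str:
        # The object ID is the first segment of the relative datastream path.
        return Path(self.dspath).parts[0]

    @classmethod
    def fieldnames(cls) -> list[str]:
        return [f.name for f in dataclasses.fields(cls)]

ds = MiniDSData(dspath="o_1/TEI.xml", dsid="TEI.xml", mimetype="application/xml")
print(ds.object_id)             # → o_1
print(MiniDSData.fieldnames())  # → ['dspath', 'dsid', 'mimetype']
```

fieldnames() is what the CSV writers use as the header row, so the column order in datastreams.csv follows the field order of the dataclass.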
Static methods

def fieldnames() -> list[str]
    Return the list of field names for DSData.

    Returns
        list[str]: Names of all fields in the DSData dataclass.
Instance variables

var creator : str
    Creator of the datastream.
var description : str
    Description of the datastream.
var dsid : str
    Datastream identifier.
var dspath : str
    Relative path to the datastream file.
var lang : str
    Language(s) of the datastream.
var mimetype : str
    MIME type of the datastream.
prop object_id

```python
@property
def object_id(self):
    """
    Return the object ID for the datastream.

    The object ID is inferred from the first part of the datastream path.
    """
    return Path(self.dspath).parts[0]
```

Return the object ID for the datastream.

The object ID is inferred from the first part of the datastream path.
var rights : str
    Rights statement for the datastream.
var tags : str
    Additional tags for the datastream.
var title : str
    Title of the datastream.
Methods
def guess_missing_values(self, object_path: pathlib.Path)

```python
def guess_missing_values(self, object_path: Path):
    """
    Infer missing metadata values by analyzing the datastream file.

    Uses format detection and default values to fill in missing fields.

    Args:
        object_path (Path): Path to the object directory containing the
            datastream.
    """
    ds_file = object_path / Path(self.dspath).name
    format_info = formatdetect.detect_format(ds_file)
    self._guess_mimetype(format_info)
    self._guess_missing_values(ds_file, format_info)
```

Infer missing metadata values by analyzing the datastream file.

Uses format detection and default values to fill in missing fields.

Args
    object_path (Path): Path to the object directory containing the datastream.
def merge(self, other_dsdata: DSData)

```python
def merge(self, other_dsdata: "DSData"):
    """
    Merge metadata from another DSData instance.

    Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights')
    with non-empty values from the other instance. Both datastreams must have
    the same dspath and dsid.

    Args:
        other_dsdata (DSData): Another DSData instance to merge from.

    Raises:
        ValueError: If dspath or dsid do not match.
    """
    if self.dspath != other_dsdata.dspath:
        raise ValueError("Cannot merge datastreams with different dspath values")
    if self.dsid != other_dsdata.dsid:
        raise ValueError("Cannot merge datastreams with different dsid values")
    fields_to_replace = ["title", "mimetype", "creator", "rights"]
    for field in fields_to_replace:
        if getattr(other_dsdata, field).strip():
            setattr(self, field, getattr(other_dsdata, field))
```

Merge metadata from another DSData instance.

Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights') with non-empty values from the other instance. Both datastreams must have the same dspath and dsid.

Args
    other_dsdata (DSData): Another DSData instance to merge from.

Raises
    ValueError: If dspath or dsid do not match.
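The merge rule ("non-empty values from the other instance win, identity fields must match") can be shown with a small stand-in. MiniDS and its field subset are hypothetical names for the demo:

```python
import dataclasses

@dataclasses.dataclass
class MiniDS:
    dspath: str
    dsid: str = ""
    title: str = ""
    rights: str = ""

    def merge(self, other: "MiniDS") -> None:
        # Mirrors DSData.merge: identity fields must match,
        # then non-empty values from `other` overwrite selected fields.
        if (self.dspath, self.dsid) != (other.dspath, other.dsid):
            raise ValueError("dspath and dsid must match")
        for field in ("title", "rights"):
            if getattr(other, field).strip():
                setattr(self, field, getattr(other, field))

a = MiniDS("o_1/TEI.xml", "TEI.xml", title="Old title", rights="")
b = MiniDS("o_1/TEI.xml", "TEI.xml", title="", rights="CC BY 4.0")
a.merge(b)
print(a.title, "|", a.rights)  # → Old title | CC BY 4.0
```

Note that b's empty title does not erase a's existing one: merge only ever fills or replaces with non-empty values.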
def validate(self)

```python
def validate(self):
    """
    Validate required metadata fields.

    Raises:
        ValueError: If any required field (dspath, dsid, mimetype, rights)
            is empty.
    """
    if not self.dspath.strip():
        raise ValueError(f"{self.dsid}: dspath must not be empty")
    if not self.dsid.strip():
        raise ValueError(f"{self.dspath}: dsid must not be empty")
    if not self.mimetype.strip():
        raise ValueError(f"{self.dspath}: mimetype must not be empty")
    if not self.rights.strip():
        raise ValueError(f"{self.dspath}: rights must not be empty")
```

Validate required metadata fields.

Raises
    ValueError: If any required field (dspath, dsid, mimetype, rights) is empty.
class ObjectCSVManager (obj_dir: pathlib.Path, ignore_existing_csv_files: bool = False)

```python
class ObjectCSVManager:
    """
    Manage object and datastream metadata for a single GAMS object directory.

    Stores, reads, writes, validates, and merges metadata for the object and
    its datastreams.
    """

    def __init__(self, obj_dir: Path, ignore_existing_csv_files: bool = False):
        """
        Initialize the ObjectCSVManager with the given object directory.

        Args:
            obj_dir (Path): Path to the object directory.
            ignore_existing_csv_files (bool): If True, ignore existing CSV
                files when writing.

        Raises:
            FileNotFoundError: If the object directory does not exist.
        """
        self.obj_dir: Path = obj_dir
        self._ignore_existing_csv_files: bool = ignore_existing_csv_files
        if not self.obj_dir.is_dir():
            raise FileNotFoundError(
                f"Object directory '{self.obj_dir}' does not exist."
            )
        self.object_id = self.obj_dir.name
        self._object_data: ObjectData | None = self._read_object_csv()
        self._datastream_data: list[DSData] = self._read_datastreams_csv()

    def set_object(self, object_data: ObjectData, replace: bool = False) -> None:
        """
        Set the object metadata.

        Args:
            object_data (ObjectData): Object metadata to set.
            replace (bool): If True, replace existing object data.

        Raises:
            ValueError: If object data is already set and replace is False.
        """
        if self._object_data is not None and not replace:
            raise ValueError("Object data has already been set.")
        self._object_data = object_data

    def merge_object(self, object_data: ObjectData) -> None:
        """
        Merge the object metadata with another ObjectData object.

        Args:
            object_data (ObjectData): Object metadata to merge.
        """
        if self._object_data is None:
            self._object_data = object_data
        else:
            self._object_data.merge(object_data)

    def get_object(self) -> ObjectData:
        """
        Return the object metadata.

        Returns:
            ObjectData: The object metadata, or None if not set.
        """
        return self._object_data

    def add_datastream(self, dsdata: DSData, replace: bool = False) -> None:
        """
        Add a datastream to the object.

        Args:
            dsdata (DSData): Datastream metadata to add.
            replace (bool): If True, replace existing datastream with the
                same dsid.

        Raises:
            ValueError: If datastream with the same dsid exists and replace
                is False.
        """
        if dsdata.dsid in [ds.dsid for ds in self._datastream_data]:
            if replace:
                self._datastream_data = [
                    ds for ds in self._datastream_data if ds.dsid != dsdata.dsid
                ]
            else:
                raise ValueError(f"Datastream with id {dsdata.dsid} already exists.")
        self._datastream_data.append(dsdata)

    def merge_datastream(self, dsdata: DSData) -> None:
        """
        Merge the datastream metadata with another DSData object.

        Args:
            dsdata (DSData): Datastream metadata to merge.
        """
        for existing_ds in self._datastream_data:
            if existing_ds.dsid == dsdata.dsid and existing_ds.dspath == dsdata.dspath:
                existing_ds.merge(dsdata)
                return
        self.add_datastream(dsdata)

    def get_datastreamdata(self) -> Generator[DSData, None, None]:
        """
        Return a generator for all datastream metadata.

        Returns:
            Generator[DSData, None, None]: Generator of DSData objects.
        """
        yield from self._datastream_data

    def count_datastreams(self) -> int:
        """
        Return the number of datastreams.

        Returns:
            int: Number of datastreams.
        """
        return len(self._datastream_data)

    def get_languages(self):
        """
        Return the languages of the datastreams ordered by frequency.

        Returns:
            list[str]: List of language codes ordered by frequency.
        """
        languages = []
        for dsdata in self.get_datastreamdata():
            if dsdata.lang:
                dlangs = utils.split_entry(dsdata.lang)
                languages.extend(dlangs)
        langcounter = Counter(languages)
        return [entry[0] for entry in langcounter.most_common()]

    def is_empty(self) -> bool:
        """
        Return True if the object has no CSV metadata.

        Returns:
            bool: True if object or datastream metadata is missing,
                False otherwise.
        """
        return self._object_data is None or not self._datastream_data

    def save(self) -> None:
        """
        Save the object metadata and datastreams to their respective CSV files.

        Raises:
            FileExistsError: If CSV files already exist and
                ignore_existing_csv_files is False.
        """
        self._write_object_csv()
        self._write_datastreams_csv()

    def clear(self) -> None:
        """
        Clear the object metadata and datastreams, and delete the CSV files.

        Removes all metadata and deletes object.csv and datastreams.csv files
        if present.
        """
        self._object_data = None
        self._datastream_data = []
        obj_csv_file = self.obj_dir / OBJ_CSV_FILENAME
        ds_csv_file = self.obj_dir / DS_CSV_FILENAME
        if obj_csv_file.is_file():
            obj_csv_file.unlink()
        if ds_csv_file.is_file():
            ds_csv_file.unlink()

    def validate(self) -> None:
        """
        Validate the object metadata and datastreams.

        Raises:
            ValueError: If metadata is missing or invalid.
        """
        if self.is_empty():
            raise ValueError("Object metadata (csv) is not set.")
        self._object_data.validate()
        for dsdata in self._datastream_data:
            dsdata.validate()

    def guess_mainresource(self) -> None:
        """
        Guess and set the main resource of the object based on the datastreams.

        Heuristics:
            - If there is only one XML datastream besides DC.xml, use it as
              mainResource.

        Notes:
            - Sets mainResource on the object data; does not return a value.
        """
        xml_files = []
        for dsdata in self.get_datastreamdata():
            if dsdata.dsid not in ("DC.xml", "DC") and dsdata.mimetype in (
                "application/xml",
                "text/xml",
                "application/tei+xml",
            ):
                xml_files.append(dsdata.dsid)
        if len(xml_files) == 1:
            self._object_data.mainResource = xml_files[0]

    def _read_object_csv(self) -> ObjectData | None:
        """
        Read object metadata from the CSV file.

        Returns:
            ObjectData | None: Object metadata if present, else None.
        """
        csv_file = self.obj_dir / OBJ_CSV_FILENAME
        if not csv_file.is_file():
            return None
        with csv_file.open(encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                if "mainresource" in row:
                    row["mainResource"] = row.pop("mainresource")
                return ObjectData(**row)

    def _write_object_csv(self):
        """
        Write the object metadata to the CSV file.

        Raises:
            FileExistsError: If the CSV file exists and
                ignore_existing_csv_files is False.
        """
        csv_file = self.obj_dir / OBJ_CSV_FILENAME
        if csv_file.is_file() and not self._ignore_existing_csv_files:
            raise FileExistsError(f"Object CSV file '{csv_file}' already exists.")
        with csv_file.open("w", encoding="utf-8", newline="") as f:
            fieldnames = ObjectData.fieldnames()
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerow(dataclasses.asdict(self._object_data))

    def _read_datastreams_csv(self) -> list[DSData]:
        """
        Read datastream metadata from the CSV file.

        Returns:
            list[DSData]: List of datastream metadata.
        """
        datastreams = []
        csv_file = self.obj_dir / DS_CSV_FILENAME
        if not csv_file.is_file():
            return []
        with csv_file.open(encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                dsdata = DSData(**row)
                datastreams.append(dsdata)
        return datastreams

    def _write_datastreams_csv(self):
        """
        Write the datastream metadata to the CSV file.

        Notes:
            - Datastreams are sorted by dsid before writing.
        """
        csv_file = self.obj_dir / DS_CSV_FILENAME
        self._datastream_data.sort(key=lambda ds: ds.dsid)
        with csv_file.open("w", encoding="utf-8", newline="") as f:
            fieldnames = DSData.fieldnames()
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            for dsdata in self._datastream_data:
                writer.writerow(dataclasses.asdict(dsdata))
```

Manage object and datastream metadata for a single GAMS object directory.

Stores, reads, writes, validates, and merges metadata for the object and its datastreams.

Initialize the ObjectCSVManager with the given object directory.

Args
    obj_dir (Path): Path to the object directory.
    ignore_existing_csv_files (bool): If True, ignore existing CSV files when writing.

Raises
    FileNotFoundError: If the object directory does not exist.
Methods
def add_datastream(self, dsdata: DSData, replace: bool = False) -> None

```python
def add_datastream(self, dsdata: DSData, replace: bool = False) -> None:
    """
    Add a datastream to the object.

    Args:
        dsdata (DSData): Datastream metadata to add.
        replace (bool): If True, replace existing datastream with the same dsid.

    Raises:
        ValueError: If datastream with the same dsid exists and replace is False.
    """
    if dsdata.dsid in [ds.dsid for ds in self._datastream_data]:
        if replace:
            self._datastream_data = [
                ds for ds in self._datastream_data if ds.dsid != dsdata.dsid
            ]
        else:
            raise ValueError(f"Datastream with id {dsdata.dsid} already exists.")
    self._datastream_data.append(dsdata)
```

Add a datastream to the object.

Args
    dsdata (DSData): Datastream metadata to add.
    replace (bool): If True, replace existing datastream with the same dsid.

Raises
    ValueError: If datastream with the same dsid exists and replace is False.
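The add-or-replace rule keyed on dsid can be demonstrated on plain dicts. The function below is an illustrative stand-in with a hypothetical name, not the real method:

```python
def add_datastream(streams: list[dict], new: dict, replace: bool = False) -> list[dict]:
    # Mirrors ObjectCSVManager.add_datastream: a duplicate dsid either
    # raises (replace=False) or evicts the old entry (replace=True).
    if any(ds["dsid"] == new["dsid"] for ds in streams):
        if not replace:
            raise ValueError(f"Datastream with id {new['dsid']} already exists.")
        streams = [ds for ds in streams if ds["dsid"] != new["dsid"]]
    return streams + [new]

streams = [{"dsid": "TEI.xml", "title": "v1"}]
streams = add_datastream(streams, {"dsid": "TEI.xml", "title": "v2"}, replace=True)
print([ds["title"] for ds in streams])  # → ['v2']
```

Compare merge_datastream() below, which keeps the existing entry and only fills in fields, rather than replacing the whole record.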
def clear(self) -> None

```python
def clear(self) -> None:
    """
    Clear the object metadata and datastreams, and delete the CSV files.

    Removes all metadata and deletes object.csv and datastreams.csv files
    if present.
    """
    self._object_data = None
    self._datastream_data = []
    obj_csv_file = self.obj_dir / OBJ_CSV_FILENAME
    ds_csv_file = self.obj_dir / DS_CSV_FILENAME
    if obj_csv_file.is_file():
        obj_csv_file.unlink()
    if ds_csv_file.is_file():
        ds_csv_file.unlink()
```

Clear the object metadata and datastreams, and delete the CSV files.

Removes all metadata and deletes object.csv and datastreams.csv files if present.
def count_datastreams(self) -> int

```python
def count_datastreams(self) -> int:
    """
    Return the number of datastreams.

    Returns:
        int: Number of datastreams.
    """
    return len(self._datastream_data)
```

Return the number of datastreams.

Returns
    int: Number of datastreams.
def get_datastreamdata(self) -> Generator[DSData, None, None]

```python
def get_datastreamdata(self) -> Generator[DSData, None, None]:
    """
    Return a generator for all datastream metadata.

    Returns:
        Generator[DSData, None, None]: Generator of DSData objects.
    """
    yield from self._datastream_data
```

Return a generator for all datastream metadata.

Returns
    Generator[DSData, None, None]: Generator of DSData objects.
def get_languages(self)

```python
def get_languages(self):
    """
    Return the languages of the datastreams ordered by frequency.

    Returns:
        list[str]: List of language codes ordered by frequency.
    """
    languages = []
    for dsdata in self.get_datastreamdata():
        if dsdata.lang:
            dlangs = utils.split_entry(dsdata.lang)
            languages.extend(dlangs)
    langcounter = Counter(languages)
    return [entry[0] for entry in langcounter.most_common()]
```

Return the languages of the datastreams ordered by frequency.

Returns
    list[str]: List of language codes ordered by frequency.
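The frequency ordering boils down to splitting multi-value entries and counting with collections.Counter. A stdlib-only sketch (the real splitter is utils.split_entry; the ";" separator and the helper name are assumptions for the demo):

```python
from collections import Counter

def languages_by_frequency(lang_entries: list[str]) -> list[str]:
    # Split multi-value language entries, count occurrences,
    # and return codes ordered most-frequent first.
    languages: list[str] = []
    for entry in lang_entries:
        languages.extend(part.strip() for part in entry.split(";") if part.strip())
    return [lang for lang, _ in Counter(languages).most_common()]

print(languages_by_frequency(["de; en", "de", "la"]))  # → ['de', 'en', 'la']
```

Counter.most_common() breaks ties by first occurrence, so equally frequent languages keep the order in which they were first seen.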
def get_object(self) ‑> ObjectData-
Expand source code
def get_object(self) -> ObjectData: """ Return the object metadata. Returns: ObjectData: The object metadata, or None if not set. """ return self._object_data def guess_mainresource(self) ‑> None-
def guess_mainresource(self) -> None:
    """
    Guess and set the main resource of the object based on the datastreams.

    Heuristics:
        - If there is only one XML datastream besides DC.xml, use it as
          mainResource.
    """
    xml_files = []
    for dsdata in self.get_datastreamdata():
        if dsdata.dsid not in ("DC.xml", "DC") and dsdata.mimetype in (
            "application/xml",
            "text/xml",
            "application/tei+xml",
        ):
            xml_files.append(dsdata.dsid)
    if len(xml_files) == 1:
        self._object_data.mainResource = xml_files[0]

Guess and set the main resource of the object based on the datastreams.
Heuristics
- If there is only one XML datastream besides DC.xml, use it as mainResource.
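The XML-candidate filter can be sketched on plain tuples. The datastream IDs and MIME types below are hypothetical; this is not the package's API, only the heuristic in isolation:

```python
# Hypothetical (dsid, mimetype) pairs for one object
datastreams = [
    ("DC.xml", "text/xml"),
    ("TEI.xml", "application/tei+xml"),
    ("IMG.1.jpg", "image/jpeg"),
]

XML_MIMETYPES = ("application/xml", "text/xml", "application/tei+xml")

# Keep XML datastreams other than the Dublin Core record
candidates = [
    dsid
    for dsid, mimetype in datastreams
    if dsid not in ("DC.xml", "DC") and mimetype in XML_MIMETYPES
]

# Exactly one candidate left: it becomes the main resource
main_resource = candidates[0] if len(candidates) == 1 else ""
print(main_resource)
```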
def is_empty(self) -> bool

def is_empty(self) -> bool:
    """
    Return True if the object has no CSV metadata.

    Returns:
        bool: True if object or datastream metadata is missing, False otherwise.
    """
    return self._object_data is None or not self._datastream_data

Return True if the object has no CSV metadata.
Returns
bool: True if object or datastream metadata is missing, False otherwise.
def merge_datastream(self, dsdata: DSData) -> None

def merge_datastream(self, dsdata: DSData) -> None:
    """
    Merge the datastream metadata with another DSData object.

    Args:
        dsdata (DSData): Datastream metadata to merge.
    """
    for existing_ds in self._datastream_data:
        if existing_ds.dsid == dsdata.dsid and existing_ds.dspath == dsdata.dspath:
            existing_ds.merge(dsdata)
            return
    self.add_datastream(dsdata)

Merge the datastream metadata with another DSData object.
Args
dsdata (DSData): Datastream metadata to merge.
def merge_object(self, object_data: ObjectData) -> None

def merge_object(self, object_data: ObjectData) -> None:
    """
    Merge the object metadata with another ObjectData object.

    Args:
        object_data (ObjectData): Object metadata to merge.
    """
    if self._object_data is None:
        self._object_data = object_data
    else:
        self._object_data.merge(object_data)

Merge the object metadata with another ObjectData object.
Args
object_data (ObjectData): Object metadata to merge.
def save(self) -> None

def save(self) -> None:
    """
    Save the object metadata and datastreams to their respective CSV files.

    Raises:
        FileExistsError: If CSV files already exist and
            ignore_existing_csv_files is False.
    """
    self._write_object_csv()
    self._write_datastreams_csv()

Save the object metadata and datastreams to their respective CSV files.
Raises
FileExistsError: If CSV files already exist and ignore_existing_csv_files is False.
def set_object(self, object_data: ObjectData, replace: bool = False) -> None

def set_object(self, object_data: ObjectData, replace: bool = False) -> None:
    """
    Set the object metadata.

    Args:
        object_data (ObjectData): Object metadata to set.
        replace (bool): If True, replace existing object data.

    Raises:
        ValueError: If object data is already set and replace is False.
    """
    if self._object_data is not None and not replace:
        raise ValueError("Object data has already been set.")
    self._object_data = object_data

Set the object metadata.
Args
object_data (ObjectData): Object metadata to set.
replace (bool): If True, replace existing object data.
Raises
ValueError: If object data is already set and replace is False.
def validate(self) -> None

def validate(self) -> None:
    """
    Validate the object metadata and datastreams.

    Raises:
        ValueError: If metadata is missing or invalid.
    """
    if self.is_empty():
        raise ValueError("Object metadata (csv) is not set.")
    self._object_data.validate()
    for dsdata in self._datastream_data:
        dsdata.validate()

Validate the object metadata and datastreams.
Raises
ValueError: If metadata is missing or invalid.
class ObjectCollection

class ObjectCollection:
    """
    Represents a collection of metadata for multiple GAMS objects and their
    datastreams.

    Used to aggregate, save, load, and distribute object and datastream
    metadata between individual object directories and combined CSV/XLSX
    files.
    """

    def __init__(self):
        """
        Initialize an empty ObjectCollection.
        """
        self.objects: dict[str, ObjectData] = {}  # keys are recids (pid)
        self.datastreams: dict[str, list[DSData]] = {}  # keys are object ids (recids)

    # The methods of this class are documented individually below,
    # each with its source code.

Represents a collection of metadata for multiple GAMS objects and their datastreams.
Used to aggregate, save, load, and distribute object and datastream metadata between individual object directories and combined CSV/XLSX files.
Initialize an empty ObjectCollection.
Methods
def collect_from_objects(self, root_dir: pathlib.Path) -> None

def collect_from_objects(self, root_dir: Path) -> None:
    """
    Collect metadata from all object directories below root_dir.

    Args:
        root_dir (Path): Directory containing object folders.

    Raises:
        ValueError: If object metadata (CSV) is missing for any object directory.
    """
    for obj_dir in find_object_folders(root_dir):
        object_meta = ObjectCSVManager(obj_dir)
        if object_meta.is_empty():
            raise ValueError(
                f"Object metadata (csv) is not set for {obj_dir}. "
                "Please check the object directory."
            )
        self.objects[obj_dir.name] = object_meta.get_object()
        for dsdata in object_meta.get_datastreamdata():
            if obj_dir.name not in self.datastreams:
                self.datastreams[obj_dir.name] = []
            self.datastreams[obj_dir.name].append(dsdata)

Collect metadata from all object directories below root_dir.
Args
root_dir (Path): Directory containing object folders.
Raises
ValueError: If object metadata (CSV) is missing for any object directory.
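The per-object accumulation builds a dict of lists keyed by directory name. The same pattern in miniature, with hypothetical object and datastream IDs, using dict.setdefault as a compact alternative to the explicit membership check:

```python
# Hypothetical (object id, datastream id) pairs as they are discovered
discovered = [("o:ex.1", "DC.xml"), ("o:ex.1", "TEI.xml"), ("o:ex.2", "DC.xml")]

datastreams: dict[str, list[str]] = {}
for obj_id, dsid in discovered:
    # setdefault creates the list on first sight of obj_id
    datastreams.setdefault(obj_id, []).append(dsid)

print(datastreams)
```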
def count_datastreams(self) -> int

def count_datastreams(self) -> int:
    """
    Return the total number of datastreams in the collection.

    Returns:
        int: Number of datastreams.
    """
    return sum(len(ds) for ds in self.datastreams.values())

Return the total number of datastreams in the collection.
Returns
int: Number of datastreams.
def count_objects(self) -> int

def count_objects(self) -> int:
    """
    Return the number of objects in the collection.

    Returns:
        int: Number of objects.
    """
    return len(self.objects)

Return the number of objects in the collection.
Returns
int: Number of objects.
def distribute_to_objects(self, root_dir: pathlib.Path) -> tuple[int, int]

def distribute_to_objects(self, root_dir: Path) -> tuple[int, int]:
    """
    Distribute aggregated metadata to individual object directories.

    Updates object.csv and datastreams.csv files in each object directory.

    Args:
        root_dir (Path): Directory containing object folders.

    Returns:
        tuple[int, int]: Number of updated objects and datastreams.

    Raises:
        UserWarning: If an object directory does not exist.
    """
    updated_objects_counter = 0
    updated_datastreams_counter = 0
    for obj_id, obj_data in self.objects.items():
        obj_dir = root_dir / obj_id
        if obj_dir.is_dir():
            obj_mgr = ObjectCSVManager(obj_dir, ignore_existing_csv_files=True)
            obj_mgr.set_object(obj_data, replace=True)
            updated_objects_counter += 1
            for dsdata in self.datastreams.get(obj_id, []):
                obj_mgr.add_datastream(dsdata, replace=True)
                updated_datastreams_counter += 1
            obj_mgr.save()
        else:
            raise UserWarning(f"Object directory {obj_dir} does not exist.")
    return updated_objects_counter, updated_datastreams_counter

Distribute aggregated metadata to individual object directories.
Updates object.csv and datastreams.csv files in each object directory.
Args
root_dir (Path): Directory containing object folders.
Returns
tuple[int, int]: Number of updated objects and datastreams.
Raises
UserWarning: If an object directory does not exist.
def load_from_csv(self, obj_file: pathlib.Path | None = None, ds_file: pathlib.Path | None = None) -> None

def load_from_csv(
    self, obj_file: Path | None = None, ds_file: Path | None = None
) -> None:
    """
    Load object and datastream metadata from two CSV files.

    Args:
        obj_file (Path | None): Path for object metadata CSV.
            Defaults to 'all_objects.csv'.
        ds_file (Path | None): Path for datastream metadata CSV.
            Defaults to 'all_datastreams.csv'.

    Raises:
        FileNotFoundError: If either CSV file does not exist.
    """
    obj_file = obj_file or Path(ALL_OBJECTS_CSV)
    ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
    if not obj_file.is_file():
        raise FileNotFoundError(f"Required csv file {obj_file} does not exist.")
    if not ds_file.is_file():
        raise FileNotFoundError(f"Required csv file {ds_file} does not exist.")
    self.objects.clear()
    self.datastreams.clear()
    with obj_file.open("r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            obj_data = ObjectData(**row)
            self.objects[obj_data.recid] = obj_data
    with ds_file.open("r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            ds_data = DSData(**row)
            obj_id = ds_data.dspath.split("/")[0]  # Extract object id from dspath
            if obj_id not in self.datastreams:
                self.datastreams[obj_id] = []
            self.datastreams[obj_id].append(ds_data)

Load object and datastream metadata from two CSV files.
Args
obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.
Raises
FileNotFoundError: If either CSV file does not exist.
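Two details of the load path are worth isolating: csv.DictReader rows feed a dataclass constructor directly via `**row`, and the owning object id is recovered as the first path segment of dspath. A minimal sketch with a hypothetical two-field dataclass standing in for DSData:

```python
import csv
import dataclasses
import io

@dataclasses.dataclass
class DSRow:  # hypothetical stand-in for DSData
    dspath: str
    dsid: str

csv_text = "dspath,dsid\no:ex.1/TEI.xml,TEI.xml\n"
rows = [DSRow(**row) for row in csv.DictReader(io.StringIO(csv_text))]

# The first path segment of dspath identifies the owning object
obj_id = rows[0].dspath.split("/")[0]
print(obj_id)
```

Note that `**row` requires the CSV header names to match the dataclass field names exactly, which is why the writers take their fieldnames from the dataclass.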
def load_from_xlsx(self, xlsx_file: pathlib.Path | None = None) -> None

def load_from_xlsx(self, xlsx_file: Path | None = None) -> None:
    """
    Load object and datastream metadata from a single XLSX file with two sheets.

    Args:
        xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.

    Raises:
        FileNotFoundError: If the XLSX file does not exist.
    """
    xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)
    if not xlsx_file.is_file():
        raise FileNotFoundError(f"File {xlsx_file} does not exist.")
    with tempfile.TemporaryDirectory() as tmpdir:
        obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
        ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV  # tmpdir, not tempfile.tempdir
        xlsx.xlsx_to_csv(xlsx_file, obj_file, ds_file)
        self.load_from_csv(obj_file, ds_file)

Load object and datastream metadata from a single XLSX file with two sheets.
Args
xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.
Raises
FileNotFoundError: If the XLSX file does not exist.
def save_to_csv(self, obj_file: pathlib.Path | None = None, ds_file: pathlib.Path | None = None) -> None

def save_to_csv(
    self, obj_file: Path | None = None, ds_file: Path | None = None
) -> None:
    """
    Save object and datastream metadata to two CSV files.

    Args:
        obj_file (Path | None): Path for object metadata CSV.
            Defaults to 'all_objects.csv'.
        ds_file (Path | None): Path for datastream metadata CSV.
            Defaults to 'all_datastreams.csv'.
    """
    obj_file = obj_file or Path(ALL_OBJECTS_CSV)
    ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
    with obj_file.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=ObjectData.fieldnames())
        writer.writeheader()
        for obj in self.objects.values():
            writer.writerow(asdict(obj))
    with ds_file.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=DSData.fieldnames())
        writer.writeheader()
        for datastreams in self.datastreams.values():
            for dsdata in datastreams:
                writer.writerow(asdict(dsdata))

Save object and datastream metadata to two CSV files.
Args
obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.
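The write path pairs dataclasses.asdict with a csv.DictWriter whose fieldnames come from the dataclass fields, which makes the CSV round-trippable through the loader. A self-contained sketch of that round trip with a hypothetical two-field record:

```python
import csv
import dataclasses
import io

@dataclasses.dataclass
class Record:  # hypothetical two-field record
    recid: str
    title: str = ""

rows = [Record("o:ex.1", "First"), Record("o:ex.2", "Second")]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=[f.name for f in dataclasses.fields(Record)])
writer.writeheader()
for rec in rows:
    writer.writerow(dataclasses.asdict(rec))

# Reading the same buffer back reconstructs equal records
buf.seek(0)
restored = [Record(**row) for row in csv.DictReader(buf)]
print(restored == rows)
```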
def save_to_xlsx(self, xlsx_file: pathlib.Path | None = None) -> None

def save_to_xlsx(self, xlsx_file: Path | None = None) -> None:
    """
    Save object and datastream metadata to a single XLSX file with two sheets.

    Args:
        xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.
    """
    xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)
    with tempfile.TemporaryDirectory() as tmpdir:
        obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
        ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV
        self.save_to_csv(obj_file, ds_file)
        xlsx.csv_to_xlsx(obj_file, ds_file, xlsx_file)

Save object and datastream metadata to a single XLSX file with two sheets.
Args
xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.
class ObjectData (recid: str,
title: str = '',
project: str = '',
description: str = '',
creator: str = '',
rights: str = '',
publisher: str = '',
source: str = '',
objectType: str = '',
mainResource: str = '',
funder: str = '')
@dataclass
class ObjectData:
    """
    Represents CSV metadata for a single GAMS object.

    Fields:
        - recid (str): Object identifier.
        - title (str): Title of the object.
        - project (str): Project name or identifier.
        - description (str): Description of the object.
        - creator (str): Creator of the object.
        - rights (str): Rights statement for the object.
        - publisher (str): Publisher of the object.
        - source (str): Source of the object.
        - objectType (str): Type of the object.
        - mainResource (str): Main datastream identifier.
        - funder (str): Funder information.
    """

    recid: str
    title: str = ""
    project: str = ""
    description: str = ""
    creator: str = ""
    rights: str = ""
    publisher: str = ""
    source: str = ""
    objectType: str = ""
    mainResource: str = ""  # main datastream
    funder: str = ""

    @classmethod
    def fieldnames(cls) -> list[str]:
        """
        Return the list of field names for ObjectData.

        Returns:
            list[str]: Names of all fields in the ObjectData dataclass.
        """
        return [field.name for field in dataclasses.fields(cls)]

    def merge(self, other: "ObjectData"):
        """
        Merge the object data with another ObjectData instance.

        Overwrites fields with non-empty values from the other instance.
        Both objects must have the same recid.

        Args:
            other (ObjectData): Another ObjectData instance to merge from.

        Raises:
            ValueError: If recid values do not match.
        """
        if self.recid != other.recid:
            raise ValueError("Cannot merge objects with different recid values")
        # These are the fields which are possibly set automatically in the new object data
        fields_to_merge = [
            "title",
            "project",
            "creator",
            "rights",
            "publisher",
            "source",
            "objectType",
            "mainResource",
            "funder",
        ]
        for field in fields_to_merge:
            if getattr(other, field).strip():
                setattr(self, field, getattr(other, field))

    def validate(self):
        """
        Validate required metadata fields.

        Raises:
            ValueError: If any required field is empty.
        """
        if not self.recid:
            raise ValueError("recid must not be empty")
        if not self.title:
            raise ValueError(f"{self.recid}: title must not be empty")
        if not self.rights:
            raise ValueError(f"{self.recid}: rights must not be empty")
        if not self.source:
            raise ValueError(f"{self.recid}: source must not be empty")
        if not self.objectType:
            raise ValueError(f"{self.recid}: objectType must not be empty")

Represents CSV metadata for a single GAMS object.
Fields
- recid (str): Object identifier.
- title (str): Title of the object.
- project (str): Project name or identifier.
- description (str): Description of the object.
- creator (str): Creator of the object.
- rights (str): Rights statement for the object.
- publisher (str): Publisher of the object.
- source (str): Source of the object.
- objectType (str): Type of the object.
- mainResource (str): Main datastream identifier.
- funder (str): Funder information.
Static methods
def fieldnames() -> list[str]
Return the list of field names for ObjectData.
Returns
list[str]- Names of all fields in the ObjectData dataclass.
Instance variables
var creator : str
    Creator of the object.
var description : str
    Description of the object.
var funder : str
    Funder information.
var mainResource : str
    Main datastream identifier.
var objectType : str
    Type of the object.
var project : str
    Project name or identifier.
var publisher : str
    Publisher of the object.
var recid : str
    Object identifier.
var rights : str
    Rights statement for the object.
var source : str
    Source of the object.
var title : str
    Title of the object.
Methods
def merge(self, other: ObjectData)

def merge(self, other: "ObjectData"):
    """
    Merge the object data with another ObjectData instance.

    Overwrites fields with non-empty values from the other instance.
    Both objects must have the same recid.

    Args:
        other (ObjectData): Another ObjectData instance to merge from.

    Raises:
        ValueError: If recid values do not match.
    """
    if self.recid != other.recid:
        raise ValueError("Cannot merge objects with different recid values")
    # These are the fields which are possibly set automatically in the new object data
    fields_to_merge = [
        "title",
        "project",
        "creator",
        "rights",
        "publisher",
        "source",
        "objectType",
        "mainResource",
        "funder",
    ]
    for field in fields_to_merge:
        if getattr(other, field).strip():
            setattr(self, field, getattr(other, field))

Merge the object data with another ObjectData instance.
Overwrites fields with non-empty values from the other instance. Both objects must have the same recid.
Args
other (ObjectData): Another ObjectData instance to merge from.
Raises
ValueError: If recid values do not match.
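The non-empty-overwrite rule can be demonstrated with a tiny stand-in dataclass (hypothetical; it mirrors the merge logic for just two fields):

```python
import dataclasses

@dataclasses.dataclass
class Rec:  # hypothetical stand-in for ObjectData
    recid: str
    title: str = ""
    rights: str = ""

    def merge(self, other: "Rec") -> None:
        if self.recid != other.recid:
            raise ValueError("Cannot merge objects with different recid values")
        for field in ("title", "rights"):
            # Only non-empty values from `other` overwrite existing ones
            if getattr(other, field).strip():
                setattr(self, field, getattr(other, field))

rec = Rec("o:ex.1", title="Old title")
rec.merge(Rec("o:ex.1", rights="CC BY 4.0"))
print(rec)  # title kept, rights filled in
```

Because empty strings never overwrite, merging is safe to apply in either direction when only one side carries a value for a field.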
def validate(self)

def validate(self):
    """
    Validate required metadata fields.

    Raises:
        ValueError: If any required field is empty.
    """
    if not self.recid:
        raise ValueError("recid must not be empty")
    if not self.title:
        raise ValueError(f"{self.recid}: title must not be empty")
    if not self.rights:
        raise ValueError(f"{self.recid}: rights must not be empty")
    if not self.source:
        raise ValueError(f"{self.recid}: source must not be empty")
    if not self.objectType:
        raise ValueError(f"{self.recid}: objectType must not be empty")

Validate required metadata fields.
Raises
ValueError: If any required field is empty.
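A hypothetical helper mirroring the required-field rule set (recid, title, rights, source, and objectType must be non-empty) shows the validation pattern on a plain dict:

```python
def check_required(record: dict) -> None:
    # Hypothetical helper; mirrors ObjectData.validate's required fields
    for field in ("recid", "title", "rights", "source", "objectType"):
        if not record.get(field):
            raise ValueError(f"{field} must not be empty")

ok = {"recid": "o:ex.1", "title": "T", "rights": "CC0",
      "source": "S", "objectType": "text"}
check_required(ok)  # passes silently

try:
    check_required({**ok, "rights": ""})
except ValueError as err:
    print(err)
```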