Module gamslib.objectcsv

Tools for managing object and datastream metadata in CSV files for GAMS projects.

This package provides utilities to read, write, validate, and manipulate metadata stored in object.csv and datastreams.csv files, which accompany GAMS bags but are not part of the bag itself.

Main components:

  • ObjectCSVManager: Manages metadata for a single object and its datastreams. Handles reading, writing, validating, merging, and updating CSV files.
  • ObjectCollection: Aggregates metadata from multiple objects into a single CSV file and distributes updates back to individual object directories. Useful for batch editing and synchronization.
  • dublincore_csv: Functions for accessing and processing Dublin Core metadata from 'DC.xml' files, including language preference utilities.
  • create_csv: Initializes CSV files for all objects in a project.
  • manage_csv: Collects metadata from all objects into a single CSV for efficient editing, and updates individual object directories from the aggregated data.
  • xlsx: Converts CSV files to XLSX format and vice versa, enabling spreadsheet-based editing and avoiding encoding issues common with CSV imports/exports.

Public API:

DSData
ObjectCSVManager
ObjectCollection
ObjectData
collect_csv_data()
create_csv_files()
csv_to_xlsx()
split_from_csv()
split_from_xlsx()
xlsx_to_csv()

These classes and functions are imported into the package namespace for direct use.

Sub-modules

gamslib.objectcsv.create_csv

Create object.csv and datastreams.csv files for GAMS objects …

gamslib.objectcsv.defaultvalues

Default values and namespaces for datastream metadata in GAMS projects …

gamslib.objectcsv.dsdata

Datastream metadata model for GAMS object CSV files …

gamslib.objectcsv.dublincore

Dublin Core metadata access for GAMS objects …

gamslib.objectcsv.exceptions

Custom exceptions for the GAMSlib object CSV module …

gamslib.objectcsv.manage_csv

Functions to collect and update object and datastream CSV files for GAMS projects …

gamslib.objectcsv.objectcollection

Aggregate and manage CSV/XLSX metadata for multiple GAMS objects …

gamslib.objectcsv.objectcsvmanager

Manage CSV metadata for GAMS objects and their datastreams …

gamslib.objectcsv.objectdata

CSV data model for a single GAMS object …

gamslib.objectcsv.utils

Utility functions for the objectcsv module …

gamslib.objectcsv.xlsx

Utilities to convert object and datastream CSV files to XLSX format and back …

Functions

def collect_csv_data(object_root_dir: pathlib.Path,
object_csv_path: pathlib.Path | None = None,
datastream_csv_path: pathlib.Path | None = None) ‑> ObjectCollection
def collect_csv_data(
    object_root_dir: Path,
    object_csv_path: Path | None = None,
    datastream_csv_path: Path | None = None,
) -> ObjectCollection:
    """
    Collect metadata from all object folders below object_root_dir and save to combined CSV files.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        object_csv_path (Path | None): Path to save combined object metadata CSV. Defaults to 'object.csv' in CWD.
        datastream_csv_path (Path | None): Path to save combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

    Returns:
        ObjectCollection: Collection containing all object and datastream metadata.

    Notes:
        - Reads all object.csv and datastreams.csv files below object_root_dir.
        - Saves aggregated metadata to the specified CSV files.
    """
    object_csv_path = object_csv_path or Path.cwd() / objectcollection.ALL_OBJECTS_CSV
    datastream_csv_path = (
        datastream_csv_path or Path.cwd() / objectcollection.ALL_DATASTREAMS_CSV
    )

    collector = ObjectCollection()
    collector.collect_from_objects(object_root_dir)
    collector.save_to_csv(object_csv_path, datastream_csv_path)
    return collector

Collect metadata from all object folders below object_root_dir and save to combined CSV files.

Args

object_root_dir : Path
Root directory containing all object folders.
object_csv_path : Path | None
Path to save combined object metadata CSV. Defaults to 'object.csv' in CWD.
datastream_csv_path : Path | None
Path to save combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

Returns

ObjectCollection
Collection containing all object and datastream metadata.

Notes

  • Reads all object.csv and datastreams.csv files below object_root_dir.
  • Saves aggregated metadata to the specified CSV files.
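
The aggregation idea can be sketched with the standard library alone. This is a simplified stand-in for `collect_csv_data()` (the real implementation delegates to `ObjectCollection`); the folder layout and the `combine_object_csvs` helper are hypothetical.

```python
import csv
from pathlib import Path


def combine_object_csvs(object_root_dir: Path, combined_csv: Path) -> int:
    """Collect every per-object object.csv below object_root_dir into one CSV.

    Returns the number of data rows written (header excluded). Hypothetical
    helper illustrating the aggregation step; the real code uses ObjectCollection.
    """
    rows: list[dict] = []
    fieldnames: list[str] = []
    for obj_csv in sorted(object_root_dir.glob("*/object.csv")):
        with obj_csv.open(encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f)
            # Take the header from the first file encountered.
            if reader.fieldnames and not fieldnames:
                fieldnames = list(reader.fieldnames)
            rows.extend(reader)
    with combined_csv.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The sketch assumes all per-object files share one header; the real collection additionally tracks datastreams.csv and keeps the data in memory for later redistribution.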
def create_csv_files(root_folder: pathlib.Path,
config: Configuration,
force_overwrite: bool = False,
update: bool = False) ‑> list[ObjectCSVManager]
def create_csv_files(
    root_folder: Path,
    config: Configuration,
    force_overwrite: bool = False,
    update: bool = False,
) -> list[ObjectCSVManager]:
    """
    Create or update CSV files for all objects under the given root folder.

    Iterates through all object directories found below root_folder and creates or updates
    their object.csv and datastreams.csv files.

    Args:
        root_folder (Path): Root directory containing object folders.
        config (Configuration): Project configuration.
        force_overwrite (bool): If True, overwrite existing CSV files.
        update (bool): If True, update existing CSV files instead of creating new ones.

    Returns:
        list[ObjectCSVManager]: List of managers for the processed object directories.
    """
    extended_objects: list[ObjectCSVManager] = []
    for path in find_object_folders(root_folder):
        if update:
            extended_obj = update_csv(path, config)
        else:
            extended_obj = create_csv(path, config, force_overwrite)

        if extended_obj is not None:
            extended_objects.append(extended_obj)
    return extended_objects

Create or update CSV files for all objects under the given root folder.

Iterates through all object directories found below root_folder and creates or updates their object.csv and datastreams.csv files.

Args

root_folder : Path
Root directory containing object folders.
config : Configuration
Project configuration.
force_overwrite : bool
If True, overwrite existing CSV files.
update : bool
If True, update existing CSV files instead of creating new ones.

Returns

list[ObjectCSVManager]
List of managers for the processed object directories.
def csv_to_xlsx(object_csv: pathlib.Path,
ds_csv: pathlib.Path,
output_file: pathlib.Path) ‑> pathlib.Path
def csv_to_xlsx(object_csv: Path, ds_csv: Path, output_file: Path) -> Path:
    """
    Convert object and datastream CSV files to a single XLSX file.

    Args:
        object_csv (Path): Path to the object metadata CSV file.
        ds_csv (Path): Path to the datastream metadata CSV file.
        output_file (Path): Path for the output XLSX file.

    Returns:
        Path: Path to the created XLSX file.

    Notes:
        - Object metadata is written to the "Object Metadata" sheet.
        - Datastream metadata is written to the "Datastream Metadata" sheet.
    """
    object_data = read_csv(object_csv, skip_header=False)
    ds_data = read_csv(ds_csv, skip_header=False)

    db = xl.Database()
    db.add_ws("Object Metadata")
    for row_id, row in enumerate(object_data, start=1):
        for col_id, value in enumerate(row, start=1):
            db.ws(ws="Object Metadata").update_index(row=row_id, col=col_id, val=value)
    db.add_ws("Datastream Metadata")
    for row_id, row_data in enumerate(ds_data, start=1):
        for col_id, value in enumerate(row_data, start=1):
            db.ws(ws="Datastream Metadata").update_index(
                row=row_id, col=col_id, val=value
            )
    xl.writexl(fn=output_file, db=db)
    return output_file

Convert object and datastream CSV files to a single XLSX file.

Args

object_csv : Path
Path to the object metadata CSV file.
ds_csv : Path
Path to the datastream metadata CSV file.
output_file : Path
Path for the output XLSX file.

Returns

Path
Path to the created XLSX file.

Notes

  • Object metadata is written to the "Object Metadata" sheet.
  • Datastream metadata is written to the "Datastream Metadata" sheet.
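
The converter depends on the third-party pylightxl package (imported as `xl` in the source), but the two-sheet layout it builds can be illustrated with the stdlib alone. The `load_sheets` helper below is hypothetical; it reads both CSVs into a sheet-name → rows mapping mirroring the worksheets the real function writes.

```python
import csv
from pathlib import Path


def load_sheets(object_csv: Path, ds_csv: Path) -> dict[str, list[list[str]]]:
    """Read both metadata CSVs into the two-sheet layout used by csv_to_xlsx().

    Headers are kept as the first row of each sheet (the real code reads with
    skip_header=False). Hypothetical helper; the real function writes XLSX
    cells via pylightxl.
    """
    sheets: dict[str, list[list[str]]] = {}
    for sheet_name, csv_path in (
        ("Object Metadata", object_csv),
        ("Datastream Metadata", ds_csv),
    ):
        with csv_path.open(encoding="utf-8", newline="") as f:
            sheets[sheet_name] = [list(row) for row in csv.reader(f)]
    return sheets
```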
def split_from_csv(object_root_dir: pathlib.Path,
object_csv_path: pathlib.Path | None = None,
ds_csv_path: pathlib.Path | None = None) ‑> tuple[int, int]
def split_from_csv(
    object_root_dir: Path,
    object_csv_path: Path | None = None,
    ds_csv_path: Path | None = None,
) -> tuple[int, int]:
    """
    Update object folder CSV metadata from combined CSV files.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        object_csv_path (Path | None): Path to combined object metadata CSV. Defaults to 'object.csv' in CWD.
        ds_csv_path (Path | None): Path to combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

    Returns:
        tuple[int, int]: Number of updated objects and number of updated datastreams.

    Raises:
        UserWarning: If an object directory does not exist.

    Notes:
        - Reads the CSV files created by collect_csv_data().
        - Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
    """
    collector = ObjectCollection()
    collector.load_from_csv(object_csv_path, ds_csv_path)
    return collector.distribute_to_objects(object_root_dir)

Update object folder CSV metadata from combined CSV files.

Args

object_root_dir : Path
Root directory containing all object folders.
object_csv_path : Path | None
Path to combined object metadata CSV. Defaults to 'object.csv' in CWD.
ds_csv_path : Path | None
Path to combined datastream metadata CSV. Defaults to 'datastreams.csv' in CWD.

Returns

tuple[int, int]
Number of updated objects and number of updated datastreams.

Raises

UserWarning
If an object directory does not exist.

Notes

  • Reads the CSV files created by collect_csv_data().
  • Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
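
The distribution step hinges on routing each combined row back to its object folder; the object id is the first component of each `dspath`. A stdlib sketch of that grouping (the `group_by_object` helper is hypothetical; the real code distributes via `ObjectCollection.distribute_to_objects`):

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_by_object(combined_ds_csv: Path) -> dict[str, list[dict]]:
    """Group combined datastream rows by object id (first dspath component)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    with combined_ds_csv.open(encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            # "obj1/TEI.xml" -> "obj1", matching DSData.object_id
            object_id = Path(row["dspath"]).parts[0]
            groups[object_id].append(row)
    return dict(groups)
```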
def split_from_xlsx(object_root_dir: pathlib.Path,
xlsx_file: pathlib.Path | None = None) ‑> tuple[int, int]
def split_from_xlsx(
    object_root_dir: Path, xlsx_file: Path | None = None
) -> tuple[int, int]:
    """
    Update object folder CSV metadata from a combined XLSX file.

    Args:
        object_root_dir (Path): Root directory containing all object folders.
        xlsx_file (Path | None): Path to the XLSX file. Defaults to 'all_objects.xlsx' in CWD.

    Returns:
        tuple[int, int]: Number of updated objects and number of updated datastreams.

    Raises:
        UserWarning: If an object directory does not exist.

    Notes:
        - Reads the XLSX file created by collect_csv_data().
        - Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
    """
    collector = ObjectCollection()
    collector.load_from_xlsx(xlsx_file)
    return collector.distribute_to_objects(object_root_dir)

Update object folder CSV metadata from a combined XLSX file.

Args

object_root_dir : Path
Root directory containing all object folders.
xlsx_file : Path | None
Path to the XLSX file. Defaults to 'all_objects.xlsx' in CWD.

Returns

tuple[int, int]
Number of updated objects and number of updated datastreams.

Raises

UserWarning
If an object directory does not exist.

Notes

  • Reads the XLSX file created by collect_csv_data().
  • Updates object.csv and datastreams.csv files in all object folders below object_root_dir.
def xlsx_to_csv(xlsx_path: pathlib.Path,
obj_csv_path: pathlib.Path,
ds_csv_path: pathlib.Path) ‑> tuple[pathlib.Path, pathlib.Path]
def xlsx_to_csv(
    xlsx_path: Path, obj_csv_path: Path, ds_csv_path: Path
) -> tuple[Path, Path]:
    """
    Convert an XLSX metadata file to two CSV files: object.csv and datastreams.csv.

    Args:
        xlsx_path (Path): Path to the XLSX file containing metadata.
        obj_csv_path (Path): Path for the output object metadata CSV file.
        ds_csv_path (Path): Path for the output datastream metadata CSV file.

    Returns:
        tuple[Path, Path]: Paths to the created object and datastream CSV files.

    Notes:
        - Reads "Object Metadata" and "Datastream Metadata" sheets from the XLSX file.
        - Writes each sheet to its respective CSV file.
    """
    db = xl.readxl(xlsx_path)

    object_data = list(db.ws(ws="Object Metadata").rows)
    ds_data = list(db.ws(ws="Datastream Metadata").rows)

    with open(obj_csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(object_data)

    with open(ds_csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(ds_data)
    return obj_csv_path, ds_csv_path

Convert an XLSX metadata file to two CSV files: object.csv and datastreams.csv.

Args

xlsx_path : Path
Path to the XLSX file containing metadata.
obj_csv_path : Path
Path for the output object metadata CSV file.
ds_csv_path : Path
Path for the output datastream metadata CSV file.

Returns

tuple[Path, Path]
Paths to the created object and datastream CSV files.

Notes

  • Reads "Object Metadata" and "Datastream Metadata" sheets from the XLSX file.
  • Writes each sheet to its respective CSV file.

Classes

class DSData (dspath: str,
dsid: str = '',
title: str = '',
description: str = '',
mimetype: str = '',
creator: str = '',
rights: str = '',
lang: str = '',
tags: str = '')
@dataclasses.dataclass
class DSData:
    """
    Represents metadata for a single datastream of a GAMS object.

    Fields:

      - dspath (str): Relative path to the datastream file.
      - dsid (str): Datastream identifier.
      - title (str): Title of the datastream.
      - description (str): Description of the datastream.
      - mimetype (str): MIME type of the datastream.
      - creator (str): Creator of the datastream.
      - rights (str): Rights statement for the datastream.
      - lang (str): Language(s) of the datastream.
      - tags (str): Additional tags for the datastream.
    """

    dspath: str
    dsid: str = ""
    title: str = ""
    description: str = ""
    mimetype: str = ""
    creator: str = ""
    rights: str = ""
    lang: str = ""
    tags: str = ""

    @property
    def object_id(self):
        """
        Return the object ID for the datastream.

        The object ID is inferred from the first part of the datastream path.
        """
        return Path(self.dspath).parts[0]

    @classmethod
    def fieldnames(cls) -> list[str]:
        """
        Return the list of field names for DSData.

        Returns:
            list[str]: Names of all fields in the DSData dataclass.
        """
        return [field.name for field in dataclasses.fields(cls)]

    def merge(self, other_dsdata: "DSData"):
        """
        Merge metadata from another DSData instance.

        Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights') with non-empty
        values from the other instance. Both datastreams must have the same dspath and dsid.

        Args:
            other_dsdata (DSData): Another DSData instance to merge from.

        Raises:
            ValueError: If dspath or dsid do not match.
        """
        if self.dspath != other_dsdata.dspath:
            raise ValueError("Cannot merge datastreams with different dspath values")
        if self.dsid != other_dsdata.dsid:
            raise ValueError("Cannot merge datastreams with different dsid values")

        fields_to_replace = ["title", "mimetype", "creator", "rights"]
        for field in fields_to_replace:
            if getattr(other_dsdata, field).strip():
                setattr(self, field, getattr(other_dsdata, field))

    def validate(self):
        """
        Validate required metadata fields.

        Raises:
            ValueError: If any required field (dspath, dsid, mimetype, rights) is empty.
        """
        if not self.dspath.strip():
            raise ValueError(f"{self.dsid}: dspath must not be empty")
        if not self.dsid.strip():
            raise ValueError(f"{self.dspath}: dsid must not be empty")
        if not self.mimetype.strip():
            raise ValueError(f"{self.dspath}: mimetype must not be empty")
        if not self.rights.strip():
            raise ValueError(f"{self.dspath}: rights must not be empty")

    def guess_missing_values(self, object_path: Path):
        """
        Infer missing metadata values by analyzing the datastream file.

        Uses format detection and default values to fill in missing fields.

        Args:
            object_path (Path): Path to the object directory containing the datastream.
        """
        ds_file = object_path / Path(self.dspath).name
        format_info = formatdetect.detect_format(ds_file)
        self._guess_mimetype(format_info)
        self._guess_missing_values(ds_file, format_info)

    def _guess_mimetype(self, format_info=None):
        """
        Guess and set the MIME type if it is missing.

        Args:
            format_info (FormatInfo, optional): Format information for the datastream.
        """
        if not self.mimetype and format_info is not None:
            self.mimetype = format_info.mimetype

    def _guess_missing_values(self, file_path: Path, format_info=None):
        """
        Infer and set missing metadata fields using file and format info.

        Args:
            file_path (Path): Path to the datastream file.
            format_info (FormatInfo, optional): Format information for the datastream.
        """
        if not self.title and format_info is not None:
            self.title = f"{format_info.description}: {self.dsid}"

        if not self.description and file_path.name in defaultvalues.FILENAME_MAP:
            self.description = defaultvalues.FILENAME_MAP[file_path.name]["description"]
        if not self.rights:
            self.rights = defaultvalues.DEFAULT_RIGHTS
        if not self.creator:
            self.creator = defaultvalues.DEFAULT_CREATOR

Represents metadata for a single datastream of a GAMS object.

Fields

  • dspath (str): Relative path to the datastream file.
  • dsid (str): Datastream identifier.
  • title (str): Title of the datastream.
  • description (str): Description of the datastream.
  • mimetype (str): MIME type of the datastream.
  • creator (str): Creator of the datastream.
  • rights (str): Rights statement for the datastream.
  • lang (str): Language(s) of the datastream.
  • tags (str): Additional tags for the datastream.

Static methods

def fieldnames() ‑> list[str]

Return the list of field names for DSData.

Returns

list[str]
Names of all fields in the DSData dataclass.
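
`fieldnames()` relies on `dataclasses.fields()` introspection, which returns fields in declaration order, so the CSV header always matches the dataclass layout. The same pattern on a minimal stand-in dataclass (`MiniDS` is hypothetical, with a subset of DSData's fields):

```python
import dataclasses


@dataclasses.dataclass
class MiniDS:
    # Stand-in with a subset of DSData's fields, declared in the same order.
    dspath: str
    dsid: str = ""
    title: str = ""


def fieldnames(cls) -> list[str]:
    # Field names in declaration order, ready to use as a csv.DictWriter header.
    return [field.name for field in dataclasses.fields(cls)]
```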

Instance variables

var creator : str

Creator of the datastream.

var description : str

Description of the datastream.

var dsid : str

Datastream identifier.

var dspath : str

Relative path to the datastream file.

var lang : str

Language(s) of the datastream.

var mimetype : str

MIME type of the datastream.

prop object_id
@property
def object_id(self):
    """
    Return the object ID for the datastream.

    The object ID is inferred from the first part of the datastream path.
    """
    return Path(self.dspath).parts[0]

Return the object ID for the datastream.

The object ID is inferred from the first part of the datastream path.
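
Because `dspath` is stored relative to the project root, its first path component is the object directory name. The lookup reduces to `pathlib.PurePath.parts`:

```python
from pathlib import Path


def object_id(dspath: str) -> str:
    # First component of the relative datastream path, e.g. "obj1/TEI.xml" -> "obj1".
    return Path(dspath).parts[0]
```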

var rights : str

Rights statement for the datastream.

var tags : str

Additional tags for the datastream.

var title : str

Title of the datastream.

Methods

def guess_missing_values(self, object_path: pathlib.Path)
def guess_missing_values(self, object_path: Path):
    """
    Infer missing metadata values by analyzing the datastream file.

    Uses format detection and default values to fill in missing fields.

    Args:
        object_path (Path): Path to the object directory containing the datastream.
    """
    ds_file = object_path / Path(self.dspath).name
    format_info = formatdetect.detect_format(ds_file)
    self._guess_mimetype(format_info)
    self._guess_missing_values(ds_file, format_info)

Infer missing metadata values by analyzing the datastream file.

Uses format detection and default values to fill in missing fields.

Args

object_path : Path
Path to the object directory containing the datastream.
def merge(self,
other_dsdata: DSData)
def merge(self, other_dsdata: "DSData"):
    """
    Merge metadata from another DSData instance.

    Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights') with non-empty
    values from the other instance. Both datastreams must have the same dspath and dsid.

    Args:
        other_dsdata (DSData): Another DSData instance to merge from.

    Raises:
        ValueError: If dspath or dsid do not match.
    """
    if self.dspath != other_dsdata.dspath:
        raise ValueError("Cannot merge datastreams with different dspath values")
    if self.dsid != other_dsdata.dsid:
        raise ValueError("Cannot merge datastreams with different dsid values")

    fields_to_replace = ["title", "mimetype", "creator", "rights"]
    for field in fields_to_replace:
        if getattr(other_dsdata, field).strip():
            setattr(self, field, getattr(other_dsdata, field))

Merge metadata from another DSData instance.

Selectively overwrites fields ('title', 'mimetype', 'creator', 'rights') with non-empty values from the other instance. Both datastreams must have the same dspath and dsid.

Args

other_dsdata : DSData
Another DSData instance to merge from.

Raises

ValueError
If dspath or dsid do not match.
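
The merge is deliberately asymmetric: only a fixed whitelist of fields is overwritten, and only when the incoming value is non-blank, so a partially filled spreadsheet row cannot blank out existing metadata. A minimal stand-in of that rule (`MiniDS` and the free `merge` function are hypothetical; the real method lives on DSData and whitelists title, mimetype, creator, and rights):

```python
import dataclasses


@dataclasses.dataclass
class MiniDS:
    # Stand-in with the merge-relevant identity fields plus two payload fields.
    dspath: str
    dsid: str = ""
    title: str = ""
    rights: str = ""


def merge(target: MiniDS, other: MiniDS) -> None:
    """Overwrite whitelisted fields of target with non-blank values from other."""
    if (target.dspath, target.dsid) != (other.dspath, other.dsid):
        raise ValueError("Cannot merge datastreams with different dspath/dsid")
    for field in ("title", "rights"):  # whitelist, as in DSData.merge
        if getattr(other, field).strip():
            setattr(target, field, getattr(other, field))
```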
def validate(self)
def validate(self):
    """
    Validate required metadata fields.

    Raises:
        ValueError: If any required field (dspath, dsid, mimetype, rights) is empty.
    """
    if not self.dspath.strip():
        raise ValueError(f"{self.dsid}: dspath must not be empty")
    if not self.dsid.strip():
        raise ValueError(f"{self.dspath}: dsid must not be empty")
    if not self.mimetype.strip():
        raise ValueError(f"{self.dspath}: mimetype must not be empty")
    if not self.rights.strip():
        raise ValueError(f"{self.dspath}: rights must not be empty")

Validate required metadata fields.

Raises

ValueError
If any required field (dspath, dsid, mimetype, rights) is empty.
class ObjectCSVManager (obj_dir: pathlib.Path, ignore_existing_csv_files: bool = False)
class ObjectCSVManager:
    """
    Manage object and datastream metadata for a single GAMS object directory.

    Stores, reads, writes, validates, and merges metadata for the object and its datastreams.
    """

    def __init__(self, obj_dir: Path, ignore_existing_csv_files: bool = False):
        """
        Initialize the ObjectCSVManager with the given object directory.

        Args:
            obj_dir (Path): Path to the object directory.
            ignore_existing_csv_files (bool): If True, ignore existing CSV files when writing.
        
        Raises:
            FileNotFoundError: If the object directory does not exist.
        """
        self.obj_dir: Path = obj_dir
        self._ignore_existing_csv_files: bool = ignore_existing_csv_files
        if not self.obj_dir.is_dir():
            raise FileNotFoundError(
                f"Object directory '{self.obj_dir}' does not exist."
            )
        self.object_id = self.obj_dir.name
        self._object_data: ObjectData | None = self._read_object_csv()
        self._datastream_data: list[DSData] = self._read_datastreams_csv()

    def set_object(self, object_data: ObjectData, replace: bool = False) -> None:
        """
        Set the object metadata.

        Args:
            object_data (ObjectData): Object metadata to set.
            replace (bool): If True, replace existing object data.

        Raises:
            ValueError: If object data is already set and replace is False.
        """
        if self._object_data is not None and not replace:
            raise ValueError("Object data has already been set.")
        self._object_data = object_data

    def merge_object(self, object_data: ObjectData) -> None:
        """
        Merge the object metadata with another ObjectData object.

        Args:
            object_data (ObjectData): Object metadata to merge.
        """
        if self._object_data is None:
            self._object_data = object_data
        else:
            self._object_data.merge(object_data)

    def get_object(self) -> ObjectData | None:
        """
        Return the object metadata.

        Returns:
            ObjectData: The object metadata, or None if not set.
        """
        return self._object_data

    def add_datastream(self, dsdata: DSData, replace: bool = False) -> None:
        """
        Add a datastream to the object.

        Args:
            dsdata (DSData): Datastream metadata to add.
            replace (bool): If True, replace existing datastream with the same dsid.

        Raises:
            ValueError: If datastream with the same dsid exists and replace is False.
        """
        if dsdata.dsid in [ds.dsid for ds in self._datastream_data]:
            if replace:
                self._datastream_data = [
                    ds for ds in self._datastream_data if ds.dsid != dsdata.dsid
                ]
            else:
                raise ValueError(f"Datastream with id {dsdata.dsid} already exists.")
        self._datastream_data.append(dsdata)

    def merge_datastream(self, dsdata: DSData) -> None:
        """
        Merge the datastream metadata with another DSData object.

        Args:
            dsdata (DSData): Datastream metadata to merge.
        """
        for existing_ds in self._datastream_data:
            if existing_ds.dsid == dsdata.dsid and existing_ds.dspath == dsdata.dspath:
                existing_ds.merge(dsdata)
                return
        self.add_datastream(dsdata)

    def get_datastreamdata(self) -> Generator[DSData, None, None]:
        """
        Return a generator for all datastream metadata.

        Returns:
            Generator[DSData, None, None]: Generator of DSData objects.
        """
        yield from self._datastream_data

    def count_datastreams(self) -> int:
        """
        Return the number of datastreams.

        Returns:
            int: Number of datastreams.
        """
        return len(self._datastream_data)

    def get_languages(self):
        """
        Return the languages of the datastreams ordered by frequency.

        Returns:
            list[str]: List of language codes ordered by frequency.
        """
        languages = []
        for dsdata in self.get_datastreamdata():
            if dsdata.lang:
                dlangs = utils.split_entry(dsdata.lang)
                languages.extend(dlangs)
        langcounter = Counter(languages)
        return [entry[0] for entry in langcounter.most_common()]

    def is_empty(self) -> bool:
        """
        Return True if the object has no CSV metadata.

        Returns:
            bool: True if object or datastream metadata is missing, False otherwise.
        """
        return self._object_data is None or not self._datastream_data

    def save(self) -> None:
        """
        Save the object metadata and datastreams to their respective CSV files.

        Raises:
            FileExistsError: If CSV files already exist and ignore_existing_csv_files is False.
        """
        self._write_object_csv()
        self._write_datastreams_csv()

    def clear(self) -> None:
        """
        Clear the object metadata and datastreams, and delete the CSV files.

        Removes all metadata and deletes object.csv and datastreams.csv files if present.
        """
        self._object_data = None
        self._datastream_data = []
        obj_csv_file = self.obj_dir / OBJ_CSV_FILENAME
        ds_csv_file = self.obj_dir / DS_CSV_FILENAME
        if obj_csv_file.is_file():
            obj_csv_file.unlink()
        if ds_csv_file.is_file():
            ds_csv_file.unlink()

    def validate(self) -> None:
        """
        Validate the object metadata and datastreams.

        Raises:
            ValueError: If metadata is missing or invalid.
        """
        if self.is_empty():
            raise ValueError("Object metadata (csv) is not set.")
        self._object_data.validate()
        for dsdata in self._datastream_data:
            dsdata.validate()

    def guess_mainresource(self) -> str:
        """
        Guess and set the main resource of the object based on the datastreams.

        Heuristics:
            - If there is only one XML datastream besides DC.xml, use it as mainResource.

        Returns:
            str: The guessed main resource ID, or empty string if not determined.
        """
        main_resource = ""
        xml_files = []
        for dsdata in self.get_datastreamdata():
            if dsdata.dsid not in ("DC.xml", "DC") and dsdata.mimetype in (
                "application/xml",
                "text/xml",
                "application/tei+xml",
            ):
                xml_files.append(dsdata.dsid)
        if len(xml_files) == 1:
            main_resource = xml_files[0]
            self._object_data.mainResource = main_resource
        return main_resource

    def _read_object_csv(self) -> ObjectData | None:
        """
        Read object metadata from the CSV file.

        Returns:
            ObjectData | None: Object metadata if present, else None.
        """
        csv_file = self.obj_dir / OBJ_CSV_FILENAME

        if not csv_file.is_file():
            return None
        with csv_file.open(encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                if "mainresource" in row:
                    row["mainResource"] = row.pop("mainresource")
                return ObjectData(**row)

    def _write_object_csv(self):
        """
        Write the object metadata to the CSV file.

        Raises:
            FileExistsError: If the CSV file exists and ignore_existing_csv_files is False.
        """
        csv_file = self.obj_dir / OBJ_CSV_FILENAME
        if csv_file.is_file() and not self._ignore_existing_csv_files:
            raise FileExistsError(f"Object CSV file '{csv_file}' already exists.")
        with csv_file.open("w", encoding="utf-8", newline="") as f:
            fieldnames = ObjectData.fieldnames()
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerow(dataclasses.asdict(self._object_data))

    def _read_datastreams_csv(self) -> list[DSData]:
        """
        Read datastream metadata from the CSV file.

        Returns:
            list[DSData]: List of datastream metadata.
        """
        datastreams = []
        csv_file = self.obj_dir / DS_CSV_FILENAME
        if not csv_file.is_file():
            return []
        with csv_file.open(encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                dsdata = DSData(**row)
                datastreams.append(dsdata)
        return datastreams

    def _write_datastreams_csv(self):
        """
        Write the datastream metadata to the CSV file.

        Notes:
            - Datastreams are sorted by dsid before writing.
        """
        csv_file = self.obj_dir / DS_CSV_FILENAME
        self._datastream_data.sort(key=lambda ds: ds.dsid)
        with csv_file.open("w", encoding="utf-8", newline="") as f:
            fieldnames = DSData.fieldnames()
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            for dsdata in self._datastream_data:
                writer.writerow(dataclasses.asdict(dsdata))

Manage object and datastream metadata for a single GAMS object directory.

Stores, reads, writes, validates, and merges metadata for the object and its datastreams.

Initialize the ObjectCSVManager with the given object directory.

Args

obj_dir : Path
Path to the object directory.
ignore_existing_csv_files : bool
If True, ignore existing CSV files when writing.

Raises

FileNotFoundError
If the object directory does not exist.

Methods

def add_datastream(self,
dsdata: DSData,
replace: bool = False) ‑> None
Expand source code
def add_datastream(self, dsdata: DSData, replace: bool = False) -> None:
    """
    Add a datastream to the object.

    Args:
        dsdata (DSData): Datastream metadata to add.
        replace (bool): If True, replace existing datastream with the same dsid.

    Raises:
        ValueError: If datastream with the same dsid exists and replace is False.
    """
    if dsdata.dsid in [ds.dsid for ds in self._datastream_data]:
        if replace:
            self._datastream_data = [
                ds for ds in self._datastream_data if ds.dsid != dsdata.dsid
            ]
        else:
            raise ValueError(f"Datastream with id {dsdata.dsid} already exists.")
    self._datastream_data.append(dsdata)

Add a datastream to the object.

Args

dsdata : DSData
Datastream metadata to add.
replace : bool
If True, replace existing datastream with the same dsid.

Raises

ValueError
If datastream with the same dsid exists and replace is False.
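The replace semantics above (drop any entry with the same dsid, then append) can be sketched with a plain list and a reduced stand-in for DSData; the `DS` dataclass below is illustrative only, not the real class:

```python
from dataclasses import dataclass


@dataclass
class DS:  # reduced stand-in for DSData, illustration only
    dsid: str
    mimetype: str = ""


def add_datastream(streams: list[DS], new: DS, replace: bool = False) -> list[DS]:
    """Append `new`, optionally replacing an entry with the same dsid."""
    if new.dsid in [ds.dsid for ds in streams]:
        if not replace:
            raise ValueError(f"Datastream with id {new.dsid} already exists.")
        # drop the old entry before appending the replacement
        streams = [ds for ds in streams if ds.dsid != new.dsid]
    return streams + [new]


streams = [DS("TEI.xml", "application/xml")]
streams = add_datastream(streams, DS("TEI.xml", "application/tei+xml"), replace=True)
# only the replacement remains; adding again without replace=True raises ValueError
```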
def clear(self) ‑> None
Expand source code
def clear(self) -> None:
    """
    Clear the object metadata and datastreams, and delete the CSV files.

    Removes all metadata and deletes object.csv and datastreams.csv files if present.
    """
    self._object_data = None
    self._datastream_data = []
    obj_csv_file = self.obj_dir / OBJ_CSV_FILENAME
    ds_csv_file = self.obj_dir / DS_CSV_FILENAME
    if obj_csv_file.is_file():
        obj_csv_file.unlink()
    if ds_csv_file.is_file():
        ds_csv_file.unlink()

Clear the object metadata and datastreams, and delete the CSV files.

Removes all metadata and deletes object.csv and datastreams.csv files if present.

def count_datastreams(self) ‑> int
Expand source code
def count_datastreams(self) -> int:
    """
    Return the number of datastreams.

    Returns:
        int: Number of datastreams.
    """
    return len(self._datastream_data)

Return the number of datastreams.

Returns

int
Number of datastreams.
def get_datastreamdata(self) ‑> Generator[DSData, None, None]
Expand source code
def get_datastreamdata(self) -> Generator[DSData, None, None]:
    """
    Return a generator for all datastream metadata.

    Returns:
        Generator[DSData, None, None]: Generator of DSData objects.
    """
    yield from self._datastream_data

Return a generator for all datastream metadata.

Returns

Generator[DSData, None, None]
Generator of DSData objects.
def get_languages(self) ‑> list[str]
Expand source code
def get_languages(self) -> list[str]:
    """
    Return the languages of the datastreams ordered by frequency.

    Returns:
        list[str]: List of language codes ordered by frequency.
    """
    languages = []
    for dsdata in self.get_datastreamdata():
        if dsdata.lang:
            dlangs = utils.split_entry(dsdata.lang)
            languages.extend(dlangs)
    langcounter = Counter(languages)
    return [entry[0] for entry in langcounter.most_common()]

Return the languages of the datastreams ordered by frequency.

Returns

list[str]
List of language codes ordered by frequency.
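The frequency ordering relies on `collections.Counter.most_common()`. A minimal sketch, assuming the lang fields hold ";"-separated codes (the real separator is whatever `utils.split_entry` expects):

```python
from collections import Counter

# hypothetical lang field values from several datastreams
lang_fields = ["en; de", "en", "de; la", "en"]

languages = []
for field in lang_fields:
    # split each multi-valued field into individual codes
    languages.extend(code.strip() for code in field.split(";") if code.strip())

# most_common() yields (code, count) pairs sorted by descending frequency
ordered = [code for code, _ in Counter(languages).most_common()]
# → ['en', 'de', 'la']
```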
def get_object(self) ‑> ObjectData | None
Expand source code
def get_object(self) -> ObjectData | None:
    """
    Return the object metadata.

    Returns:
        ObjectData: The object metadata, or None if not set.
    """
    return self._object_data

Return the object metadata.

Returns

ObjectData | None
The object metadata, or None if not set.
def guess_mainresource(self) ‑> None
Expand source code
def guess_mainresource(self) -> None:
    """
    Guess and set the main resource of the object based on the datastreams.

    Heuristics:
        - If there is only one XML datastream besides DC.xml, use it as mainResource.

    If no single XML candidate is found, mainResource is left unchanged.
    """
    xml_files = []
    for dsdata in self.get_datastreamdata():
        if dsdata.dsid not in ("DC.xml", "DC") and dsdata.mimetype in (
            "application/xml",
            "text/xml",
            "application/tei+xml",
        ):
            xml_files.append(dsdata.dsid)
    if len(xml_files) == 1:
        self._object_data.mainResource = xml_files[0]

Guess and set the main resource of the object based on the datastreams.

Heuristics

  • If there is only one XML datastream besides DC.xml, use it as mainResource.

If no single XML candidate is found, mainResource is left unchanged.
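The heuristic reduces to a filter over (dsid, mimetype) pairs. A self-contained sketch (the input pairs are made up for illustration):

```python
XML_MIMETYPES = {"application/xml", "text/xml", "application/tei+xml"}


def guess_mainresource(datastreams: list[tuple[str, str]]) -> str:
    """Return the single non-DC XML dsid, or '' if the guess is ambiguous.

    Each datastream is given as a (dsid, mimetype) pair.
    """
    xml_files = [
        dsid
        for dsid, mimetype in datastreams
        if dsid not in ("DC.xml", "DC") and mimetype in XML_MIMETYPES
    ]
    # only an unambiguous single candidate counts as a guess
    return xml_files[0] if len(xml_files) == 1 else ""


print(guess_mainresource([("DC.xml", "application/xml"),
                          ("TEI.xml", "application/tei+xml")]))
# → TEI.xml
```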
def is_empty(self) ‑> bool
Expand source code
def is_empty(self) -> bool:
    """
    Return True if the object has no CSV metadata.

    Returns:
        bool: True if object or datastream metadata is missing, False otherwise.
    """
    return self._object_data is None or not self._datastream_data

Return True if the object has no CSV metadata.

Returns

bool
True if object or datastream metadata is missing, False otherwise.
def merge_datastream(self,
dsdata: DSData) ‑> None
Expand source code
def merge_datastream(self, dsdata: DSData) -> None:
    """
    Merge the datastream metadata with another DSData object.

    Args:
        dsdata (DSData): Datastream metadata to merge.
    """
    for existing_ds in self._datastream_data:
        if existing_ds.dsid == dsdata.dsid and existing_ds.dspath == dsdata.dspath:
            existing_ds.merge(dsdata)
            return
    self.add_datastream(dsdata)

Merge the datastream metadata with another DSData object.

Args

dsdata : DSData
Datastream metadata to merge.
def merge_object(self,
object_data: ObjectData) ‑> None
Expand source code
def merge_object(self, object_data: ObjectData) -> None:
    """
    Merge the object metadata with another ObjectData object.

    Args:
        object_data (ObjectData): Object metadata to merge.
    """
    if self._object_data is None:
        self._object_data = object_data
    else:
        self._object_data.merge(object_data)

Merge the object metadata with another ObjectData object.

Args

object_data : ObjectData
Object metadata to merge.
def save(self) ‑> None
Expand source code
def save(self) -> None:
    """
    Save the object metadata and datastreams to their respective CSV files.

    Raises:
        FileExistsError: If CSV files already exist and ignore_existing_csv_files is False.
    """
    self._write_object_csv()
    self._write_datastreams_csv()

Save the object metadata and datastreams to their respective CSV files.

Raises

FileExistsError
If CSV files already exist and ignore_existing_csv_files is False.
def set_object(self,
object_data: ObjectData,
replace: bool = False) ‑> None
Expand source code
def set_object(self, object_data: ObjectData, replace: bool = False) -> None:
    """
    Set the object metadata.

    Args:
        object_data (ObjectData): Object metadata to set.
        replace (bool): If True, replace existing object data.

    Raises:
        ValueError: If object data is already set and replace is False.
    """
    if self._object_data is not None and not replace:
        raise ValueError("Object data has already been set.")
    self._object_data = object_data

Set the object metadata.

Args

object_data : ObjectData
Object metadata to set.
replace : bool
If True, replace existing object data.

Raises

ValueError
If object data is already set and replace is False.
def validate(self) ‑> None
Expand source code
def validate(self) -> None:
    """
    Validate the object metadata and datastreams.

    Raises:
        ValueError: If metadata is missing or invalid.
    """
    if self.is_empty():
        raise ValueError("Object metadata (csv) is not set.")
    self._object_data.validate()
    for dsdata in self._datastream_data:
        dsdata.validate()

Validate the object metadata and datastreams.

Raises

ValueError
If metadata is missing or invalid.
class ObjectCollection
Expand source code
class ObjectCollection:
    """
    Represents a collection of metadata for multiple GAMS objects and their datastreams.

    Used to aggregate, save, load, and distribute object and datastream metadata
    between individual object directories and combined CSV/XLSX files.
    """

    def __init__(self):
        """
        Initialize an empty ObjectCollection.
        """
        self.objects: dict[str, ObjectData] = {}  # keys are recids (pid)
        self.datastreams: dict[str, list[DSData]] = {}  # keys are object ids (recids)

    def collect_from_objects(self, root_dir: Path) -> None:
        """
        Collect metadata from all object directories below root_dir.

        Args:
            root_dir (Path): Directory containing object folders.

        Raises:
            ValueError: If object metadata (CSV) is missing for any object directory.
        """
        for obj_dir in find_object_folders(root_dir):
            object_meta = ObjectCSVManager(obj_dir)
            if object_meta.is_empty():
                raise ValueError(
                    f"Object metadata (csv) is not set for {obj_dir}. "
                    "Please check the object directory."
                )
            self.objects[obj_dir.name] = object_meta.get_object()
            for dsdata in object_meta.get_datastreamdata():
                if obj_dir.name not in self.datastreams:
                    self.datastreams[obj_dir.name] = []
                self.datastreams[obj_dir.name].append(dsdata)

    def distribute_to_objects(self, root_dir: Path) -> tuple[int, int]:
        """
        Distribute aggregated metadata to individual object directories.

        Updates object.csv and datastreams.csv files in each object directory.

        Args:
            root_dir (Path): Directory containing object folders.

        Returns:
            tuple[int, int]: Number of updated objects and datastreams.

        Raises:
            UserWarning: If an object directory does not exist.
        """
        updated_objects_counter = 0
        updated_datastreams_counter = 0
        for obj_id, obj_data in self.objects.items():
            obj_dir = root_dir / obj_id
            if obj_dir.is_dir():
                obj_mgr = ObjectCSVManager(obj_dir, ignore_existing_csv_files=True)
                obj_mgr.set_object(obj_data, replace=True)
                updated_objects_counter += 1
                for dsdata in self.datastreams.get(obj_id, []):
                    obj_mgr.add_datastream(dsdata, replace=True)
                    updated_datastreams_counter += 1
                obj_mgr.save()
            else:
                raise UserWarning(f"Object directory {obj_dir} does not exist.")
        return updated_objects_counter, updated_datastreams_counter

    def count_objects(self) -> int:
        """
        Return the number of objects in the collection.

        Returns:
            int: Number of objects.
        """
        return len(self.objects)

    def count_datastreams(self) -> int:
        """
        Return the total number of datastreams in the collection.

        Returns:
            int: Number of datastreams.
        """
        return sum(len(ds) for ds in self.datastreams.values())

    def save_to_csv(
        self, obj_file: Path | None = None, ds_file: Path | None = None
    ) -> None:
        """
        Save object and datastream metadata to two CSV files.

        Args:
            obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
            ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.
        """
        obj_file = obj_file or Path(ALL_OBJECTS_CSV)
        ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
        with obj_file.open("w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=ObjectData.fieldnames())
            writer.writeheader()
            for obj in self.objects.values():
                writer.writerow(asdict(obj))
        with ds_file.open("w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=DSData.fieldnames())
            writer.writeheader()
            for datastreams in self.datastreams.values():
                for dsdata in datastreams:
                    writer.writerow(asdict(dsdata))

    def save_to_xlsx(self, xlsx_file: Path | None = None) -> None:
        """
        Save object and datastream metadata to a single XLSX file with two sheets.

        Args:
            xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.
        """
        xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)
        with tempfile.TemporaryDirectory() as tmpdir:
            obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
            ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV
            self.save_to_csv(obj_file, ds_file)
            xlsx.csv_to_xlsx(obj_file, ds_file, xlsx_file)

    def load_from_csv(
        self, obj_file: Path | None = None, ds_file: Path | None = None
    ) -> None:
        """
        Load object and datastream metadata from two CSV files.

        Args:
            obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
            ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.

        Raises:
            FileNotFoundError: If either CSV file does not exist.
        """
        obj_file = obj_file or Path(ALL_OBJECTS_CSV)
        ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
        if not obj_file.is_file():
            raise FileNotFoundError(f"Required csv file {obj_file} does not exist.")
        if not ds_file.is_file():
            raise FileNotFoundError(f"Required csv file {ds_file} does not exist.")
        self.objects.clear()
        self.datastreams.clear()

        with obj_file.open("r", encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                obj_data = ObjectData(**row)
                self.objects[obj_data.recid] = obj_data

        with ds_file.open("r", encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                ds_data = DSData(**row)
                obj_id = ds_data.dspath.split("/")[0]  # Extract object id from dspath
                if obj_id not in self.datastreams:
                    self.datastreams[obj_id] = []
                self.datastreams[obj_id].append(ds_data)

    def load_from_xlsx(self, xlsx_file: Path | None = None) -> None:
        """
        Load object and datastream metadata from a single XLSX file with two sheets.

        Args:
            xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.

        Raises:
            FileNotFoundError: If the XLSX file does not exist.
        """
        xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)

        if not xlsx_file.is_file():
            raise FileNotFoundError(f"File {xlsx_file} does not exist.")
        with tempfile.TemporaryDirectory() as tmpdir:
            obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
            ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV
            xlsx.xlsx_to_csv(xlsx_file, obj_file, ds_file)
            self.load_from_csv(obj_file, ds_file)

Represents a collection of metadata for multiple GAMS objects and their datastreams.

Used to aggregate, save, load, and distribute object and datastream metadata between individual object directories and combined CSV/XLSX files.

Initialize an empty ObjectCollection.

Methods

def collect_from_objects(self, root_dir: Path) ‑> None
Expand source code
def collect_from_objects(self, root_dir: Path) -> None:
    """
    Collect metadata from all object directories below root_dir.

    Args:
        root_dir (Path): Directory containing object folders.

    Raises:
        ValueError: If object metadata (CSV) is missing for any object directory.
    """
    for obj_dir in find_object_folders(root_dir):
        object_meta = ObjectCSVManager(obj_dir)
        if object_meta.is_empty():
            raise ValueError(
                f"Object metadata (csv) is not set for {obj_dir}. "
                "Please check the object directory."
            )
        self.objects[obj_dir.name] = object_meta.get_object()
        for dsdata in object_meta.get_datastreamdata():
            if obj_dir.name not in self.datastreams:
                self.datastreams[obj_dir.name] = []
            self.datastreams[obj_dir.name].append(dsdata)

Collect metadata from all object directories below root_dir.

Args

root_dir : Path
Directory containing object folders.

Raises

ValueError
If object metadata (CSV) is missing for any object directory.
def count_datastreams(self) ‑> int
Expand source code
def count_datastreams(self) -> int:
    """
    Return the total number of datastreams in the collection.

    Returns:
        int: Number of datastreams.
    """
    return sum(len(ds) for ds in self.datastreams.values())

Return the total number of datastreams in the collection.

Returns

int
Number of datastreams.
def count_objects(self) ‑> int
Expand source code
def count_objects(self) -> int:
    """
    Return the number of objects in the collection.

    Returns:
        int: Number of objects.
    """
    return len(self.objects)

Return the number of objects in the collection.

Returns

int
Number of objects.
def distribute_to_objects(self, root_dir: Path) ‑> tuple[int, int]
Expand source code
def distribute_to_objects(self, root_dir: Path) -> tuple[int, int]:
    """
    Distribute aggregated metadata to individual object directories.

    Updates object.csv and datastreams.csv files in each object directory.

    Args:
        root_dir (Path): Directory containing object folders.

    Returns:
        tuple[int, int]: Number of updated objects and datastreams.

    Raises:
        UserWarning: If an object directory does not exist.
    """
    updated_objects_counter = 0
    updated_datastreams_counter = 0
    for obj_id, obj_data in self.objects.items():
        obj_dir = root_dir / obj_id
        if obj_dir.is_dir():
            obj_mgr = ObjectCSVManager(obj_dir, ignore_existing_csv_files=True)
            obj_mgr.set_object(obj_data, replace=True)
            updated_objects_counter += 1
            for dsdata in self.datastreams.get(obj_id, []):
                obj_mgr.add_datastream(dsdata, replace=True)
                updated_datastreams_counter += 1
            obj_mgr.save()
        else:
            raise UserWarning(f"Object directory {obj_dir} does not exist.")
    return updated_objects_counter, updated_datastreams_counter

Distribute aggregated metadata to individual object directories.

Updates object.csv and datastreams.csv files in each object directory.

Args

root_dir : Path
Directory containing object folders.

Returns

tuple[int, int]
Number of updated objects and datastreams.

Raises

UserWarning
If an object directory does not exist.
def load_from_csv(self,
obj_file: Path | None = None,
ds_file: Path | None = None) ‑> None
Expand source code
def load_from_csv(
    self, obj_file: Path | None = None, ds_file: Path | None = None
) -> None:
    """
    Load object and datastream metadata from two CSV files.

    Args:
        obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
        ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.

    Raises:
        FileNotFoundError: If either CSV file does not exist.
    """
    obj_file = obj_file or Path(ALL_OBJECTS_CSV)
    ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
    if not obj_file.is_file():
        raise FileNotFoundError(f"Required csv file {obj_file} does not exist.")
    if not ds_file.is_file():
        raise FileNotFoundError(f"Required csv file {ds_file} does not exist.")
    self.objects.clear()
    self.datastreams.clear()

    with obj_file.open("r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            obj_data = ObjectData(**row)
            self.objects[obj_data.recid] = obj_data

    with ds_file.open("r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            ds_data = DSData(**row)
            obj_id = ds_data.dspath.split("/")[0]  # Extract object id from dspath
            if obj_id not in self.datastreams:
                self.datastreams[obj_id] = []
            self.datastreams[obj_id].append(ds_data)

Load object and datastream metadata from two CSV files.

Args

obj_file : Path | None
Path for object metadata CSV. Defaults to 'all_objects.csv'.
ds_file : Path | None
Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.

Raises

FileNotFoundError
If either CSV file does not exist.
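When loading the combined datastreams CSV, the object id is recovered from the first segment of each dspath (`dspath.split("/")[0]`) and rows are grouped under it. A sketch of that grouping, with made-up dspath values:

```python
# hypothetical rows as csv.DictReader would yield them
rows = [
    {"dspath": "o:demo.1/TEI.xml"},
    {"dspath": "o:demo.1/IMAGE.1.jpg"},
    {"dspath": "o:demo.2/TEI.xml"},
]

datastreams: dict[str, list[dict]] = {}
for row in rows:
    # the object id is the first path segment of dspath
    obj_id = row["dspath"].split("/")[0]
    datastreams.setdefault(obj_id, []).append(row)

# → {'o:demo.1': [two rows], 'o:demo.2': [one row]}
```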
def load_from_xlsx(self, xlsx_file: Path | None = None) ‑> None
Expand source code
def load_from_xlsx(self, xlsx_file: Path | None = None) -> None:
    """
    Load object and datastream metadata from a single XLSX file with two sheets.

    Args:
        xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.

    Raises:
        FileNotFoundError: If the XLSX file does not exist.
    """
    xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)

    if not xlsx_file.is_file():
        raise FileNotFoundError(f"File {xlsx_file} does not exist.")
    with tempfile.TemporaryDirectory() as tmpdir:
        obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
        ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV
        xlsx.xlsx_to_csv(xlsx_file, obj_file, ds_file)
        self.load_from_csv(obj_file, ds_file)

Load object and datastream metadata from a single XLSX file with two sheets.

Args

xlsx_file : Path | None
Path for XLSX file. Defaults to 'all_objects.xlsx'.

Raises

FileNotFoundError
If the XLSX file does not exist.
def save_to_csv(self,
obj_file: Path | None = None,
ds_file: Path | None = None) ‑> None
Expand source code
def save_to_csv(
    self, obj_file: Path | None = None, ds_file: Path | None = None
) -> None:
    """
    Save object and datastream metadata to two CSV files.

    Args:
        obj_file (Path | None): Path for object metadata CSV. Defaults to 'all_objects.csv'.
        ds_file (Path | None): Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.
    """
    obj_file = obj_file or Path(ALL_OBJECTS_CSV)
    ds_file = ds_file or Path(ALL_DATASTREAMS_CSV)
    with obj_file.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=ObjectData.fieldnames())
        writer.writeheader()
        for obj in self.objects.values():
            writer.writerow(asdict(obj))
    with ds_file.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=DSData.fieldnames())
        writer.writeheader()
        for datastreams in self.datastreams.values():
            for dsdata in datastreams:
                writer.writerow(asdict(dsdata))

Save object and datastream metadata to two CSV files.

Args

obj_file : Path | None
Path for object metadata CSV. Defaults to 'all_objects.csv'.
ds_file : Path | None
Path for datastream metadata CSV. Defaults to 'all_datastreams.csv'.
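The CSV writing pattern used here (DictWriter with dataclass field names, `newline=""` so the csv module controls line endings) can be exercised in isolation; `Obj` below is a reduced stand-in for ObjectData, illustration only:

```python
import csv
import dataclasses
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Obj:  # reduced stand-in for ObjectData, illustration only
    recid: str
    title: str = ""


objects = {"o:demo.1": Obj("o:demo.1", "Demo object")}

with tempfile.TemporaryDirectory() as tmpdir:
    obj_file = Path(tmpdir) / "all_objects.csv"
    # newline="" is required so csv controls line endings itself
    with obj_file.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fld.name for fld in dataclasses.fields(Obj)]
        )
        writer.writeheader()
        for obj in objects.values():
            writer.writerow(dataclasses.asdict(obj))
    lines = obj_file.read_text(encoding="utf-8").splitlines()
# → ['recid,title', 'o:demo.1,Demo object']
```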
def save_to_xlsx(self, xlsx_file: Path | None = None) ‑> None
Expand source code
def save_to_xlsx(self, xlsx_file: Path | None = None) -> None:
    """
    Save object and datastream metadata to a single XLSX file with two sheets.

    Args:
        xlsx_file (Path | None): Path for XLSX file. Defaults to 'all_objects.xlsx'.
    """
    xlsx_file = xlsx_file or Path(ALL_OBJECTS_XLSX)
    with tempfile.TemporaryDirectory() as tmpdir:
        obj_file = Path(tmpdir) / ALL_OBJECTS_CSV
        ds_file = Path(tmpdir) / ALL_DATASTREAMS_CSV
        self.save_to_csv(obj_file, ds_file)
        xlsx.csv_to_xlsx(obj_file, ds_file, xlsx_file)

Save object and datastream metadata to a single XLSX file with two sheets.

Args

xlsx_file : Path | None
Path for XLSX file. Defaults to 'all_objects.xlsx'.
class ObjectData (recid: str,
title: str = '',
project: str = '',
description: str = '',
creator: str = '',
rights: str = '',
publisher: str = '',
source: str = '',
objectType: str = '',
mainResource: str = '',
funder: str = '')
Expand source code
@dataclass
class ObjectData:
    """
    Represents CSV metadata for a single GAMS object.

    Fields:

      - recid (str): Object identifier.
      - title (str): Title of the object.
      - project (str): Project name or identifier.
      - description (str): Description of the object.
      - creator (str): Creator of the object.
      - rights (str): Rights statement for the object.
      - publisher (str): Publisher of the object.
      - source (str): Source of the object.
      - objectType (str): Type of the object.
      - mainResource (str): Main datastream identifier.
      - funder (str): Funder information.
    """

    recid: str
    title: str = ""
    project: str = ""
    description: str = ""
    creator: str = ""
    rights: str = ""
    publisher: str = ""
    source: str = ""
    objectType: str = ""
    mainResource: str = ""  # main datastream
    funder: str = ""

    @classmethod
    def fieldnames(cls) -> list[str]:
        """
        Return the list of field names for ObjectData.

        Returns:
            list[str]: Names of all fields in the ObjectData dataclass.
        """
        return [field.name for field in dataclasses.fields(cls)]

    def merge(self, other: "ObjectData"):
        """
        Merge the object data with another ObjectData instance.

        Overwrites fields with non-empty values from the other instance.
        Both objects must have the same recid.

        Args:
            other (ObjectData): Another ObjectData instance to merge from.

        Raises:
            ValueError: If recid values do not match.
        """
        if self.recid != other.recid:
            raise ValueError("Cannot merge objects with different recid values")
        # These are the fields which may have been set automatically in the new object data
        fields_to_merge = [
            "title",
            "project",
            "creator",
            "rights",
            "publisher",
            "source",
            "objectType",
            "mainResource",
            "funder",
        ]
        for field in fields_to_merge:
            if getattr(other, field).strip():
                setattr(self, field, getattr(other, field))

    def validate(self):
        """
        Validate required metadata fields.

        Raises:
            ValueError: If any required field is empty.
        """
        if not self.recid:
            raise ValueError("recid must not be empty")
        if not self.title:
            raise ValueError(f"{self.recid}: title must not be empty")
        if not self.rights:
            raise ValueError(f"{self.recid}: rights must not be empty")
        if not self.source:
            raise ValueError(f"{self.recid}: source must not be empty")
        if not self.objectType:
            raise ValueError(f"{self.recid}: objectType must not be empty")

Represents CSV metadata for a single GAMS object.

Fields

  • recid (str): Object identifier.
  • title (str): Title of the object.
  • project (str): Project name or identifier.
  • description (str): Description of the object.
  • creator (str): Creator of the object.
  • rights (str): Rights statement for the object.
  • publisher (str): Publisher of the object.
  • source (str): Source of the object.
  • objectType (str): Type of the object.
  • mainResource (str): Main datastream identifier.
  • funder (str): Funder information.

Static methods

def fieldnames() ‑> list[str]

Return the list of field names for ObjectData.

Returns

list[str]
Names of all fields in the ObjectData dataclass.

Instance variables

var creator : str

Creator of the object.

var description : str

Description of the object.

var funder : str

Funder information.

var mainResource : str

Main datastream identifier.

var objectType : str

Type of the object.

var project : str

Project name or identifier.

var publisher : str

Publisher of the object.

var recid : str

Object identifier.

var rights : str

Rights statement for the object.

var source : str

Source of the object.

var title : str

Title of the object.

Methods

def merge(self,
other: ObjectData)
Expand source code
def merge(self, other: "ObjectData"):
    """
    Merge the object data with another ObjectData instance.

    Overwrites fields with non-empty values from the other instance.
    Both objects must have the same recid.

    Args:
        other (ObjectData): Another ObjectData instance to merge from.

    Raises:
        ValueError: If recid values do not match.
    """
    if self.recid != other.recid:
        raise ValueError("Cannot merge objects with different recid values")
    # These are the fields which may have been set automatically in the new object data
    fields_to_merge = [
        "title",
        "project",
        "creator",
        "rights",
        "publisher",
        "source",
        "objectType",
        "mainResource",
        "funder",
    ]
    for field in fields_to_merge:
        if getattr(other, field).strip():
            setattr(self, field, getattr(other, field))

Merge the object data with another ObjectData instance.

Overwrites fields with non-empty values from the other instance. Both objects must have the same recid.

Args

other : ObjectData
Another ObjectData instance to merge from.

Raises

ValueError
If recid values do not match.
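The key merge rule is that only non-empty values from the other instance overwrite existing fields. A self-contained sketch with a reduced stand-in for ObjectData (`Obj` and the sample values are illustrative only):

```python
from dataclasses import dataclass


@dataclass
class Obj:  # reduced stand-in for ObjectData, illustration only
    recid: str
    title: str = ""
    rights: str = ""


def merge(target: Obj, other: Obj) -> None:
    if target.recid != other.recid:
        raise ValueError("Cannot merge objects with different recid values")
    for field in ("title", "rights"):
        # empty or whitespace-only values never overwrite existing data
        if getattr(other, field).strip():
            setattr(target, field, getattr(other, field))


a = Obj("o:demo.1", title="Old title", rights="CC BY 4.0")
merge(a, Obj("o:demo.1", title="New title"))  # other.rights is empty
# → Obj(recid='o:demo.1', title='New title', rights='CC BY 4.0')
```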
def validate(self)
Expand source code
def validate(self):
    """
    Validate required metadata fields.

    Raises:
        ValueError: If any required field is empty.
    """
    if not self.recid:
        raise ValueError("recid must not be empty")
    if not self.title:
        raise ValueError(f"{self.recid}: title must not be empty")
    if not self.rights:
        raise ValueError(f"{self.recid}: rights must not be empty")
    if not self.source:
        raise ValueError(f"{self.recid}: source must not be empty")
    if not self.objectType:
        raise ValueError(f"{self.recid}: objectType must not be empty")

Validate required metadata fields.

Raises

ValueError
If any required field is empty.
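The validation rule (every required field must be non-empty, with the recid prefixed to the error message) can be mirrored over a plain dict; the function below is a sketch, not the real implementation:

```python
def validate(record: dict) -> None:
    """Raise ValueError for any empty required field (mirrors ObjectData.validate)."""
    for field in ("recid", "title", "rights", "source", "objectType"):
        if not record.get(field):
            raise ValueError(f"{record.get('recid', '?')}: {field} must not be empty")


record = {"recid": "o:demo.1", "title": "Demo", "rights": "CC BY 4.0",
          "source": "", "objectType": "text"}
try:
    validate(record)
except ValueError as err:
    print(err)  # → o:demo.1: source must not be empty
```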