Module gamslib.sip.validation.bagit
Validation functions for the structure and contents of Bagit directories in GAMS projects.
This module provides functions to validate the general structure of a Bagit directory, the contents of the bagit.txt file, and the existence of required directories and files.
Features
- Checks for required directories and files in a Bagit directory.
- Validates the format and contents of bagit.txt.
- Raises BagValidationError for any validation failures.
Usage
Call validate_structure()(bag_dir) to check the directory structure and required files.
Call validate_bagit_txt()(bag_dir) to validate the bagit.txt file.
Functions
def validate_bagit_txt(bag_dir: pathlib._local.Path) ‑> None-
Expand source code
def validate_bagit_txt(bag_dir: Path) -> None: """ Validate the bagit.txt file in a Bagit directory. Args: bag_dir (Path): Path to the Bagit directory. Raises: BagValidationError: If the bagit.txt file is missing or invalid. Notes: - Checks for exactly two lines: 'BagIt-Version: 1.0' and 'Tag-File-Character-Encoding: UTF-8'. - Raises an error if the format or values are incorrect. """ bagit_txt_file = bag_dir / "bagit.txt" if not bagit_txt_file.is_file(): raise BagValidationError( "'bagit.txt' file does not exist in the bag directory {bag_dir}" ) line_entries = [] with bagit_txt_file.open("r", encoding="utf-8", newline="") as f: for i, line in enumerate(f, start=1): try: stripped_line = line.rstrip() if stripped_line: key, value = stripped_line.split(":", 1) line_entries.append((key, value.strip())) except ValueError as e: raise BagValidationError( f"Invalid line {i} in {bag_dir / 'bagit.txt'}: '{stripped_line}'" ) from e if len(line_entries) != 2: # noqa: PLR2004 raise BagValidationError( f"{bag_dir / 'bagit.txt'} has invalid number of lines. bagit.txt is incomplete" ) if line_entries[0][0] != "BagIt-Version": raise BagValidationError( f"{bag_dir / 'bagit.txt'}: Missing line for 'BagIt-Version'" ) if line_entries[0][1] != "1.0": raise BagValidationError( f"{bag_dir / 'bagit.txt'}: Invalid value for 'BagIt-Version'. Must be '1.0'" ) if line_entries[1][0] != "Tag-File-Character-Encoding": raise BagValidationError( f"{bag_dir / 'bagit.txt'}: Missing line for 'Tag-File-Character-Encoding'" ) if line_entries[1][1] != "UTF-8": raise BagValidationError( f"{bag_dir / 'bagit.txt'}: Invalid value for 'Tag-File-Character-Encoding'. " "Must be 'UTF-8'" )Validate the bagit.txt file in a Bagit directory.
Args
bag_dir:Path- Path to the Bagit directory.
Raises
BagValidationError- If the bagit.txt file is missing or invalid.
Notes
- Checks for exactly two lines: 'BagIt-Version: 1.0' and 'Tag-File-Character-Encoding: UTF-8'.
- Raises an error if the format or values are incorrect.
def validate_structure(bag_dir: pathlib._local.Path) ‑> None-
Expand source code
def validate_structure(bag_dir: Path) -> None: """ Validate the general structure of a Bagit directory. Args: bag_dir (Path): Path to the Bagit directory to validate. Raises: BagValidationError: If a required directory or file is missing. Notes: - Checks for the existence of 'data', 'data/meta', and 'data/content' directories. - Checks for required files: bagit.txt, manifest-md5.txt, manifest-sha512.txt, data/meta/sip.json, and data/content/DC.xml. """ required_dirs = ["data", "data/meta", "data/content"] required_files = [ "bagit.txt", "manifest-md5.txt", "manifest-sha512.txt", "data/meta/sip.json", "data/content/DC.xml", ] for directory in required_dirs: if not (bag_dir / directory).is_dir(): raise BagValidationError(f"Bag directory '{directory}' does not exist") for file in required_files: if not (bag_dir / file).is_file(): raise BagValidationError( f"Bag file '{file}' does not exist for bag {bag_dir}" )Validate the general structure of a Bagit directory.
Args
bag_dir:Path- Path to the Bagit directory to validate.
Raises
BagValidationError- If a required directory or file is missing.
Notes
- Checks for the existence of 'data', 'data/meta', and 'data/content' directories.
- Checks for required files: bagit.txt, manifest-md5.txt, manifest-sha512.txt, data/meta/sip.json, and data/content/DC.xml.