Module gamslib.sip.validation
Validation utilities for Bagit and object directories in GAMS projects.
This subpackage provides functions to validate the structure and metadata of Bagit directories, including checks for required files, manifests, and SIP JSON metadata.
Features
- Validates Bagit directory structure and required files.
- Checks bagit.txt, bag-info.txt, and manifest files (MD5, SHA512).
- Validates SIP JSON metadata for completeness and correctness.
- Raises BagValidationError for any validation failures.
Usage
Call validate_bag(bag_dir) to perform all standard validations on a Bagit directory.
Individual validation functions are also available for more granular checks.
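A typical call pattern can be sketched as follows. To keep the snippet runnable without gamslib installed, the validator is passed in as a parameter: `collect_invalid_bags` and `fake_validate` are hypothetical names used only for illustration. In real use one would presumably pass `gamslib.sip.validation.validate_bag` and `BagValidationError` instead (the import path of the exception is not shown in this documentation).

```python
from pathlib import Path
from typing import Callable, Iterable


def collect_invalid_bags(
    bag_dirs: Iterable[Path],
    validate: Callable[[Path], None],
    error_type: type[Exception] = Exception,
) -> dict[Path, str]:
    """Run `validate` on each directory and map failing paths to the error message."""
    failures: dict[Path, str] = {}
    for bag_dir in bag_dirs:
        try:
            validate(bag_dir)
        except error_type as exc:
            # The validator signals problems by raising, so a failure is an exception.
            failures[bag_dir] = str(exc)
    return failures


def fake_validate(bag_dir: Path) -> None:
    """Hypothetical stand-in validator: only checks that the directory exists."""
    if not bag_dir.is_dir():
        raise ValueError(f"Bag directory {bag_dir} does not exist")
```

Because the validator raises on the first problem it finds, each failing bag contributes exactly one message to the result.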
Sub-modules
gamslib.sip.validation.baginfo
Validation functions for the bag-info.txt file in GAMS Bagit directories …
gamslib.sip.validation.bagit
Validation functions for the structure and contents of Bagit directories in GAMS projects …
gamslib.sip.validation.manifests
Validation functions for manifest files in Bagit directories for GAMS projects …
gamslib.sip.validation.sip_json
Validation functions for the sip.json file in GAMS Bagit directories …
Functions
def validate_bag(bag_dir: pathlib.Path) -> None
Source code:
    def validate_bag(bag_dir: Path) -> None:
        """
        Validate the structure and metadata of a Bagit directory.

        Args:
            bag_dir (Path): Path to the Bagit directory to validate.

        Raises:
            BagValidationError: If the bag directory does not exist or any
                validation check fails.

        Notes:
            - Runs all standard validation checks: structure, bagit.txt,
              manifests, SIP JSON, and bag-info.txt.
            - Raises an error immediately if any check fails.
        """
        if not bag_dir.is_dir():
            raise BagValidationError(f"Bag directory {bag_dir} does not exist")
        validate_structure(bag_dir)
        validate_bagit_txt(bag_dir)
        validate_manifest_md5(bag_dir)
        validate_manifest_sha512(bag_dir)
        validate_sip_json(bag_dir)
        validate_baginfo_text(bag_dir)
Validate the structure and metadata of a Bagit directory.
Args
bag_dir (Path): Path to the Bagit directory to validate.
Raises
BagValidationError: If the bag directory does not exist or any validation check fails.
Notes
- Runs all standard validation checks: structure, bagit.txt, manifests, SIP JSON, and bag-info.txt.
- Raises an error immediately if any check fails.
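The manifest checks in this sequence can be illustrated with a minimal sketch. It relies only on the BagIt convention that `manifest-<algorithm>.txt` lists one `<checksum> <relative path>` pair per line. The function name `find_manifest_mismatches` is hypothetical, and this is not gamslib's implementation, which may check more and reports problems by raising BagValidationError.

```python
import hashlib
from pathlib import Path


def find_manifest_mismatches(bag_dir: Path, algorithm: str = "md5") -> list[str]:
    """Return the listed paths that are missing or whose checksum differs."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    mismatches: list[str] = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Each manifest line has the form "<checksum> <relative/path>".
        expected, rel_path = line.split(maxsplit=1)
        payload = bag_dir / rel_path
        if not payload.is_file():
            mismatches.append(rel_path)
            continue
        actual = hashlib.new(algorithm, payload.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

Passing `"sha512"` as the algorithm checks `manifest-sha512.txt` in the same way. Reading each payload file fully into memory keeps the sketch short; large files would call for chunked hashing.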
def validate_datastream_id(datastream_id: str) -> None
Source code:
    def validate_datastream_id(datastream_id: str) -> None:
        """Validate a given datastream ID.

        A valid datastream ID must start with a letter or a number, followed by
        any number of ASCII letters, numbers, dots, dashes and underscores.

        Args:
            datastream_id (str): The datastream ID to validate.

        Raises:
            ValueError: If the datastream ID is invalid. The error message will
                indicate the reason.
        """
        _validate_object_id(datastream_id, allow_uppercase=True)
Validate a given datastream ID.
A valid datastream ID must start with a letter or a number, followed by any number of ASCII letters, numbers, dots, dashes and underscores. Unlike PIDs, datastream IDs may contain uppercase letters.
Args
datastream_id (str): The datastream ID to validate.
Raises
ValueError: If the datastream ID is invalid. The error message will indicate the reason.
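The rule stated above can be expressed as a single regular expression. The following is a sketch of that rule only, assuming no further constraints such as a length limit. `is_valid_datastream_id` is a hypothetical helper, not part of gamslib, and it returns a boolean rather than raising ValueError.

```python
import re

# First character: ASCII letter or digit; then letters, digits, dots,
# dashes and underscores. Uppercase letters are allowed.
DATASTREAM_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_datastream_id(datastream_id: str) -> bool:
    """Return True if the whole string matches the datastream ID rule."""
    return DATASTREAM_ID_PATTERN.fullmatch(datastream_id) is not None
```

Using `fullmatch` ensures that the entire string, not just a prefix, obeys the rule.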
def validate_pid(pid: str) -> None
Source code:
    def validate_pid(pid: str) -> None:
        """Validate a given PID (Project Identifier).

        A valid id follows the rules of xml:id, with some modifications:

        - All letters must be lowercase ASCII letters.
        - Every id must have the project sigle as prefix, followed by a dot.
          The prefix must start with a letter, followed by any number of
          letters and numbers.
        - The part after the dot must start with a letter or a number,
          followed by any number of ASCII letters, numbers, dots, and dashes.
        - For legacy reasons, the project prefix can be preceded by a type
          prefix like 'o:' but we discourage the use of this prefix for new
          objects. Only lowercase letters are allowed as type prefix.

        Invalid ids are for example:

        - .abcdef (starts with a dot)
        - 1abcdef (starts with a number)
        - abc/def (contains invalid character '/')
        - abc@def (contains invalid character '@')
        - abcdef (no dot)
        - abc..def (double dot)

        Args:
            pid (str): The ID to validate. Must be lowercase; uppercase
                letters are accepted only in datastream IDs.

        Raises:
            ValueError: If the ID is invalid. The error message will indicate
                the reason.
        """
        MAX_ID_LENGTH = 64
        # Reject overly long IDs before splitting them into their parts
        if len(pid) > MAX_ID_LENGTH:
            raise ValueError(f"ID must not be longer than {MAX_ID_LENGTH} characters")
        type_prefix, project_prefix, object_id = _split_id(pid)
        _validate_type_prefix(type_prefix)
        validate_project_name(project_prefix)
        _validate_object_id(object_id)
        if type_prefix:
            warnings.warn(
                "Using type prefixes in PIDs is discouraged for new objects.", UserWarning
            )
Validate a given PID (Project Identifier).
A valid id follows the rules of xml:id, with some modifications:
- All letters must be lowercase ASCII letters.
- Every id must have the project sigle as prefix, followed by a dot. The prefix must start with a letter, followed by any number of letters and numbers.
- The part after the dot must start with a letter or a number, followed by any number of ASCII letters, numbers, dots, and dashes.
- For legacy reasons, the project prefix can be preceded by a type prefix like 'o:', but we discourage the use of this prefix for new objects. Only lowercase letters are allowed as type prefix.
Invalid ids are for example:
- .abcdef (starts with a dot)
- 1abcdef (starts with a number)
- abc/def (contains invalid character '/')
- abc@def (contains invalid character '@')
- abcdef (no dot)
- abc..def (double dot)
Args
pid (str): The ID to validate. Must be lowercase and at most 64 characters long; uppercase letters are accepted only in datastream IDs (see validate_datastream_id).
Raises
ValueError: If the ID is invalid. The error message will indicate the reason.
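The PID rules can be approximated with one regular expression plus the length limit taken from the source. This is a sketch under stated assumptions: `split_pid` and `PID_PATTERN` are hypothetical names, the behavior of the library's private helpers (`_split_id`, `_validate_type_prefix`, `_validate_object_id`) is not shown in this documentation, and the sketch does not emit the warning for type prefixes. It rejects all of the invalid examples listed above, but the library may reject further cases.

```python
import re

MAX_ID_LENGTH = 64  # limit used by validate_pid

PID_PATTERN = re.compile(
    r"(?:(?P<type_prefix>[a-z]+):)?"       # optional legacy type prefix, e.g. "o:"
    r"(?P<project>[a-z][a-z0-9]*)"         # project prefix: letter, then letters/digits
    r"\."                                  # separating dot
    r"(?P<object_id>[a-z0-9][a-z0-9.-]*)"  # object part: letter/digit, then letters/digits/dots/dashes
)


def split_pid(pid: str) -> tuple[str, str, str]:
    """Return (type_prefix, project_prefix, object_id) or raise ValueError."""
    if len(pid) > MAX_ID_LENGTH:
        raise ValueError(f"ID must not be longer than {MAX_ID_LENGTH} characters")
    match = PID_PATTERN.fullmatch(pid)
    if match is None:
        raise ValueError(f"Invalid PID: {pid!r}")
    # A missing type prefix is reported as an empty string.
    return (match["type_prefix"] or "", match["project"], match["object_id"])
```

For example, `split_pid("o:abc.def123")` yields the three parts `"o"`, `"abc"` and `"def123"`, while `split_pid("abcdef")` raises ValueError because the dot is missing.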
def validate_project_name(value: str) -> None
Source code:
    def validate_project_name(value: str) -> None:
        """
        Validate the project name. Can also be used to validate the project
        prefix of a PID.

        The value must start with a lowercase letter, followed by any number
        of lowercase letters and numbers. The project prefix is the part
        before the dot (.) in an ID like "abc.def123".

        Args:
            value (str): The project prefix to validate.

        Raises:
            ValueError: If the project prefix is invalid.
        """
        if not value:
            raise ValueError("Project prefix (before dot) is empty")
        if not re.match(r"^[a-z][a-z0-9]*$", value):
            raise ValueError(
                "A project name must start with a letter and contain "
                "only lowercase letters and numbers."
            )
Validate the project name. Can also be used to validate the project prefix of a PID.
The value must start with a lowercase letter, followed by any number of lowercase letters and numbers. The project prefix is the part before the dot (.) in an ID like "abc.def123".
Args
value (str): The project prefix to validate.
Raises
ValueError: If the project prefix is invalid.
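The effect of the pattern used in the source can be checked on a few sample values. The candidate names below are made up for illustration. The last lines show a general property of Python's `re` module rather than a statement about how gamslib is used: `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter choice when that matters.

```python
import re

# Pattern copied from the validate_project_name source.
PROJECT_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*$")

candidates = ["abc", "abc123", "Abc", "1abc", "ab_c", ""]
accepted = [name for name in candidates if PROJECT_NAME_PATTERN.match(name)]
print(accepted)  # ['abc', 'abc123']

# "$" matches before a final newline, so this value passes the pattern...
trailing_newline_passes = PROJECT_NAME_PATTERN.match("abc\n") is not None
# ...whereas fullmatch requires the entire string to match.
strict_rejects = re.fullmatch(r"[a-z][a-z0-9]*", "abc\n") is None
```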