Module gamslib.sip.validation.baginfo
Validation functions for the bag-info.txt file in GAMS Bagit directories.
This module provides functions to validate the contents of the bag-info.txt file and ensure that all required entries are present and correctly formatted.
Features
- Checks for required fields in bag-info.txt.
- Validates date, time, payload oxum, email, and description fields.
- Reads bag-info.txt as a list of (key, value) tuples to support duplicate keys.
- Raises BagValidationError for any validation failures.
Usage
Call validate_baginfo_text()(bag_dir) to validate the bag-info.txt file in a Bagit directory.
Individual validation functions are also available for more granular checks.
Functions
def read_baginfo_txt(baginfo_txt_file: pathlib._local.Path) ‑> list[tuple[str, str]]-
Expand source code
def read_baginfo_txt(baginfo_txt_file: Path) -> list[tuple[str, str]]: """ Read the bag-info.txt file and return a list of tuples (key, value) with its entries. Args: baginfo_txt_file (Path): Path to the bag-info.txt file. Returns: list[tuple[str, str]]: List of (key, value) tuples from the file. Notes: - Uses a list of tuples instead of a dict, because the same key can appear multiple times. - Raises BagValidationError if a line is invalid or missing a colon. """ entries = [] with baginfo_txt_file.open("r", encoding="utf-8", newline="") as f: for i, line in enumerate(f, start=1): try: stripped_line = line.rstrip() if line: key, value = [x.strip() for x in stripped_line.split(":", 1)] entries.append((key, value)) except ValueError as e: # missing colon raise BagValidationError( f"Invalid line {i} in '{baginfo_txt_file}': '{stripped_line}'" ) from e return entriesRead the bag-info.txt file and return a list of tuples (key, value) with its entries.
Args
baginfo_txt_file:Path- Path to the bag-info.txt file.
Returns
list[tuple[str, str]]- List of (key, value) tuples from the file.
Notes
- Uses a list of tuples instead of a dict, because the same key can appear multiple times.
- Raises BagValidationError if a line is invalid or missing a colon.
def validate_bagging_date(value: str) ‑> None-
Expand source code
def validate_bagging_date(value: str) -> None: """ Check if the value for 'Bagging-Date' is a valid date. Args: value (str): Value to validate. Raises: BagValidationError: If the value is not a valid date. """ try: datetime.strptime(value, "%Y-%m-%d") except ValueError as e: raise BagValidationError( f"Value for 'Bagging-Date' is not a valid date: {value}" ) from e return TrueCheck if the value for 'Bagging-Date' is a valid date.
Args
value:str- Value to validate.
Raises
BagValidationError- If the value is not a valid date.
def validate_baginfo_text(bag_dir: pathlib._local.Path) ‑> None-
Expand source code
def validate_baginfo_text(bag_dir: Path) -> None: """ Validate the bag-info.txt file in a Bagit directory. Args: bag_dir (Path): Path to the Bagit directory. Raises: BagValidationError: If the bag-info.txt file is missing or invalid. Notes: - Checks for required entries and validates their values. - Raises an error immediately if any check fails. """ baginfo_txt_file = bag_dir / "bag-info.txt" if not baginfo_txt_file.is_file(): raise BagValidationError("bag-info.txt file does not exist") # read the bag-info.txt file entries = read_baginfo_txt(baginfo_txt_file) # check if all required entries are present validate_required_baginfo_entries(entries) # check the values of the required entries for key, value in entries: if key == "Bagging-Date": validate_bagging_date(value) elif key == "Payload-Oxum": validate_payload_oxum(value, bag_dir) elif key == "Contact-Email": validate_contact_email(value) elif key == "External-Description": validate_external_description(value)Validate the bag-info.txt file in a Bagit directory.
Args
bag_dir:Path- Path to the Bagit directory.
Raises
BagValidationError- If the bag-info.txt file is missing or invalid.
Notes
- Checks for required entries and validates their values.
- Raises an error immediately if any check fails.
def validate_contact_email(value: str) ‑> None-
Expand source code
def validate_contact_email(value: str) -> None: """ Check the value for 'Contact-Email'. Args: value (str): Value to validate. Raises: BagValidationError: If the value is not a valid email address. """ pattern = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}$" if not re.match(pattern, value): raise BagValidationError( f"Value for 'Contact-Email' is not a valid email address: {value}" )Check the value for 'Contact-Email'.
Args
value:str- Value to validate.
Raises
BagValidationError- If the value is not a valid email address.
def validate_external_description(value: str) ‑> None-
Expand source code
def validate_external_description(value: str) -> None: """ Check the value for 'External-Description'. Args: value (str): Value to validate. Raises: BagValidationError: If the value is empty. """ if not value.strip(): raise BagValidationError("Value for 'External-Description' must not be empty")Check the value for 'External-Description'.
Args
value:str- Value to validate.
Raises
BagValidationError- If the value is empty.
def validate_payload_oxum(value: str, bag_dir: pathlib._local.Path) ‑> None-
Expand source code
def validate_payload_oxum(value: str, bag_dir: Path) -> None: """ Check the value for 'Payload-Oxum'. The value must be in the format 'size.file_count'. The size must match the actual size of the payload. The file_count must match the actual number of files in the payload. Args: value (str): Value to validate. bag_dir (Path): Path to the Bagit directory. Raises: BagValidationError: If the value is not valid or does not match the payload. """ parts = value.split(".") if len(parts) != len(["size", "file_count"]): raise BagValidationError( f"Value for 'Payload-Oxum' is not valid: {value}. " "It must be in the format 'size.file_count'." ) if not all(part.isdigit() for part in parts): raise BagValidationError( f"Value for 'Payload-Oxum' is not valid: {value}. " "Both size and file_count must be integers." ) size, file_count = value.split(".") size = int(size) real_size = utils.count_bytes(bag_dir / "data") if real_size != size: raise BagValidationError( ( f"{bag_dir}: " f"Value for 'Payload-Oxum' ({size}) does not match " f"the actual payload size: {real_size}" ) ) real_file_count = utils.count_files(bag_dir / "data") if real_file_count != int(file_count): raise BagValidationError( ( f"{bag_dir}: " f"Value for 'Payload-Oxum' ({file_count}) does not match " f"the actual number of files: {real_file_count}" ) )Check the value for 'Payload-Oxum'.
The value must be in the format 'size.file_count'. The size must match the actual size of the payload. The file_count must match the actual number of files in the payload.
Args
value:str- Value to validate.
bag_dir:Path- Path to the Bagit directory.
Raises
BagValidationError- If the value is not valid or does not match the payload.
def validate_required_baginfo_entries(entries: list[tuple[str, str]]) ‑> None-
Expand source code
def validate_required_baginfo_entries(entries: list[tuple[str, str]]) -> None: """ Check if all required values are present in the bag-info.txt file. Args: entries (list[tuple[str, str]]): List of (key, value) tuples from bag-info.txt. Raises: BagValidationError: If a required entry is missing. """ required_keys = [ "Bagging-Date", "Payload-Oxum", "Contact-Email", "External-Description", ] existing_keys = [key for key, _ in entries] for key in required_keys: if key not in existing_keys: raise BagValidationError(f"Missing required entry '{key}' in bag-info.txt")Check if all required values are present in the bag-info.txt file.
Args
entries:list[tuple[str, str]]- List of (key, value) tuples from bag-info.txt.
Raises
BagValidationError- If a required entry is missing.