Module gamslib.sip.validation.manifests

Validation functions for manifest files in BagIt directories for GAMS projects.

This module provides functions to validate the manifest-md5.txt and manifest-sha512.txt files in a BagIt directory, ensuring that all payload files are listed and that their checksums are correct.

Features

  • Validates manifest-md5.txt and manifest-sha512.txt files.
  • Checks that all files in the data directory are listed in the manifests.
  • Verifies that checksums match the actual file contents.
  • Raises BagValidationError for any validation failures.

Usage

Call validate_manifest_md5(bag_dir) to validate the MD5 manifest and validate_manifest_sha512(bag_dir) to validate the SHA512 manifest. Both return None if the manifest is valid and raise BagValidationError at the first problem they find.
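The validators expect a specific manifest layout: one entry per line, a hex checksum, whitespace, then a path relative to the bag root that starts with `data/`. As a minimal sketch (the helper `make_demo_bag` is hypothetical and not part of gamslib), the following builds a throwaway bag with both manifests in that layout:

```python
import hashlib
import tempfile
from pathlib import Path


def make_demo_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Hypothetical helper: write payload files under data/ plus MD5 and SHA512 manifests."""
    md5_lines = []
    sha512_lines = []
    for rel_name, content in sorted(payload.items()):
        target = bag_dir / "data" / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Manifest paths are relative to the bag root, use "/" and start with "data/".
        manifest_path = f"data/{rel_name}"
        # One entry per line: lowercase hex digest, a single space, then the path.
        md5_lines.append(f"{hashlib.md5(content).hexdigest()} {manifest_path}\n")
        sha512_lines.append(f"{hashlib.sha512(content).hexdigest()} {manifest_path}\n")
    # Write bytes so the line endings stay "\n" on every platform.
    (bag_dir / "manifest-md5.txt").write_bytes("".join(md5_lines).encode("utf-8"))
    (bag_dir / "manifest-sha512.txt").write_bytes("".join(sha512_lines).encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        make_demo_bag(bag, {"hello.txt": b"hello\n", "sub/note.txt": b"note\n"})
        print((bag / "manifest-md5.txt").read_text(encoding="utf-8"))
```

Passing such a directory to validate_manifest_md5() or validate_manifest_sha512() should succeed; editing a payload file or deleting a manifest line afterwards should make the corresponding validator raise BagValidationError.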

Functions

def validate_manifest_md5(bag_dir: pathlib.Path) -> None
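import hashlib
from pathlib import Path

# BagValidationError is defined elsewhere in gamslib; its import is not shown on this page.


def validate_manifest_md5(bag_dir: Path) -> None:
    """
    Validate the manifest-md5.txt file in a BagIt directory.

    Args:
        bag_dir (Path): Path to the BagIt directory.

    Raises:
        BagValidationError: If the manifest-md5.txt file is missing, empty, contains invalid lines,
            has checksum mismatches, or does not list all payload files.

    Notes:
        - Checks that all files in the data directory are listed in the manifest.
        - Verifies that each listed file's MD5 checksum matches the manifest entry.
    """
    manifest_md5_file = bag_dir / "manifest-md5.txt"
    # open() would raise FileNotFoundError; report a missing manifest as a validation error.
    if not manifest_md5_file.is_file():
        raise BagValidationError(f"{bag_dir}: manifest-md5.txt is missing")

    payload_files = set()
    with open(manifest_md5_file, "r", encoding="utf-8", newline="") as f:
        # Count physical lines so error messages point at the right line; skip blank ones.
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                # Checksum and path are separated by one or more whitespace characters.
                checksum, file_path = line.split(maxsplit=1)
            except ValueError as e:
                raise BagValidationError(
                    f"{bag_dir}: Invalid line {i} in manifest-md5.txt: '{line.rstrip()}'"
                ) from e
            file_path = file_path.strip()
            if not file_path.startswith("data/"):
                raise BagValidationError(
                    f"{bag_dir}: Invalid path in line {i} of manifest-md5.txt: '{file_path}'"
                )
            payload_file = bag_dir / file_path
            if not payload_file.is_file():
                raise BagValidationError(
                    f"{bag_dir}: File in line {i} of manifest-md5.txt does not exist: '{file_path}'"
                )
            md5sum = hashlib.md5(payload_file.read_bytes()).hexdigest()
            # Manifests may use upper- or lowercase hex digits; hexdigest() is lowercase.
            if checksum.lower() != md5sum:
                raise BagValidationError(
                    f"{bag_dir}: Checksum mismatch in line {i} of manifest-md5.txt: '{file_path}'"
                )
            payload_files.add(Path(file_path))

    if not payload_files:
        raise BagValidationError(f"{bag_dir}: manifest-md5.txt is empty")

    # Every file under data/ must be listed in the manifest.
    for file in (bag_dir / "data").rglob("*"):
        if file.is_file():
            file_path = file.relative_to(bag_dir)
            if file_path not in payload_files:
                raise BagValidationError(
                    f"{bag_dir}: File '{file_path}' is not listed in manifest-md5.txt"
                )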

Validate the manifest-md5.txt file in a BagIt directory.

Args

bag_dir : Path
Path to the BagIt directory.

Raises

BagValidationError
If the manifest-md5.txt file is missing, empty, contains invalid lines, has checksum mismatches, or does not list all payload files.

Notes

  • Checks that all files in the data directory are listed in the manifest.
  • Verifies that each listed file's MD5 checksum matches the manifest entry.
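Both validators call read_bytes() on each payload file, which loads the whole file into memory before hashing. For large payloads, a chunked variant keeps memory use bounded. A minimal sketch, assuming a hypothetical helper named `file_hexdigest` that is not part of gamslib:

```python
import hashlib
from pathlib import Path


def file_hexdigest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hash a file in fixed-size chunks instead of reading it whole."""
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha512"
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is identical to hashing the full contents at once; only peak memory differs. On Python 3.11 and later, hashlib.file_digest() offers the same behavior from the standard library.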
def validate_manifest_sha512(bag_dir: pathlib.Path) -> None
import hashlib
from pathlib import Path

# BagValidationError is defined elsewhere in gamslib; its import is not shown on this page.


def validate_manifest_sha512(bag_dir: Path) -> None:
    """
    Validate the manifest-sha512.txt file in a BagIt directory.

    Args:
        bag_dir (Path): Path to the BagIt directory.

    Raises:
        BagValidationError: If the manifest-sha512.txt file is missing, empty, contains invalid lines,
            has checksum mismatches, or does not list all payload files.

    Notes:
        - Checks that all files in the data directory are listed in the manifest.
        - Verifies that each listed file's SHA512 checksum matches the manifest entry.
    """
    manifest_sha512_file = bag_dir / "manifest-sha512.txt"
    # open() would raise FileNotFoundError; report a missing manifest as a validation error.
    if not manifest_sha512_file.is_file():
        raise BagValidationError(f"{bag_dir}: manifest-sha512.txt is missing")

    payload_files = set()
    with open(manifest_sha512_file, "r", encoding="utf-8", newline="") as f:
        # Count physical lines so error messages point at the right line; skip blank ones.
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                # Checksum and path are separated by one or more whitespace characters.
                checksum, file_path = line.split(maxsplit=1)
            except ValueError as e:
                raise BagValidationError(
                    f"{bag_dir}: Invalid line {i} in manifest-sha512.txt: '{line.rstrip()}'"
                ) from e
            file_path = file_path.strip()
            if not file_path.startswith("data/"):
                raise BagValidationError(
                    f"{bag_dir}: Invalid path in line {i} of manifest-sha512.txt: '{file_path}'"
                )
            payload_file = bag_dir / file_path
            if not payload_file.is_file():
                raise BagValidationError(
                    f"{bag_dir}: File in line {i} of manifest-sha512.txt does not exist: '{file_path}'"
                )
            sha512sum = hashlib.sha512(payload_file.read_bytes()).hexdigest()
            # Manifests may use upper- or lowercase hex digits; hexdigest() is lowercase.
            if checksum.lower() != sha512sum:
                raise BagValidationError(
                    f"{bag_dir}: Checksum mismatch in line {i} of manifest-sha512.txt: '{file_path}'"
                )
            payload_files.add(Path(file_path))

    if not payload_files:
        raise BagValidationError(f"{bag_dir}: manifest-sha512.txt is empty")

    # Every file under data/ must be listed in the manifest.
    for file in (bag_dir / "data").rglob("*"):
        if file.is_file():
            file_path = file.relative_to(bag_dir)
            if file_path not in payload_files:
                raise BagValidationError(
                    f"{bag_dir}: File '{file_path}' is not listed in manifest-sha512.txt"
                )

Validate the manifest-sha512.txt file in a BagIt directory.

Args

bag_dir : Path
Path to the BagIt directory.

Raises

BagValidationError
If the manifest-sha512.txt file is missing, empty, contains invalid lines, has checksum mismatches, or does not list all payload files.

Notes

  • Checks that all files in the data directory are listed in the manifest.
  • Verifies that each listed file's SHA512 checksum matches the manifest entry.
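The two validators differ only in the manifest file name and the hash algorithm. One way to remove that duplication is a single parameterized helper built on hashlib.new(). This is a sketch under stated assumptions: `validate_manifest` is a hypothetical name, and BagValidationError is redefined locally only so the snippet runs on its own; inside gamslib the existing exception class would be used instead.

```python
import hashlib
from pathlib import Path


class BagValidationError(Exception):
    """Local stand-in for gamslib's BagValidationError, for this sketch only."""


def validate_manifest(bag_dir: Path, algorithm: str) -> None:
    """Hypothetical generic validator for manifest-<algorithm>.txt (e.g. "md5", "sha512")."""
    name = f"manifest-{algorithm}.txt"
    manifest = bag_dir / name
    if not manifest.is_file():
        raise BagValidationError(f"{bag_dir}: {name} is missing")

    listed: set[Path] = set()
    with open(manifest, "r", encoding="utf-8", newline="") as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            # Split once on any run of whitespace: checksum first, then the path.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise BagValidationError(f"{bag_dir}: Invalid line {i} in {name}: '{line.rstrip()}'")
            checksum, file_path = parts[0], parts[1].strip()
            target = bag_dir / file_path
            if not file_path.startswith("data/") or not target.is_file():
                raise BagValidationError(f"{bag_dir}: Invalid path in line {i} of {name}: '{file_path}'")
            actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
            if checksum.lower() != actual:
                raise BagValidationError(f"{bag_dir}: Checksum mismatch in line {i} of {name}: '{file_path}'")
            listed.add(Path(file_path))

    if not listed:
        raise BagValidationError(f"{bag_dir}: {name} is empty")
    # Completeness check: every file under data/ must appear in the manifest.
    for file in (bag_dir / "data").rglob("*"):
        if file.is_file() and file.relative_to(bag_dir) not in listed:
            raise BagValidationError(f"{bag_dir}: File '{file.relative_to(bag_dir)}' is not listed in {name}")
```

With such a helper, each public function could reduce to a one-line call, for example validate_manifest(bag_dir, "md5"), and supporting another BagIt algorithm would need no new parsing code.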