Module gamslib.formatdetect.formatinfo

Describes the format of a file.

FormatInfo objects are returned by format detectors.

This module also defines the SubType enum, which lists all supported format subtypes.

The subtype data is fetched from CSV files located in the resources directory of the formatdetect package.

Functions

def extract_subtype_info_from_csv() ‑> dict[str, str]
def extract_subtype_info_from_csv() -> dict[str, str]:
    """
    Extract data needed for the SubType enum.

    Loads subtype information from all CSV files in the resources directory.

    Returns:
        dict[str, str]: Mapping of subformat to full name for the SubType enum.
    """
    return {item["subformat"]: item["full name"] for item in load_subtypes_from_csv()}

Extract data needed for the SubType enum.

Loads subtype information from all CSV files in the resources directory.

Returns

dict[str, str]
Mapping of subformat to full name for the SubType enum.
def find_subtype_csv_files() ‑> list[pathlib.Path]
def find_subtype_csv_files() -> list[Path]:
    """
    Find all CSV files in the resources directory that contain subtype definitions.

    Returns:
        list[Path]: List of Path objects pointing to the CSV files.
    """
    resource_dir = impresources.files("gamslib") / "formatdetect" / "resources"
    return list(resource_dir.glob("*.csv"))

Find all CSV files in the resources directory that contain subtype definitions.

Returns

list[Path]
List of Path objects pointing to the CSV files.
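
The lookup can be tried out against a temporary directory instead of the installed package. This is only a sketch: the file names below are hypothetical, chosen to follow the `<maintype>_...` naming that load_subtypes_from_csv() relies on.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    resource_dir = Path(tmp)  # stand-in for the package's resources directory

    # Hypothetical file names; only the *.csv files should be picked up.
    (resource_dir / "xml_subformats.csv").write_text("subformat,full name\n", encoding="utf-8")
    (resource_dir / "json_subformats.csv").write_text("subformat,full name\n", encoding="utf-8")
    (resource_dir / "README.txt").write_text("not a subtype file\n", encoding="utf-8")

    csv_files = sorted(resource_dir.glob("*.csv"))
    found = [path.name for path in csv_files]
    # The main type is the part of the file stem before the first underscore.
    maintypes = [path.stem.split("_", 1)[0] for path in csv_files]
```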
def load_subtypes_from_csv() ‑> list[dict[str, str]]
def load_subtypes_from_csv() -> list[dict[str, str]]:
    """
    Load subtypes from all CSV files in the resources directory.

    Returns:
        list[dict[str, str]]: List of dictionaries, each containing keys:
            'subformat', 'full name', 'ds name', 'mimetype', and 'maintype'.
    """
    subtypes = []
    for csvfile in find_subtype_csv_files():
        maintype = csvfile.stem.split("_", 1)[0]  # Extract the main type from the filename
        with csvfile.open("r", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for row in reader:
                row["maintype"] = maintype  # Add the main type to each row
                # Strip whitespace from keys and values
                stripped_row = {k.strip(): v.strip() for k, v in row.items()}
                subtypes.append(stripped_row)
    return subtypes

Load subtypes from all CSV files in the resources directory.

Returns

list[dict[str, str]]
List of dictionaries, each containing keys: 'subformat', 'full name', 'ds name', 'mimetype', and 'maintype'.
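
To illustrate the row processing without touching the packaged files, the sketch below runs the same steps on an in-memory CSV. The file stem, column layout, and row contents are assumptions derived from the keys named in the docstring.

```python
import csv
import io

# Hypothetical file stem and contents; real files live in the resources directory.
filename_stem = "xml_subformats"
csv_text = (
    "subformat, full name, ds name, mimetype\n"
    "TEI, Text Encoding Initiative, TEI document, application/tei+xml\n"
)

# The main type is taken from the file name, not from the CSV contents.
maintype = filename_stem.split("_", 1)[0]

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row["maintype"] = maintype
    # Strip whitespace from keys and values, as the loader does.
    rows.append({key.strip(): value.strip() for key, value in row.items()})
```

Stripping matters here because the header and values contain a space after each comma, which csv.DictReader would otherwise keep as part of the keys and values.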

Classes

class FormatInfo (detector: str, mimetype: str, subtype: SubType | None = None)
@dataclass
class FormatInfo:
    """
    Object containing basic information about the format of a file.

    FormatInfo objects are returned by format detectors.

    Attributes:
        detector (str): Name of the detector that detected the format.
        mimetype (str): MIME type of the file (e.g., 'text/xml').
        subtype (SubType | None): Subtype of the format, if detected.
    """

    detector: str  # name of the detector that detected the format
    mimetype: str  # e.g. text/xml
    subtype: SubType | None = None  # type: ignore

    def is_xml_type(self) -> bool:  # type: ignore
        """
        Check if the subtype is an XML type.

        Returns:
            bool: True if the subtype's maintype is 'xml', False otherwise.
        """
        subtype_info = self._get_subtype_info()
        if subtype_info is not None:
            return (
                subtype_info["subformat"] == self.subtype.name
                and subtype_info["maintype"] == "xml"
            )
        return False

    def is_json_type(self) -> bool:  # type: ignore
        """
        Check if the subtype is a JSON type.

        Returns:
            bool: True if the subtype's maintype is 'json', False otherwise.
        """
        subtype_info = self._get_subtype_info()
        if subtype_info is not None:
            return (
                subtype_info["subformat"] == self.subtype.name
                and subtype_info["maintype"] == "json"
            )
        return False

    @property
    def description(self) -> str:
        """
        Return a human-friendly description of the format.

        Returns:
            str: Description based on subtype info, MIME type, or defaults.
        """
        mime_prefix_map = {
            "text/": "Text document",
            "image/": "Image document",
            "audio/": "Audio document",
            "video/": "Video document",
            "application/": "Application document",
        }
        desc = ""
        subtype_info = self._get_subtype_info()
        if subtype_info is not None:
            desc = subtype_info["ds name"]
        elif self.mimetype == "application/octet-stream":
            desc = "Binary document"
        else:
            for prefix, description in mime_prefix_map.items():
                if self.mimetype.startswith(prefix):
                    desc = description
                    break
        return desc

    def _get_subtype_info(self) -> dict[str, str] | None:
        """
        Get the full subtype information from the CSV files for this format.

        Returns:
            dict[str, str] | None: Subtype info dictionary, or None if not found.
        """
        subtype_info = None
        if self.subtype is not None:
            for subtype in load_subtypes_from_csv():
                if subtype["subformat"] == self.subtype.name:
                    subtype_info = subtype
        return subtype_info

Object containing basic information about the format of a file.

FormatInfo objects are returned by format detectors.

Attributes

detector : str
Name of the detector that detected the format.
mimetype : str
MIME type of the file (e.g., 'text/xml').
subtype : SubType | None
Subtype of the format, if detected.

Instance variables

prop description : str
@property
def description(self) -> str:
    """
    Return a human-friendly description of the format.

    Returns:
        str: Description based on subtype info, MIME type, or defaults.
    """
    mime_prefix_map = {
        "text/": "Text document",
        "image/": "Image document",
        "audio/": "Audio document",
        "video/": "Video document",
        "application/": "Application document",
    }
    desc = ""
    subtype_info = self._get_subtype_info()
    if subtype_info is not None:
        desc = subtype_info["ds name"]
    elif self.mimetype == "application/octet-stream":
        desc = "Binary document"
    else:
        for prefix, description in mime_prefix_map.items():
            if self.mimetype.startswith(prefix):
                desc = description
                break
    return desc

Return a human-friendly description of the format.

Returns

str
Description based on subtype info, MIME type, or defaults.
var detector : str

Name of the detector that detected the format.

var mimetype : str

MIME type of the file (e.g., 'text/xml').

var subtype : SubType | None

Subtype of the format, if detected.

Methods

def is_json_type(self) ‑> bool
def is_json_type(self) -> bool:  # type: ignore
    """
    Check if the subtype is a JSON type.

    Returns:
        bool: True if the subtype's maintype is 'json', False otherwise.
    """
    subtype_info = self._get_subtype_info()
    if subtype_info is not None:
        return (
            subtype_info["subformat"] == self.subtype.name
            and subtype_info["maintype"] == "json"
        )
    return False

Check if the subtype is a JSON type.

Returns

bool
True if the subtype's maintype is 'json', False otherwise.
def is_xml_type(self) ‑> bool
def is_xml_type(self) -> bool:  # type: ignore
    """
    Check if the subtype is an XML type.

    Returns:
        bool: True if the subtype's maintype is 'xml', False otherwise.
    """
    subtype_info = self._get_subtype_info()
    if subtype_info is not None:
        return (
            subtype_info["subformat"] == self.subtype.name
            and subtype_info["maintype"] == "xml"
        )
    return False

Check if the subtype is an XML type.

Returns

bool
True if the subtype's maintype is 'xml', False otherwise.
class SubType (*args, **kwds)

Enum of all supported format subtypes. Members are also (and must be) strings.

Ancestors

  • enum.StrEnum
  • builtins.str
  • enum.ReprEnum
  • enum.Enum

Class variables

var ATOM
var Collada
var DCMI
var DataCite
var DocBook
var EAD
var GML
var JSON
var JSONL
var JSONLD
var JSONSCHEMA
var KML
var LIDO
var MARC21
var METS
var MODS
var MathML
var ODF
var OWL
var PREMIS
var PresentationML
var RDF
var RDFS
var RSS
var RelaxNG
var SMIL
var SOAP
var SVG
var SVG_Animation
var Schematron
var SpreadsheetML
var TEI
var VoiceXML
var WSDL
var WordprocessingML
var X3D
var XBRL
var XForms
var XHTML
var XHTML_RDFa
var XML
var XSD
var XSLT