Module gamslib.formatdetect.minimaldetector

A detector that uses the mimetypes module to detect file formats.

This detector should be used as a last resort, if no other detector is available, because results depend highly on file extensions and data provided by the operating system.

Classes

class MinimalDetector
Expand source code
class MinimalDetector(FormatDetector):
    """
    Simple format detector using the Python mimetypes module.

    This detector uses file extensions to determine the MIME type.
    It is not very reliable and should only be used if no other detector is available.
    """

    def __init__(self):
        """
        Initialize the MinimalDetector and register additional MIME types.

        Notes:
            - Adds support for .jp2, .webp, .jsonld, .md, .xml, and .csv extensions.
        """
        mimetypes.add_type("image/jp2", ".jp2")
        mimetypes.add_type("image/webp", ".webp")
        mimetypes.add_type("application/ld+json", ".jsonld")
        mimetypes.add_type("text/markdown", ".md")
        mimetypes.add_type("application/xml", ".xml")
        mimetypes.add_type("text/csv", ".csv")
        super().__init__()

    def guess_file_type(self, filepath: Path) -> FormatInfo:
        """
        Detect the format of a file using the mimetypes module.

        Args:
            filepath (Path): Path to the file to analyze.

        Returns:
            FormatInfo: Object containing detected format information.

        Notes:
            - Uses DEFAULT_TYPE if MIME type cannot be determined.
            - Integrates with xmltypes and jsontypes for subtype detection.
        """
        mime_type, _ = mimetypes.guess_type(filepath)
        detector_name = str(self)
        subtype = None

        if mime_type is None:
            warnings.warn(
                f"Could not determine mimetype for {filepath}. Using default type."
            )
            mime_type = DEFAULT_TYPE
        elif xmltypes.is_xml_type(mime_type):
            mime_type, subtype = xmltypes.get_format_info(filepath, mime_type)
        elif jsontypes.is_json_type(mime_type):
            mime_type, subtype = jsontypes.get_format_info(filepath, mime_type)

        return FormatInfo(detector=detector_name, mimetype=mime_type, subtype=subtype)

    def __repr__(self):
        """
        Return a string representation of the MinimalDetector.

        Returns:
            str: "MinimalDetector"
        """
        return "MinimalDetector"

Simple format detector using the Python mimetypes module.

This detector uses file extensions to determine the MIME type. It is not very reliable and should only be used if no other detector is available.

Initialize the MinimalDetector and register additional MIME types.

Notes

  • Adds support for .jp2, .webp, .jsonld, .md, .xml, and .csv extensions.

Ancestors

Methods

def guess_file_type(self, filepath: pathlib._local.Path) ‑> FormatInfo
Expand source code
def guess_file_type(self, filepath: Path) -> FormatInfo:
    """
    Detect the format of a file using the mimetypes module.

    Args:
        filepath (Path): Path to the file to analyze.

    Returns:
        FormatInfo: Object containing detected format information.

    Notes:
        - Uses DEFAULT_TYPE if MIME type cannot be determined.
        - Integrates with xmltypes and jsontypes for subtype detection.
    """
    mime_type, _ = mimetypes.guess_type(filepath)
    detector_name = str(self)
    subtype = None

    if mime_type is None:
        warnings.warn(
            f"Could not determine mimetype for {filepath}. Using default type."
        )
        mime_type = DEFAULT_TYPE
    elif xmltypes.is_xml_type(mime_type):
        mime_type, subtype = xmltypes.get_format_info(filepath, mime_type)
    elif jsontypes.is_json_type(mime_type):
        mime_type, subtype = jsontypes.get_format_info(filepath, mime_type)

    return FormatInfo(detector=detector_name, mimetype=mime_type, subtype=subtype)

Detect the format of a file using the mimetypes module.

Args

filepath : Path
Path to the file to analyze.

Returns

FormatInfo
Object containing detected format information.

Notes

  • Uses DEFAULT_TYPE if MIME type cannot be determined.
  • Integrates with xmltypes and jsontypes for subtype detection.