Module gamslib.formatdetect.xmltypes
Functions and data for detecting XML file types and subtypes.
Provides utilities to identify XML formats based on MIME types and XML namespaces. Maps supported subtypes to MIME types and offers helpers for format detection.
Functions
def get_format_info(filepath: pathlib._local.Path, mime_type: str) ‑> tuple[str, enum.StrEnum | None]-
Expand source code
def get_format_info(filepath: Path, mime_type: str) -> tuple[str, StrEnum | None]: """ Get the format info for an XML file, including fixed MIME type and detected subtype. Args: filepath (Path): Path to the XML file. mime_type (str): MIME type detected by another tool. Returns: tuple[str, StrEnum | None]: (MIME type, detected subtype) for the file. Notes: - If the subtype cannot be detected, returns the original MIME type and None. - If detected, returns the mapped MIME type and subtype. """ xmltype = guess_xml_subtype(filepath) if xmltype is None: subtype = None else: subtype = xmltype mime_type = MIMETYPES.get(xmltype, mime_type) return mime_type, subtypeGet the format info for an XML file, including fixed MIME type and detected subtype.
Args
filepath:Path- Path to the XML file.
mime_type:str- MIME type detected by another tool.
Returns
tuple[str, StrEnum | None]- (MIME type, detected subtype) for the file.
Notes
- If the subtype cannot be detected, returns the original MIME type and None.
- If detected, returns the mapped MIME type and subtype.
def guess_xml_subtype(filepath: pathlib._local.Path) ‑> str-
Expand source code
def guess_xml_subtype(filepath: Path) -> str: """ Guess the XML subtype of a file by inspecting its namespaces. Iterates through the XML file and checks for known namespaces to determine the subtype. If the namespace is not recognized, a warning is issued and None is returned. Args: filepath (Path): Path to the XML file. Returns: str: SubType value if detected, otherwise None. Notes: - Useful for simple detectors or exotic formats. - Tools like FITS may also detect subtypes, but this function is for custom logic. """ for _, elem in ET.iterparse(filepath, events=["start-ns"]): namespace = elem[1] try: return NAMESPACES[namespace] except KeyError: warnings.warn( f"XML format detection failed due to unknown namespace: {namespace}" ) return NoneGuess the XML subtype of a file by inspecting its namespaces.
Iterates through the XML file and checks for known namespaces to determine the subtype. If the namespace is not recognized, a warning is issued and None is returned.
Args
filepath:Path- Path to the XML file.
Returns
str- SubType value if detected, otherwise None.
Notes
- Useful for simple detectors or exotic formats.
- Tools like FITS may also detect subtypes, but this function is for custom logic.
def is_xml_type(mimetype: str) ‑> enum.StrEnum | None-
Expand source code
def is_xml_type(mimetype: str) -> StrEnum | None: """ Check if a MIME type is recognized as an XML type. Args: mimetype (str): MIME type to check. Returns: StrEnum | None: True if the MIME type is a known XML type, otherwise None. """ return mimetype in MIMETYPES.values() or mimetype in XML_MIME_TYPESCheck if a MIME type is recognized as an XML type.
Args
mimetype:str- MIME type to check.
Returns
StrEnum | None- True if the MIME type is a known XML type, otherwise None.