Module gamslib.formatdetect.jsontypes

Module to inspect and classify JSON files.

Provides utilities to check MIME types, detect JSON lines format, and guess the subtype of JSON files. Maps supported subtypes to MIME types and offers helpers for format detection.

Functions

def get_format_info(filepath: pathlib.Path, mime_type: str) -> tuple[str, SubType | None]
def get_format_info(filepath: Path, mime_type: str) -> tuple[str, SubType | None]:
    """
    Return a tuple with the (possibly corrected) MIME type and detected JSON subtype.

    Args:
        filepath (Path): Path to the JSON file.
        mime_type (str): Initial MIME type.

    Returns:
        tuple[str, SubType | None]: (MIME type, detected subtype) for the file.
    """
    subtype = None
    json_type = guess_json_format(filepath)
    if json_type in MIMETYPES:
        mime_type = MIMETYPES[json_type]
        subtype = json_type
    return mime_type, subtype

Return a tuple with the (possibly corrected) MIME type and detected JSON subtype.

Args

filepath : Path
Path to the JSON file.
mime_type : str
Initial MIME type.

Returns

tuple[str, SubType | None]
(MIME type, detected subtype) for the file.
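
The override rule can be sketched in isolation. This is a minimal illustration under stated assumptions, not the library's code: the `SubType` members and the `MIMETYPES` entries below are stand-ins (the real definitions are not shown on this page), and `resolve_mime` is a hypothetical helper that takes an already-detected subtype instead of a file path.

```python
from __future__ import annotations

from enum import Enum


class SubType(Enum):
    """Stand-in for the real SubType enum (members assumed)."""

    JSON = "json"
    JSONLD = "jsonld"
    JSONL = "jsonl"


# Assumed mapping for illustration; the real MIMETYPES table may differ.
MIMETYPES = {
    SubType.JSONLD: "application/ld+json",
    SubType.JSONL: "application/jsonl",
}


def resolve_mime(json_type: SubType, mime_type: str) -> tuple[str, SubType | None]:
    """Replace mime_type only when the detected subtype has a mapped MIME type."""
    if json_type in MIMETYPES:
        return MIMETYPES[json_type], json_type
    # No mapping: keep the caller's MIME type and report no subtype.
    return mime_type, None


mapped = resolve_mime(SubType.JSONLD, "application/json")
unmapped = resolve_mime(SubType.JSON, "application/json")
print(mapped)
print(unmapped)
```

In this sketch plain JSON is deliberately left out of the assumed mapping, so the incoming MIME type passes through unchanged and the subtype is `None`.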

def guess_json_format(file_to_validate: pathlib.Path) -> SubType
def guess_json_format(file_to_validate: Path) -> SubType:
    """
    Guess the subtype of a JSON file.

    Args:
        file_to_validate (Path): Path to the JSON file.

    Returns:
        SubType: Detected subtype (JSON, JSONLD, JSONSCHEMA, or JSONL).

    Notes:
        - Checks file extension and content for schema or linked data context.
        - Falls back to JSONL if content is not valid JSON but is valid JSON lines.
    """
    if file_to_validate.suffix == ".jsonld":
        return SubType.JSONLD

    try:
        with open(file_to_validate, "r", encoding="utf-8", newline="") as f:
            file_content = f.read()
        jsondata = json.loads(file_content)
    # If the file contains JSONL content, parsing it as a single document fails
    except json.JSONDecodeError:
        if is_jsonl(file_content):
            return SubType.JSONL
        raise  # e.g. invalid JSON

    # Only a top-level object can carry "$schema", "@context" or "@id" keys
    if isinstance(jsondata, dict):
        if jsondata.get("$schema") == "https://json-schema.org/draft/2020-12/schema":
            return SubType.JSONSCHEMA
        if "@context" in jsondata or "@id" in jsondata:
            return SubType.JSONLD
    return SubType.JSON

Guess the subtype of a JSON file.

Args

file_to_validate : Path
Path to the JSON file.

Returns

SubType
Detected subtype (JSON, JSONLD, JSONSCHEMA, or JSONL).

Notes

  • Checks file extension and content for schema or linked data context.
  • Falls back to JSONL if content is not valid JSON but is valid JSON lines.
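
The content checks described in the notes can be tried on in-memory strings. The following is a hedged sketch, not the module's implementation: `classify` is a hypothetical helper, it returns plain string labels rather than `SubType` members, and it leaves out the file-extension check and the JSONL fallback.

```python
import json

SCHEMA_URI = "https://json-schema.org/draft/2020-12/schema"


def classify(text: str) -> str:
    """Label JSON text by its top-level keys (illustrative only)."""
    data = json.loads(text)
    # Only a top-level object can carry the marker keys.
    if isinstance(data, dict):
        if data.get("$schema") == SCHEMA_URI:
            return "JSONSCHEMA"
        if "@context" in data or "@id" in data:
            return "JSONLD"
    return "JSON"


schema_label = classify(
    '{"$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object"}'
)
ld_label = classify('{"@context": "https://schema.org", "name": "Example"}')
plain_label = classify("[1, 2, 3]")
print(schema_label, ld_label, plain_label)
```

The schema check runs first, so a document that declares the 2020-12 `$schema` URI and also contains `@id` is labelled as a JSON Schema.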

def is_json_type(mime_type: str) -> bool
def is_json_type(mime_type: str) -> bool:
    """
    Check if a MIME type is recognized as a JSON type.

    Args:
        mime_type (str): MIME type to check.

    Returns:
        bool: True if the MIME type is a known JSON type, False otherwise.
    """
    return mime_type in JSON_MIME_TYPES or mime_type in MIMETYPES.values()

Check if a MIME type is recognized as a JSON type.

Args

mime_type : str
MIME type to check.

Returns

bool
True if the MIME type is a known JSON type, False otherwise.

def is_jsonl(data: str) -> bool
def is_jsonl(data: str) -> bool:
    """
    Check if a string is in JSON Lines (JSONL) format.

    Args:
        data (str): String content of the file.

    Returns:
        bool: True if the content is valid JSON lines, False otherwise.

    Notes:
        - Used primarily by the 'guess_json_format' function.
        - Returns False for empty strings.
    """
    if data.strip() == "":
        return False
    # Every line must parse as a JSON document on its own
    for line in data.splitlines():
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True

Check if a string is in JSON Lines (JSONL) format.

Args

data : str
String content of the file.

Returns

bool
True if the content is valid JSON lines, False otherwise.

Notes

  • Used primarily by the 'guess_json_format' function.
  • Returns False for empty strings.
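
The reason a JSONL file ends up in the fallback path can be shown with the standard library alone. This is a small sketch under stated assumptions: `all_lines_parse` is a hypothetical stand-in for the per-line check, and the sample data is invented for illustration.

```python
import json

jsonl_text = '{"id": 1}\n{"id": 2}\n'

# Parsing the whole string as one document fails because of the second record.
try:
    json.loads(jsonl_text)
    whole_parse_ok = True
except json.JSONDecodeError:
    whole_parse_ok = False

# Parsing line by line succeeds, which is the property the check relies on.
records = [json.loads(line) for line in jsonl_text.splitlines()]


def all_lines_parse(text: str) -> bool:
    """Return True only if every line of text is a JSON document on its own."""
    for line in text.splitlines():
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return False
    return True


print(whole_parse_ok, records, all_lines_parse(jsonl_text))
```

A per-line check of this kind rejects input with a blank line between records, because an empty string is not a valid JSON document.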