Base (Private) Module: loaders/_chromabasepptxloader.py
- Purpose:
This module provides functionality to parse PPTX presentations and load their content into a Chroma vector database.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
n/a
Attention
This module is not designed to be interacted with directly, but only via the appropriate interface class(es).
Rather, please create an instance of a Chroma PPTX document loading object using the following class:
- class _ChromaBasePPTXLoader(dbpath: str | ChromaDB, collection: str = None, *, split_text: bool = True, load_keywords: bool = False, llm: object = None, offline: bool = False)
Bases: _ChromaBaseLoader
Base class for loading PPTX files into a Chroma vector database.
This class is a specialised version of the _ChromaBaseLoader class, designed to handle PPTX presentations.
- Parameters:
dbpath (str | ChromaDB) – Either the full path to the Chroma database directory, or an instance of a ChromaDB class. If an instance is passed, the collection argument is ignored.
collection (str, optional) – Name of the Chroma database collection. Only required if the dbpath parameter is a path. Defaults to None.
split_text (bool, optional) – Split the document into chunks before loading it into the database. Defaults to True.
load_keywords (bool, optional) – Derive keywords from the document and load these into the sister keywords collection. Defaults to False.
llm (object, optional) – If deriving keywords, this is the LLM which will do the derivation. Defaults to None.
offline (bool, optional) – Remain offline and use the locally cached embedding function model. Defaults to False.
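The interaction between dbpath and collection described in the parameter list can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation: ChromaDB here is a hypothetical stand-in that only records a path and collection name, and resolve_database is a hypothetical helper. The check that an llm is supplied when load_keywords is True is an assumption inferred from the llm description; the documentation does not state that it is enforced.

```python
class ChromaDB:
    """Hypothetical stand-in for the real ChromaDB class (records path and collection only)."""

    def __init__(self, path: str, collection: str) -> None:
        self.path = path
        self.collection = collection


def resolve_database(dbpath, collection=None, *, load_keywords=False, llm=None):
    """Return a database object following the documented dbpath/collection rules."""
    if isinstance(dbpath, ChromaDB):
        # An instance was passed: the collection argument is ignored.
        db = dbpath
    elif isinstance(dbpath, str):
        # A path was passed: a collection name is required.
        if not collection:
            raise ValueError('A collection name is required when dbpath is a path.')
        db = ChromaDB(path=dbpath, collection=collection)
    else:
        raise TypeError('dbpath must be a str or a ChromaDB instance.')
    # Assumption: keywords cannot be derived without an LLM.
    if load_keywords and llm is None:
        raise ValueError('An llm must be provided when load_keywords is True.')
    return db
```

For example, resolve_database('/tmp/chroma', 'slides') builds a new stand-in object for the 'slides' collection, whereas passing an existing ChromaDB instance returns that instance unchanged, whatever collection name is given.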
- _create_documents() → bool
Convert each extracted slide into a Document object.
- Returns:
True if the slides are loaded as Document objects successfully, otherwise False.
- Return type:
bool
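The slide-to-Document conversion step can be sketched roughly as below. Everything here is hypothetical: Document is a minimal dataclass (text plus a metadata dict), SlideConverter stands in for the loader, and the slide text is assumed to be already extracted into a list of strings. The success criterion (no error and at least one Document created) and the skipping of empty slides are assumptions, not the documented behaviour of _create_documents.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical minimal document: text content plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SlideConverter:
    """Hypothetical holder for extracted slide text and the resulting Documents."""

    def __init__(self, slides: list[str], source: str) -> None:
        self._slides = slides
        self._source = source
        self._docs: list[Document] = []

    def _create_documents(self) -> bool:
        """Convert each extracted slide into a Document object.

        Returns True if at least one Document was created, otherwise False.
        """
        try:
            self._docs = [
                Document(page_content=text.strip(),
                         metadata={'source': self._source, 'pageno': n})
                # Slide numbers start at 1; empty or whitespace-only slides are skipped.
                for n, text in enumerate(self._slides, start=1)
                if text and text.strip()
            ]
        except (AttributeError, TypeError):
            # Malformed slide data (e.g. a non-string entry): report failure.
            self._docs = []
            return False
        return bool(self._docs)
```

With the slides ['Title slide', '  ', 'Agenda'], the sketch creates two Document objects tagged with slide numbers 1 and 3 and returns True; an empty slide list returns False.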