Module: dbs/chroma.py
- Purpose:
This module provides a localised wrapper and specialised functionality around the
langchain_community.vectorstores.Chroma class, for interacting with a Chroma database.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
This module uses the
langchain_community.vectorstores.Chroma wrapper class, rather than the base chromadb library, as it provides the add_texts method which supports GPU processing and parallelisation. This module's add_documents() method wraps that method.
- class ChromaDB(*args: Any, **kwargs: Any)
Bases:
Chroma
Wrapper class around the chromadb library.
- Parameters:
path (str) – Path to the chroma database’s directory.
collection (str) – Collection name.
offline (bool, optional) – Remain offline, using the cached embedding function model rather than obtaining one online. Defaults to False.
- property client
Accessor to the
chromadb.PersistentClient class.
- property collection
Accessor to the chromadb client’s collection object.
- property embedding_function
Accessor to the embedding function used.
- property path: str
Accessor to the database’s path.
- add_documents(docs: list[langchain_core.documents.base.Document])
Add multiple documents to the collection.
This method overrides the base class' add_documents method to enable local ID derivation. Knowing how the IDs are derived gives us greater understanding and querying ability of the documents in the database. Each ID is derived locally by the _preproc() method from the file's basename, page number and page content.
Additionally, this method wraps the
langchain_community.vectorstores.Chroma.add_texts() method which supports GPU processing and parallelisation.
- Parameters:
docs (list) – A list of
langchain_core.documents.base.Document objects.
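The local ID derivation described above can be illustrated with a minimal sketch. The exact format produced by _preproc() is not documented here, so the derive_id function, its separator and the use of a truncated SHA-256 digest are all assumptions made for illustration only.

```python
import hashlib
from pathlib import PurePath


def derive_id(source: str, page: int, content: str) -> str:
    """Build a deterministic ID from basename, page number and page content.

    Hypothetical scheme: this is NOT the module's _preproc() implementation.
    """
    # Only the basename is used, so the ID does not depend on the directory.
    basename = PurePath(source).name
    # Hash the content so the ID stays short and changes when the text changes.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"{basename}:{page}:{digest}"


# Example values are illustrative only.
doc_id = derive_id("/data/reports/annual.pdf", 3, "Revenue grew in Q2.")
```

Because an ID of this kind is a pure function of the document's metadata and text, re-adding the same page yields the same ID, which is what makes the stored documents easier to reason about and query.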
- show_all()
Return the entire contents of the collection.
This is an alias around
.collection.get().
- _get_embedding_function_model() -> str
Derive the path to the embedding function model.
- Note:
If offline=True was passed into the class constructor, the model cache is used, if available; otherwise the user is warned.
If online usage is allowed, the model is obtained by the means defined by the embedding function constructor.
- Returns:
The name of the model or, if offline, the path to the model's cache. The returned value is passed into the embedding function constructor.
- Return type:
str
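The offline/online behaviour described in the note can be sketched roughly as follows. The resolve_model function, the cache layout and the fallback to the model name when no cache exists are assumptions for illustration; the documentation above only states that the user is warned in that case.

```python
import warnings
from pathlib import Path


def resolve_model(name: str, cache_dir: Path, offline: bool = False) -> str:
    """Return a cached model path when offline, else the model name.

    Hypothetical helper: not the module's _get_embedding_function_model().
    """
    if not offline:
        # Online: the embedding function constructor fetches the model by name.
        return name
    cached = cache_dir / name
    if cached.is_dir():
        # Offline with a cache present: hand back the local path.
        return str(cached)
    # Offline without a cache: warn, then fall back to the name (assumed).
    warnings.warn(f"No cached model found at {cached}; using the model name.")
    return name
```

Returning a plain string in both cases keeps the caller simple, since the same value can be handed to the embedding function constructor whether it names a model or points at a local directory.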