Library API Documentation
This page contains simple library usage examples and the module-level
documentation for each of the importable modules in docp.
Use Cases
To save you digging through the documentation for each module and piecing together what a ‘standard use case’ might look like, a couple of examples are provided here.
Extract text from a PDF file
>>> from docp import PDFParser
>>> pdf = PDFParser(path='/path/to/myfile.pdf')
>>> pdf.extract_text()
>>> # Access the content of page 1.
>>> pg1 = pdf.doc.pages[1].content
Extract text from a PowerPoint presentation
>>> from docp import PPTXParser
>>> pptx = PPTXParser(path='/path/to/myfile.pptx')
>>> pptx.extract_text()
>>> # Access the text on slide 1.
>>> slide1 = pptx.doc.slides[1].content
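The two use cases above differ only in the parser class, so a caller handling mixed inputs might choose the parser from the file extension. The following is a minimal sketch and not part of docp: `PARSER_BY_SUFFIX`, `parser_name_for` and `extract` are hypothetical names, and the sketch assumes both parsers accept `path=` and expose `extract_text()` and `.doc` exactly as shown in the examples above.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the docp parser class
# used in the examples above. Not provided by docp itself.
PARSER_BY_SUFFIX = {
    '.pdf': 'PDFParser',
    '.pptx': 'PPTXParser',
}


def parser_name_for(path):
    """Return the name of the parser class to use for ``path``."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f'Unsupported file type: {suffix!r}') from None


def extract(path):
    """Parse ``path`` with the matching parser and return its ``doc``."""
    import docp  # Imported lazily so the mapping is usable without docp.

    parser = getattr(docp, parser_name_for(path))(path=path)
    parser.extract_text()
    return parser.doc
```

With this in place, `extract('/path/to/myfile.pdf')` would follow the same steps as the first use case, and a `.pptx` path the same steps as the second.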
Module Documentation
In addition to the module-level documentation, most of the public classes and/or methods come with one or more usage examples and access to the source code itself.
There are two types of modules listed here:
- Those whose API is designed to be accessed by the user/caller
- Those which are designated ‘private’ and designed only for internal use
We’ve exposed both here for completeness and to aid in understanding how the library is implemented.
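In the module listing, the private modules are the ones whose filename starts with an underscore. As a rough illustration of that convention, the sketch below splits a few of the listed paths into public and private groups. The helper `is_private_module` is hypothetical and assumes the leading underscore is what marks a module as private, as the listing suggests.

```python
from pathlib import PurePosixPath


def is_private_module(module_path):
    """Return True if the module's filename starts with an underscore."""
    return PurePosixPath(module_path).name.startswith('_')


# A few paths taken from the module listing on this page.
modules = [
    'parsers/pdfparser.py',
    'parsers/_pdfbaseparser.py',
    'objects/_pageobject.py',
    'dbs/chroma.py',
]

public = [m for m in modules if not is_private_module(m)]
private = [m for m in modules if is_private_module(m)]
```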
Links to module-level documentation
- Module: dbs/chroma.py
- Module: libs/utilities.py
- Module: loaders/chromapdfloader.py
- Module: loaders/chromapptxloader.py
- Module: loaders/lutilities.py
- Module: objects/pdfobject.py
- Module: objects/pptxobject.py
- Module: parsers/pdfparser.py
- Module: parsers/pptxparser.py
- Module: parsers/putilities.py
- Base (Private) Module: loaders/_chromabaseloader.py
- Base (Private) Module: loaders/_chromabasepdfloader.py
- Base (Private) Module: loaders/_chromabasepptxloader.py
- Base (Private) Module: objects/_docbaseobject.py
- Base (Private) Module: objects/_pageobject.py
- Base (Private) Module: objects/_slideobject.py
- Base (Private) Module: objects/_textobject.py
- Base (Private) Module: parsers/_pdfbaseparser.py
- Base (Private) Module: parsers/_pdftableparser.py
- Base (Private) Module: parsers/_pdftextparser.py
- Base (Private) Module: parsers/_pptxbaseparser.py
- Base (Private) Module: parsers/_pptxtextparser.py
Last updated: 12 Feb 2025