Module: parsers/pdfparser.py
- Purpose:
This module serves as the public interface for interacting with PDF files and parsing their contents.
- Platform:
Linux/Windows | Python 3.10+
- Developer:
J Berendt
- Email:
- Comments:
n/a
- Example:
For example code usage, please refer to the PDFParser class docstring.
- class PDFParser(path: str)
Bases:
_PDFTableParser, _PDFTextParser
PDF document parser.
- Parameters:
path (str) – Full path to the PDF document to be parsed.
- Example:
Extract text from a PDF file:
>>> from docp import PDFParser
>>> pdf = PDFParser(path='/path/to/myfile.pdf')
>>> pdf.extract_text()
>>> # Access the content of page 1.
>>> pg1 = pdf.doc.pages[1].content
Extract tables from a PDF file:
>>> from docp import PDFParser
>>> pdf = PDFParser('/path/to/myfile.pdf')
>>> pdf.extract_tables()
>>> # Access the first table on page 1.
>>> tbl1 = pdf.doc.pages[1].tables[1]
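Because the path parameter expects a full path, callers may find it convenient to normalise and check a user-supplied path before constructing the parser. The following is a minimal sketch only: resolve_pdf_path is a hypothetical helper and is not part of docp. It assumes that a ".pdf" suffix and the file's presence on disk are the only checks wanted.

```python
from pathlib import Path


def resolve_pdf_path(path: str) -> str:
    """Return the absolute path to an existing PDF file.

    Hypothetical helper for illustration; not part of the docp library.
    """
    # Expand a leading '~' and convert a relative path to an absolute one.
    full = Path(path).expanduser().resolve()
    # Reject anything that does not carry a .pdf extension (case-insensitive).
    if full.suffix.lower() != '.pdf':
        raise ValueError(f'Not a PDF file: {full}')
    # Reject paths which do not point to an existing regular file.
    if not full.is_file():
        raise FileNotFoundError(f'File not found: {full}')
    return str(full)
```

The returned string could then be passed to the constructor, for instance PDFParser(path=resolve_pdf_path('~/docs/myfile.pdf')), where the file location is a placeholder.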