Metadata-Version: 2.3
Name: HowdenParser
Version: 0.1.16
Summary: A simple configuration manager with Pydantic and JSON export.
License: MIT
Keywords: config,configuration,pydantic,json
Author: JesperThoftIllemannJ
Author-email: jesper.jaeger@howdendanmark.dk
Requires-Python: >=3.12,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: fitz (>=0.0.1.dev2,<0.0.2)
Requires-Dist: howdenconfig (>=0.1.13,<0.2.0)
Requires-Dist: langchain (>=0.3.27,<0.4.0)
Requires-Dist: llama-parse (>=0.6.58,<0.7.0)
Requires-Dist: mistralai (>=1.9.3,<2.0.0)
Requires-Dist: pypdf2 (>=3.0.1,<4.0.0)
Requires-Dist: transformers (>=4.55.2,<5.0.0)
Project-URL: Documentation, https://github.com/yourusername/config
Project-URL: Homepage, https://github.com/yourusername/config
Project-URL: Repository, https://github.com/yourusername/config
Description-Content-Type: text/markdown

# OCR & LLM Parser

A Python package for parsing and processing documents through several providers:
- **Mistral OCR** — Extracts text from PDFs and images with high accuracy.
- **LangChain** — Processes or summarizes text using LLMs.
- **Llama Parser** — Advanced parsing with Markdown or text output.
- **HuggingFace** — OCR and document question answering with transformer models.

The package provides a **unified interface** so you can switch between providers easily using a **factory pattern**.
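
The factory idea described above can be sketched roughly as follows. This is not HowdenParser's actual implementation: `DocumentParser`, `EchoParser`, `_REGISTRY`, and the rule that the text before the first colon selects the provider are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Protocol


class DocumentParser(Protocol):
    """Hypothetical shared interface: every provider exposes parse(path) -> str."""

    def parse(self, path: Path) -> str: ...


class EchoParser:
    """Stand-in provider for this sketch only; it calls no external service."""

    def __init__(self, result_type: str = "md") -> None:
        self.result_type = result_type

    def parse(self, path: Path) -> str:
        return f"[{self.result_type}] parsed {path.name}"


# Hypothetical registry mapping provider names to constructors.
_REGISTRY: dict[str, Callable[..., DocumentParser]] = {"echo": EchoParser}


def get_parser(name: str, **kwargs) -> DocumentParser:
    """Look up a provider by name and build it with the given options."""
    # Assumption: the part before the first colon identifies the provider.
    provider = name.split(":", 1)[0].lower()
    if provider not in _REGISTRY:
        raise ValueError(f"Unknown provider: {provider!r}")
    return _REGISTRY[provider](**kwargs)
```

Because callers only depend on `parse`, switching providers in a design like this means changing the name passed to the factory rather than the calling code.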

---

## 🚀 Features
- Extract text from PDFs or images
- Summarize or process text using LLMs
- Support for **Markdown** or **plain text** output
- Plug-and-play factory for switching providers with minimal code changes
- Loads API keys from environment variables automatically

---

## 🔑 Tokens

Create a `.env` file in your project root and add the API keys for the services you want to use.

### Mistral OCR
```env
MISTRAL-OCR-API-TOKEN=your_mistral_api_key
```

### Llama Parser
```env
LLAMA-PARSER-API-TOKEN=your_llama_parser_api_key
```

### HuggingFace
```env
HF-API-TOKEN=your_huggingface_api_key
```

Only include the keys for the providers you plan to use.
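
To check which providers a given `.env` file configures, something like the following could be used. It is a minimal standard-library sketch, not the loader the package uses internally; `read_env_file`, `configured_providers`, and `PROVIDER_KEYS` are hypothetical names, and only the three key names come from the section above.

```python
def read_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Key names as listed in this README, one per provider.
PROVIDER_KEYS = {
    "Mistral OCR": "MISTRAL-OCR-API-TOKEN",
    "Llama Parser": "LLAMA-PARSER-API-TOKEN",
    "HuggingFace": "HF-API-TOKEN",
}


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the providers whose key is present and non-empty."""
    return [name for name, key in PROVIDER_KEYS.items() if env.get(key)]
```

For example, `configured_providers(read_env_file(Path(".env").read_text()))` (with `Path` imported from `pathlib`) would list the providers that have a non-empty key.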

---

## 🛠️ Usage

```python
from pathlib import Path

from HowdenParser import ParserFactory

# Build a Mistral OCR parser that returns Markdown.
parser = ParserFactory.get_parser("mistralocr:", result_type="md")
text = parser.parse(Path("document.pdf"))
print(text)
```

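If you are using the HowdenConfig package, pass its parameters to the factory as keyword arguments:

```python
# `config` is assumed to be an already loaded HowdenConfig object.
parser = ParserFactory.get_parser("mistralocr:", **config.parameter.dump_model())
text = parser.parse(Path("document.pdf"))
```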
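
When several documents need parsing, the same `parse(Path)` call can be applied to every file in a folder. The helper below is a sketch rather than part of HowdenParser; `parse_folder` and its `pattern` argument are hypothetical, and it relies only on the parser object having a `parse` method that accepts a `Path`.

```python
from pathlib import Path


def parse_folder(parser, folder: Path, pattern: str = "*.pdf") -> dict[str, str]:
    """Parse every file in `folder` matching `pattern`.

    Returns a mapping of file name to parsed text, in sorted file-name order.
    """
    results: dict[str, str] = {}
    for path in sorted(folder.glob(pattern)):
        results[path.name] = parser.parse(path)
    return results
```

With a parser obtained from `ParserFactory.get_parser(...)`, a call such as `parse_folder(parser, Path("documents"))` would return one entry per PDF in that folder; the folder name here is only an example.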


