Metadata-Version: 2.4
Name: maybankforme
Version: 1.9.1
Summary: This project converts Maybank credit card statement PDF files into a single CSV file that can be ingested by other workflows.
Author-email: Zharif Zakaria <z@zhrif.com>
License: MIT
Project-URL: Homepage, https://github.com/zhrif/maybankforme
Project-URL: Documentation, https://github.com/zhrif/maybankforme/tree/main#maybankforme
Project-URL: Repository, https://github.com/zhrif/maybankforme
Project-URL: Changelog, https://github.com/zhrif/maybankforme/releases
Keywords: cli,tool
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pypdf
Requires-Dist: cryptography>=3.1
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn[standard]>=0.24.0
Requires-Dist: python-multipart>=0.0.18
Requires-Dist: structlog>=24.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: black>=23.3.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: httpx>=0.24.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Dynamic: license-file

# maybankforme

<!-- markdownlint-disable MD013 -->

This project converts Maybank credit card statement PDF files to CSV format via a FastAPI web service and an optional CLI.

## Table of Contents

- [Overview](#overview)
- [FastAPI Service](#fastapi-service)
- [Logging](#logging)
- [Architecture](#architecture)
- [Development](#development)
- [Docker](#docker)
- [Legacy CLI Tool](#legacy-cli-tool)
- [Contributing](#contributing)

## Overview

This is a FastAPI web service that accepts password-protected credit card statement PDF files, extracts their text, keeps the lines that match known transaction patterns, and returns those transactions as a CSV file.
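To illustrate what matching "transaction pattern lines" involves, here is a minimal sketch of the extraction step. The line layout it matches (posting date, transaction date, description, amount, optional `CR` suffix for credits) is an assumption for illustration, not the confirmed Maybank statement format, and `TXN_LINE`, `Transaction`, and `extract_transactions` are hypothetical names rather than this package's API.

```python
import re
from decimal import Decimal
from typing import NamedTuple

# Hypothetical layout: "DD/MM DD/MM DESCRIPTION AMOUNT[CR]", e.g. "05/01 04/01 GRAB FOOD 25.90"
TXN_LINE = re.compile(
    r"^(?P<posted>\d{2}/\d{2})\s+(?P<txn>\d{2}/\d{2})\s+"
    r"(?P<desc>.+?)\s+(?P<amount>[\d,]+\.\d{2})(?P<cr>CR)?$"
)


class Transaction(NamedTuple):
    posted: str       # posting date as "DD/MM"; the year is attached later
    txn: str          # transaction date as "DD/MM"
    description: str
    amount: Decimal   # negative for credits such as payments and refunds


def extract_transactions(text: str) -> list[Transaction]:
    """Return the lines of `text` that look like transactions; everything else is skipped."""
    found: list[Transaction] = []
    for line in text.splitlines():
        m = TXN_LINE.match(line.strip())
        if m is None:
            continue  # headers, totals, footers, blank lines
        amount = Decimal(m["amount"].replace(",", ""))  # Decimal avoids float rounding on money
        found.append(
            Transaction(m["posted"], m["txn"], m["desc"], -amount if m["cr"] else amount)
        )
    return found
```

Anchoring the pattern at both ends (`^` and `$`) is what lets headers and totals fall through: a line only counts if it has the full shape from the first date to the amount.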

## FastAPI Service

### Running Locally

Use either of these options:

1. Direct server entrypoint (recommended):

   ```bash
   python -m maybankforme.server
   ```

1. Uvicorn directly:

   ```bash
   uvicorn maybankforme.api:app --host 0.0.0.0 --port 8000
   ```

Then access locally:

- API documentation: <http://localhost:8000/docs>
- Root endpoint: <http://localhost:8000/>
- Health check: <http://localhost:8000/health>

### API Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `GET /docs` - Interactive API documentation (Swagger UI)
- `POST /process` - Upload PDF files and get CSV response
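A monitoring script or container healthcheck only needs `GET /health`. The sketch below uses the standard library only; `is_healthy` is a hypothetical helper, and it assumes a healthy service answers with HTTP 200 (the response body is not inspected because its format isn't documented here).

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 within `timeout` seconds."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections, and timeouts are all OSError subclasses
        return False
```

For a healthcheck command, wrap it as `sys.exit(0 if is_healthy() else 1)` so the exit code reflects the service state.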

### Using the API

Upload one or more PDF files to process:

Using curl:

```bash
curl -X POST "http://localhost:8000/process" \
  -F "files=@statement1.pdf" \
  -F "files=@statement2.pdf" \
  -F "password=your-password" \
  -o transactions.csv
```

Using Python `requests`:

```python
import requests

# The with-block closes both file handles once the upload is done
with open('statement1.pdf', 'rb') as f1, open('statement2.pdf', 'rb') as f2:
    files = [
        ('files', f1),
        ('files', f2),
    ]
    data = {'password': 'your-password'}
    response = requests.post('http://localhost:8000/process', files=files, data=data)

response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body as CSV
with open('transactions.csv', 'wb') as f:
    f.write(response.content)
```

See [docs/api_example.py](docs/api_example.py) for more examples.

### Features

- ✅ Upload single or multiple PDF files
- ✅ Password-protected PDF support
- ✅ Automatic date processing (handles year boundaries)
- ✅ Returns sorted CSV with all transactions
- ✅ File size validation (10MB max per file)
- ✅ Comprehensive error handling
- ✅ Health check endpoint for monitoring
- ✅ **Structured JSON logging** for containers
- ✅ **Runtime configurable log levels**
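The year-boundary item matters because, assuming statement lines carry only a day and month, a December purchase printed on a January statement belongs to the previous year. A minimal sketch of that rule follows; `resolve_year` is a hypothetical helper, and it assumes no transaction is dated after its statement date.

```python
from datetime import date


def resolve_year(day_month: str, statement_date: date) -> date:
    """Attach a year to a "DD/MM" transaction date from a statement issued on `statement_date`.

    A month later than the statement month can only come from the previous year,
    e.g. "28/12" on a statement dated 15 January 2024 resolves to 28 December 2023.
    """
    day, month = (int(part) for part in day_month.split("/"))
    year = statement_date.year - 1 if month > statement_date.month else statement_date.year
    return date(year, month, day)
```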

## Logging

This application uses [structlog](https://www.structlog.org/) for structured logging with automatic JSON formatting in containers.

### Logging quick start

**Development Mode** (human-readable console output):

```bash
export LOG_LEVEL=INFO
uvicorn maybankforme.api:app --reload
```

**Production/Container Mode** (JSON output):

```bash
export LOG_FORMAT=json
export LOG_LEVEL=INFO
uvicorn maybankforme.api:app
```

### Configuration

Control logging behavior with environment variables:

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `LOG_LEVEL` | DEBUG, INFO, WARNING, ERROR | INFO | Log verbosity level |
| `LOG_FORMAT` | json, console | Auto-detect (json in containers, console otherwise) | Output format |
| `IN_CONTAINER` | true, false | Auto-detect | Force container mode |
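One way to resolve these three variables is sketched below. The precedence (an explicit `LOG_FORMAT` first, then `IN_CONTAINER`, then detection) and the detection heuristics (`/.dockerenv`, `KUBERNETES_SERVICE_HOST`) are assumptions, not a description of `maybankforme`'s actual logic, and the helper names are hypothetical.

```python
import logging
import os
from collections.abc import Mapping

_TRUTHY = {"1", "true", "yes"}
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def in_container(env: Mapping[str, str]) -> bool:
    """An explicit IN_CONTAINER wins; otherwise fall back to common container markers."""
    flag = env.get("IN_CONTAINER")
    if flag is not None:
        return flag.strip().lower() in _TRUTHY
    return "KUBERNETES_SERVICE_HOST" in env or os.path.exists("/.dockerenv")


def resolve_log_format(env: Mapping[str, str]) -> str:
    """Return "json" or "console"; a valid LOG_FORMAT always takes precedence."""
    fmt = env.get("LOG_FORMAT", "").strip().lower()
    if fmt in {"json", "console"}:
        return fmt
    return "json" if in_container(env) else "console"


def resolve_log_level(env: Mapping[str, str]) -> int:
    """Map LOG_LEVEL to a stdlib level number, falling back to INFO for unknown names."""
    return _LEVELS.get(env.get("LOG_LEVEL", "INFO").strip().upper(), logging.INFO)
```

The resolved format could then pick `structlog.processors.JSONRenderer()` or `structlog.dev.ConsoleRenderer()` as the final processor, and the level number can be passed to `structlog.make_filtering_bound_logger()`.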

### Example Output

**JSON Format (Containers):**

```json
{
  "event": "processing_started",
  "file_count": 3,
  "logger": "maybankforme.api",
  "level": "info",
  "timestamp": "2025-10-25T20:30:45.123Z",
  "func_name": "process_statements"
}
```

**Console Format (Development):**

```text
2025-10-25T20:30:45.123Z [info] processing_started [maybankforme.api] file_count=3
```

📖 **[Complete Logging Documentation](docs/LOGGING.md)** - Detailed guide with diagrams, best practices, and troubleshooting.

## Architecture

### Processing Pipeline

```mermaid
graph LR
    A[PDF Files] --> B[Upload API]
    B --> C[PDF to Text]
    C --> D[Extract Transactions]
    D --> E[Add Year Info]
    E --> F[Sort by Date]
    F --> G[Generate CSV]
    G --> H[Return to Client]
    
    style B fill:#e3f2fd
    style C fill:#fff9c4
    style D fill:#fff9c4
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style G fill:#c8e6c9
    style H fill:#e3f2fd
```
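The order of the stages matters: years are attached before sorting, because sorting bare day/month values would put a late-December purchase after early-January ones. Below is a minimal sketch of the last two stages (Sort by Date, Generate CSV) using the standard `csv` module; the `date,description,amount` header and the `to_csv` helper are hypothetical, and the service's real columns may differ.

```python
import csv
import io
from datetime import date
from decimal import Decimal

FIELDS = ["date", "description", "amount"]  # hypothetical header


def to_csv(rows: list[tuple[date, str, Decimal]]) -> str:
    """Sort (date, description, amount) rows chronologically and render them as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas, e.g. merchant names
    writer.writerow(FIELDS)
    for txn_date, desc, amount in sorted(rows, key=lambda row: row[0]):
        writer.writerow([txn_date.isoformat(), desc, f"{amount:.2f}"])
    return buf.getvalue()
```

`sorted` is stable, so transactions sharing a date keep the order in which they were extracted.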

### System Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        A[Web Browser / API Client]
    end
    
    subgraph "API Layer"
        B[FastAPI Application]
        B1[POST /process]
        B2[GET /health]
        B --> B1
        B --> B2
    end
    
    subgraph "Processing Layer"
        C[PDF to Text Converter]
        D[Transaction Extractor]
        E[Date Processor]
        F[CSV Generator]
    end
    
    subgraph "Logging Layer"
        G[Structlog]
        G1[JSON Renderer]
        G2[Console Renderer]
        G --> G1
        G --> G2
    end
    
    A --> B
    B1 --> C
    C --> D
    D --> E
    E --> F
    F --> A
    
    B -.logs.-> G
    C -.logs.-> G
    D -.logs.-> G
    E -.logs.-> G
    
    style B fill:#e3f2fd
    style C fill:#fff9c4
    style D fill:#fff9c4
    style E fill:#c8e6c9
    style F fill:#c8e6c9
    style G fill:#ffccbc
```

### Data Flow

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant PDFConverter
    participant TxtExtractor
    participant DateProcessor
    participant Logger
    
    Client->>API: POST /process (PDF files + password)
    API->>Logger: Log request received
    API->>PDFConverter: Convert PDF to text
    PDFConverter->>Logger: Log conversion progress
    PDFConverter->>TxtExtractor: Extract transactions
    TxtExtractor->>Logger: Log extraction stats
    TxtExtractor->>DateProcessor: Add year information
    DateProcessor->>Logger: Log date processing
    DateProcessor->>API: Return processed data
    API->>Logger: Log completion
    API->>Client: Return CSV file
```

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/zhrif/maybankforme.git
cd maybankforme

# Install dependencies (using uv)
uv sync --all-extras

# Run tests
uv run pytest -q

# Run API with auto-reload (dev)
uv run uvicorn maybankforme.api:app --reload --log-level debug

# Or run the packaged server entrypoint
uv run python -m maybankforme.server
```

### Code Quality

```bash
# Linting
uv run ruff check src/ tests/

# Type checking
uv run mypy src/

# Format code
uv run black src/ tests/

# Run all checks
uv run ruff check src/ tests/ && uv run mypy src/ && uv run pytest -q
```

### Project Structure

```text
maybankforme/
├── src/maybankforme/
│   ├── api.py                 # FastAPI application
│   ├── main.py                # CLI entry point
│   ├── process_transaction.py # Batch processing
│   └── common/
│       ├── utils.py           # Logging utilities
│       ├── pdf_convert_txt.py # PDF conversion
│       └── txt_convert_csv.py # CSV generation
├── tests/                     # Test suite
├── docs/                      # Documentation
│   ├── LOGGING.md             # Logging guide
│   └── api_example.py         # API examples
└── Dockerfile                 # Container image
```

## Docker

### Docker quick start

```bash
# Pull and run from GitHub Container Registry
docker run -p 8000:8000 ghcr.io/zhrif/maybankforme
```

Then access the API at <http://localhost:8000/docs>

### Custom Configuration

```bash
# Run with debug logging
docker run -e LOG_LEVEL=DEBUG -p 8000:8000 ghcr.io/zhrif/maybankforme

# Run with console logging (for debugging)
docker run -e LOG_FORMAT=console -p 8000:8000 ghcr.io/zhrif/maybankforme

# Run on different port
docker run -e PORT=3000 -p 3000:3000 ghcr.io/zhrif/maybankforme
```
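The `PORT` example implies the server entrypoint reads its port from the environment. Here is a sketch of how such an entrypoint could look, assuming a fixed `0.0.0.0` host (binding to `127.0.0.1` inside a container would make the published `-p` port unreachable) and an 8000 default; `resolve_port` and `main` are hypothetical, not the contents of `maybankforme.server`.

```python
import os
from collections.abc import Mapping

DEFAULT_PORT = 8000


def resolve_port(env: Mapping[str, str]) -> int:
    """Read PORT from `env`, defaulting to 8000 and rejecting non-numeric or out-of-range values."""
    raw = env.get("PORT", str(DEFAULT_PORT))
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port


def main() -> None:
    import uvicorn  # imported lazily so resolve_port can be exercised without the server stack

    # 0.0.0.0 so traffic forwarded through the published container port reaches the process
    uvicorn.run("maybankforme.api:app", host="0.0.0.0", port=resolve_port(os.environ))
```

A module laid out like this would add an `if __name__ == "__main__": main()` guard so that `python -m` starts the server.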

### Build Locally

```bash
# Build the image
docker build -t maybankforme .

# Run the container
docker run -p 8000:8000 maybankforme

# View logs (JSON format by default)
docker logs <container-id>
```

### Docker Compose

```yaml
services:
  api:
    image: ghcr.io/zhrif/maybankforme
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
      - LOG_FORMAT=json
    restart: unless-stopped
```

## Legacy CLI Tool

The original CLI tool is still available:

```bash
maybankforme -h
usage: maybankforme [-h] [--password PASSWORD] [--dataset_folder DATASET_FOLDER] input_folder output_file

positional arguments:
  input_folder          Folder containing pdf files
  output_file           csv file to save transactions

options:
  -h, --help            show this help message and exit
  --password PASSWORD   Password to open pdf files
  --dataset_folder DATASET_FOLDER
                        Folder containing dataset
```

```bash
maybankforme /dataset/pdf /dataset/Output.csv --password=<REDACTED> --dataset_folder /dataset
```
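If you wrap or test the CLI, it helps to see how the help text above maps to arguments. This is a hypothetical `argparse` reconstruction of that interface, not the project's `main.py`; it shows that `--password=value` and `--password value` are equivalent and that options may follow the positional arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the interface described by `maybankforme -h` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="maybankforme")
    parser.add_argument("input_folder", help="Folder containing pdf files")
    parser.add_argument("output_file", help="csv file to save transactions")
    parser.add_argument("--password", help="Password to open pdf files")
    parser.add_argument("--dataset_folder", help="Folder containing dataset")
    return parser
```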

## Contributing

Contributions are welcome. Please review the code standards and development guidelines in `AGENTS.md`, and see `CONTRIBUTING.md` for a quick checklist.

<!-- markdownlint-enable MD013 -->
