# Architecture Documentation

<!-- markdownlint-disable MD013 -->

This document describes the architecture of the maybankforme application: its system design, data flow, and component interactions.

## Table of Contents

- [System Overview](#system-overview)
- [Component Architecture](#component-architecture)
- [Data Flow](#data-flow)
- [Processing Pipeline](#processing-pipeline)
- [Logging Architecture](#logging-architecture)
- [Deployment Architecture](#deployment-architecture)
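- [Technology Stack](#technology-stack)
- [Performance Considerations](#performance-considerations)
- [Security Architecture](#security-architecture)
- [Monitoring and Observability](#monitoring-and-observability)
- [Scalability](#scalability)
- [Future Enhancements](#future-enhancements)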

## System Overview

The maybankforme application is a FastAPI-based web service that processes encrypted PDF credit card statements and converts them to structured CSV files.

### High-Level Architecture

```mermaid
graph TB
    subgraph "Client Applications"
        A1[Web Browser]
        A2[CLI Tool]
        A3[API Client]
    end
    
    subgraph "API Gateway"
        B[FastAPI Application]
        B1[Swagger UI]
        B2[REST Endpoints]
    end
    
    subgraph "Processing Services"
        C1[PDF Processor]
        C2[Text Parser]
        C3[CSV Generator]
    end
    
    subgraph "Cross-Cutting Concerns"
        D1[Logging - Structlog]
        D2[Error Handling]
        D3[Input Validation]
    end
    
    subgraph "Data Storage"
        E1[Temporary Files]
        E2[In-Memory Processing]
    end
    
    A1 --> B
    A2 --> B
    A3 --> B
    B --> B1
    B --> B2
    B2 --> C1
    C1 --> C2
    C2 --> C3
    
    C1 -.-> D1
    C2 -.-> D1
    C3 -.-> D1
    
    C1 --> E1
    C2 --> E2
    
    style B fill:#e3f2fd
    style C1 fill:#fff9c4
    style C2 fill:#fff9c4
    style C3 fill:#c8e6c9
    style D1 fill:#ffccbc
```

## Component Architecture

### Layered Architecture

```mermaid
graph TB
    subgraph "Presentation Layer"
        P1[FastAPI Routes]
        P2[Request Handlers]
        P3[Response Formatters]
    end
    
    subgraph "Business Logic Layer"
        B1[Transaction Processor]
        B2[Date Handler]
        B3[File Validator]
    end
    
    subgraph "Service Layer"
        S1[PDF Converter]
        S2[Text Extractor]
        S3[CSV Generator]
    end
    
    subgraph "Infrastructure Layer"
        I1[Logger Factory]
        I2[File System]
        I3[Temp Storage]
    end
    
    P1 --> B1
    P2 --> B1
    B1 --> S1
    B1 --> S2
    B1 --> S3
    B2 --> S2
    P2 --> B3
    
    S1 --> I2
    S2 --> I3
    S3 --> I2
    
    B1 -.-> I1
    S1 -.-> I1
    S2 -.-> I1
    
    style P1 fill:#e3f2fd
    style B1 fill:#c8e6c9
    style S1 fill:#fff9c4
    style I1 fill:#ffccbc
```

### Module Dependencies

```mermaid
graph LR
    A[api.py] --> B[process_transaction.py]
    A --> C[common/utils.py]
    A --> D[common/pdf_convert_txt.py]
    A --> E[common/txt_convert_csv.py]
    
    B --> C
    B --> D
    B --> E
    
    D --> C
    E --> C
    
    F[main.py] --> B
    F --> C
    
    style A fill:#e3f2fd
    style B fill:#c8e6c9
    style C fill:#ffccbc
    style D fill:#fff9c4
    style E fill:#fff9c4
    style F fill:#e1bee7
```

## Data Flow

### Request Processing Flow

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI App
    participant Validator as Input Validator
    participant PDFProc as PDF Processor
    participant TxtProc as Text Processor
    participant DateProc as Date Processor
    participant CSVGen as CSV Generator
    participant Logger as Structlog
    
    Client->>API: POST /process (files + password)
    API->>Logger: Log request received
    
    API->>Validator: Validate files
    Validator->>Validator: Check file types
    Validator->>Validator: Check file sizes
    Validator-->>API: Validation result
    
    loop For each PDF file
        API->>PDFProc: Convert PDF to text
        PDFProc->>Logger: Log conversion start
        PDFProc->>PDFProc: Decrypt if needed
        PDFProc->>PDFProc: Extract text
        PDFProc->>Logger: Log conversion complete
        PDFProc-->>API: Text content
        
        API->>TxtProc: Extract transactions
        TxtProc->>Logger: Log extraction start
        TxtProc->>TxtProc: Match transaction patterns
        TxtProc->>TxtProc: Filter credits/charges
        TxtProc->>Logger: Log extraction stats
        TxtProc-->>API: Transaction data
    end
    
    API->>DateProc: Process dates
    DateProc->>Logger: Log date processing
    DateProc->>DateProc: Handle year boundaries
    DateProc->>DateProc: Add year to dates
    DateProc-->>API: Dated transactions
    
    API->>CSVGen: Generate CSV
    CSVGen->>CSVGen: Sort by date
    CSVGen->>CSVGen: Add header
    CSVGen->>Logger: Log CSV generation
    CSVGen-->>API: CSV content
    
    API->>Logger: Log request complete
    API-->>Client: Return CSV file
```

### Data Transformation Pipeline

```mermaid
graph LR
    A[PDF File] -->|Decrypt if needed| B[Decrypted PDF]
    B -->|Extract Text| C[Raw Text]
    C -->|Parse Patterns| D[Transaction Lines]
    D -->|Filter| E[Valid Transactions]
    E -->|Add Year| F[Dated Transactions]
    F -->|Sort| G[Ordered Data]
    G -->|Format| H[CSV Output]
    
    style A fill:#e3f2fd
    style B fill:#e3f2fd
    style C fill:#fff9c4
    style D fill:#fff9c4
    style E fill:#fff9c4
    style F fill:#c8e6c9
    style G fill:#c8e6c9
    style H fill:#c8e6c9
```
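
The "Add Year" and "Sort" stages can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes statement lines carry dates as `DD/MM` and that the statement date is known. `add_year` and `date_and_sort` are hypothetical names.

```python
from datetime import date


def add_year(day_month: str, statement_date: date) -> date:
    """Attach a year to a 'DD/MM' string, using the statement date as reference.

    Assumption: a transaction month later than the statement month belongs to
    the previous calendar year (e.g. a December purchase on a January statement).
    """
    day, month = (int(part) for part in day_month.split("/"))
    year = statement_date.year
    if month > statement_date.month:
        year -= 1
    return date(year, month, day)


def date_and_sort(
    rows: list[tuple[str, str, str]], statement_date: date
) -> list[tuple[date, str, str]]:
    """Turn (DD/MM, description, amount) rows into dated rows, oldest first."""
    dated = [(add_year(dm, statement_date), desc, amount) for dm, desc, amount in rows]
    return sorted(dated, key=lambda row: row[0])
```

With a statement dated 15 January 2024, `add_year("28/12", ...)` resolves to 28 December 2023, which is the year-boundary case the sequence diagram refers to.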

### State Machine for Transaction Processing

```mermaid
stateDiagram-v2
    [*] --> Uploaded: File received
    Uploaded --> Validating: Start validation
    Validating --> Invalid: Validation failed
    Validating --> Converting: Validation passed
    Invalid --> [*]: Return error
    
    Converting --> Extracting: PDF to text complete
    Converting --> Failed: Conversion error
    
    Extracting --> Processing: Transactions extracted
    Extracting --> Failed: Extraction error
    
    Processing --> Sorting: Dates added
    Processing --> Failed: Processing error
    
    Sorting --> Complete: CSV generated
    Complete --> [*]: Return CSV
    
    Failed --> [*]: Return error
```
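
The transitions in the diagram can be written down as a lookup table, which makes illegal transitions easy to detect. The sketch below is illustrative only; the `State` enum and `advance` helper are hypothetical and do not necessarily exist in the codebase.

```python
from enum import Enum


class State(Enum):
    UPLOADED = "uploaded"
    VALIDATING = "validating"
    INVALID = "invalid"
    CONVERTING = "converting"
    EXTRACTING = "extracting"
    PROCESSING = "processing"
    SORTING = "sorting"
    COMPLETE = "complete"
    FAILED = "failed"


# Allowed transitions, copied from the state diagram.
TRANSITIONS: dict[State, set[State]] = {
    State.UPLOADED: {State.VALIDATING},
    State.VALIDATING: {State.INVALID, State.CONVERTING},
    State.CONVERTING: {State.EXTRACTING, State.FAILED},
    State.EXTRACTING: {State.PROCESSING, State.FAILED},
    State.PROCESSING: {State.SORTING, State.FAILED},
    State.SORTING: {State.COMPLETE},
    # Terminal states: the request ends with a CSV or an error response.
    State.INVALID: set(),
    State.COMPLETE: set(),
    State.FAILED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the diagram does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```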

## Processing Pipeline

### PDF to CSV Conversion Pipeline

```mermaid
flowchart TB
    Start([Start]) --> Upload[Upload PDF Files]
    Upload --> Validate{Validate Files}
    
    Validate -->|Invalid| Error1[Return 400 Error]
    Validate -->|Valid| Loop{More Files?}
    
    Loop -->|Yes| SaveTemp[Save to Temp Storage]
    SaveTemp --> Decrypt{Password Protected?}
    
    Decrypt -->|Yes| DecryptPDF[Decrypt PDF]
    Decrypt -->|No| ExtractText[Extract Text]
    DecryptPDF --> ExtractText
    
    ExtractText --> ParseTrans[Parse Transactions]
    ParseTrans --> FilterData[Filter Credits/Charges]
    FilterData --> StoreData[Store in Dataset]
    StoreData --> Loop
    
    Loop -->|No| ProcessDates[Process Dates & Years]
    ProcessDates --> SortData[Sort by Date]
    SortData --> GenerateCSV[Generate CSV]
    GenerateCSV --> Return([Return CSV to Client])
    
    Error1 --> End([End])
    Return --> End
    
    style Upload fill:#e3f2fd
    style ExtractText fill:#fff9c4
    style ParseTrans fill:#fff9c4
    style ProcessDates fill:#c8e6c9
    style GenerateCSV fill:#c8e6c9
```

### Transaction Pattern Matching

```mermaid
flowchart LR
    A[Text Line] --> B{Match Pattern?}
    B -->|No| Z[Skip Line]
    B -->|Yes| C{Contains 'CR'?}
    C -->|Yes| D[Mark as Credit]
    C -->|No| E{Contains 'RTL MGMT CHRG'?}
    E -->|Yes| F[Mark as Charge]
    E -->|No| G[Valid Transaction]
    
    D --> H[Filter Out]
    F --> H
    G --> I[Add to CSV]
    
    style A fill:#e3f2fd
    style G fill:#c8e6c9
    style H fill:#ffccbc
    style I fill:#c8e6c9
```
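
The decision flow above could look roughly like the following. The regular expression is an assumption made for illustration (posting date, transaction date, description, amount, optional trailing `CR`); the real statement layout and the project's actual pattern may differ. The sketch also checks for a trailing `CR` marker rather than any `CR` substring, which is a design choice of this example.

```python
import re

# Hypothetical line layout; the real statement format may differ.
TRANSACTION_RE = re.compile(
    r"^(?P<posted>\d{2}/\d{2})\s+(?P<txn>\d{2}/\d{2})\s+"
    r"(?P<description>.+?)\s+(?P<amount>[\d,]+\.\d{2})(?P<credit>CR)?$"
)


def classify(line: str) -> str:
    """Return 'skip', 'credit', 'charge', or 'valid', following the flowchart."""
    match = TRANSACTION_RE.match(line.strip())
    if match is None:
        return "skip"  # not a transaction line
    if match.group("credit"):
        return "credit"  # filtered out
    if "RTL MGMT CHRG" in match.group("description"):
        return "charge"  # filtered out
    return "valid"


def keep_valid(lines: list[str]) -> list[str]:
    """Keep only the lines that would be added to the CSV."""
    return [line for line in lines if classify(line) == "valid"]
```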

## Logging Architecture

### Logging Component Structure

```mermaid
graph TB
    subgraph "Application Layer"
        A1[API Module]
        A2[Processing Module]
        A3[Conversion Modules]
    end
    
    subgraph "Logging Layer"
        L1[get_logger Factory]
        L2[configure_logging]
    end
    
    subgraph "Structlog Core"
        S1[Processor Chain]
        S2[Contextvars]
        S3[Filter by Level]
        S4[Add Metadata]
    end
    
    subgraph "Renderers"
        R1{Environment Check}
        R2[JSON Renderer]
        R3[Console Renderer]
    end
    
    subgraph "Output"
        O1[stdout]
    end
    
    A1 --> L1
    A2 --> L1
    A3 --> L1
    
    L1 --> S1
    L2 --> S1
    
    S1 --> S2
    S2 --> S3
    S3 --> S4
    S4 --> R1
    
    R1 -->|Container| R2
    R1 -->|Development| R3
    
    R2 --> O1
    R3 --> O1
    
    style L1 fill:#ffccbc
    style S1 fill:#fff9c4
    style R1 fill:#c8e6c9
    style O1 fill:#e3f2fd
```

### Log Processing Pipeline

```mermaid
sequenceDiagram
    participant App as Application
    participant Logger as Logger Instance
    participant Ctx as Contextvars
    participant Filter as Level Filter
    participant Meta as Metadata Enricher
    participant Render as Renderer
    participant Out as stdout
    
    App->>Logger: log.info("event", key=value)
    Logger->>Ctx: Merge context variables
    Ctx->>Filter: Check log level
    
    alt Level >= LOG_LEVEL
        Filter->>Meta: Add metadata
        Meta->>Meta: Add timestamp
        Meta->>Meta: Add logger name
        Meta->>Meta: Add func/line/module
        Meta->>Render: Format output
        
        alt Container Mode
            Render->>Out: {"event": "...", "key": "value", ...}
        else Development Mode
            Render->>Out: [timestamp] [level] event key=value
        end
    else Level < LOG_LEVEL
        Filter->>Filter: Discard message
    end
```
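
To make the order of the steps concrete, here is a plain-Python model of the chain: merge context, filter by level, enrich, render. It deliberately does not use structlog's API; every name in it (`merge_context`, `level_filter`, `add_timestamp`, `emit`) is hypothetical and only mirrors the diagram.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

Event = dict
Processor = Callable[[Event], Optional[Event]]
LEVELS = {"debug": 10, "info": 20, "warning": 30, "error": 40}


def merge_context(context: Event) -> Processor:
    """Step 1: merge request-scoped context (e.g. a request ID) into the event."""
    return lambda event: {**context, **event}


def level_filter(min_level: str) -> Processor:
    """Step 2: return None to discard events below the configured level."""
    threshold = LEVELS[min_level]
    return lambda event: event if LEVELS[event["level"]] >= threshold else None


def add_timestamp(event: Event) -> Event:
    """Step 3: enrich the event with metadata (only a UTC timestamp here)."""
    return {**event, "timestamp": datetime.now(timezone.utc).isoformat()}


def emit(event: Event, processors: list[Processor], in_container: bool) -> Optional[str]:
    """Run the chain, then render as JSON (container) or a console line."""
    for processor in processors:
        event = processor(event)
        if event is None:
            return None  # discarded by the level filter
    if in_container:
        return json.dumps(event, sort_keys=True)
    reserved = {"timestamp", "level", "event"}
    extras = " ".join(f"{k}={v}" for k, v in event.items() if k not in reserved)
    line = f"[{event.get('timestamp', '')}] [{event['level']}] {event['event']} {extras}"
    return line.rstrip()
```

Filtering before enrichment means discarded events never pay for timestamp or call-site lookups, which matches the ordering in the diagram.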

## Deployment Architecture

### Container Deployment

```mermaid
graph TB
    subgraph "Container Runtime"
        subgraph "Docker Container"
            A[Python 3.12 Alpine]
            B[Application Code]
            C[Dependencies]
            D[Uvicorn Server]
        end
        
        E[Port 8000]
    end
    
    subgraph "External Systems"
        F[Load Balancer]
        G[Log Aggregator]
        H[Monitoring]
    end
    
    F --> E
    D --> E
    D -.logs.-> G
    D -.metrics.-> H
    
    A --> B
    A --> C
    C --> D
    B --> D
    
    style A fill:#e3f2fd
    style D fill:#c8e6c9
    style G fill:#ffccbc
```

### Kubernetes Deployment

```mermaid
graph TB
    subgraph "Kubernetes Cluster"
        subgraph "Namespace: maybankforme"
            A[Service]
            
            subgraph "Deployment"
                B1[Pod 1]
                B2[Pod 2]
                B3[Pod 3]
            end
            
            C[ConfigMap]
            D[Secret]
        end
        
        E[Ingress]
    end
    
    F[External Traffic] --> E
    E --> A
    A --> B1
    A --> B2
    A --> B3
    
    C -.config.-> B1
    C -.config.-> B2
    C -.config.-> B3
    
    D -.secrets.-> B1
    D -.secrets.-> B2
    D -.secrets.-> B3
    
    style E fill:#e3f2fd
    style A fill:#c8e6c9
    style B1 fill:#fff9c4
    style B2 fill:#fff9c4
    style B3 fill:#fff9c4
```

### Environment Configuration

```mermaid
graph LR
    subgraph "Configuration Sources"
        A[.env File]
        B[Environment Variables]
        C[Dockerfile ENV]
        D[K8s ConfigMap]
    end
    
    subgraph "Application"
        E[Runtime Config]
    end
    
    A -.development.-> E
    B -.all environments.-> E
    C -.container default.-> E
    D -.kubernetes.-> E
    
    E --> F[LOG_LEVEL]
    E --> G[LOG_FORMAT]
    E --> H[IN_CONTAINER]
    E --> I[PORT]
    
    style E fill:#c8e6c9
    style F fill:#e3f2fd
    style G fill:#e3f2fd
    style H fill:#e3f2fd
    style I fill:#e3f2fd
```
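
A small loader for the four settings in the diagram might look like this. The defaults and accepted values are assumptions made for the sketch (port 8000 is taken from the container diagram; `json`/`console` as `LOG_FORMAT` values and the truthy strings for `IN_CONTAINER` are guesses), and `RuntimeConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RuntimeConfig:
    log_level: str
    log_format: str
    in_container: bool
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> RuntimeConfig:
    """Read settings from an environment mapping, falling back to assumed defaults."""
    in_container = env.get("IN_CONTAINER", "false").strip().lower() in {"1", "true", "yes"}
    # Assumed default: JSON logs inside a container, console logs otherwise.
    default_format = "json" if in_container else "console"
    return RuntimeConfig(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        log_format=env.get("LOG_FORMAT", default_format).lower(),
        in_container=in_container,
        port=int(env.get("PORT", "8000")),
    )
```

Because `.env` files, Dockerfile `ENV` lines, and ConfigMaps all end up as process environment variables, a single reader over `os.environ` covers every source in the diagram.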

## Technology Stack

### Core Technologies

```mermaid
mindmap
  root((maybankforme))
    Web Framework
      FastAPI
      Uvicorn
      Pydantic
      Starlette
    PDF Processing
      pypdf
      cryptography
    Data Processing
      Python stdlib
      CSV module
      datetime
    Logging
      structlog
      contextvars
    Development
      pytest
      mypy
      ruff
      black
    Deployment
      Docker
      Alpine Linux
```

### Dependency Graph

```mermaid
graph TB
    A[FastAPI] --> B[Starlette]
    A --> C[Pydantic]
    D[Uvicorn] --> B
    D --> E[httptools]
    D --> F[uvloop]
    
    G[structlog] --> H[Python stdlib logging]
    
    I[pytest] --> J[pytest-cov]
    I --> K[pytest-mock]
    
    L[Application] --> A
    L --> D
    L --> G
    L --> M[pypdf]
    L --> N[cryptography]
    
    style L fill:#e3f2fd
    style A fill:#c8e6c9
    style G fill:#ffccbc
    style I fill:#fff9c4
```

### Runtime Architecture

```mermaid
graph TB
    subgraph "Python Process"
        A[Main Thread]
        
        subgraph "ASGI Server (Uvicorn)"
            B[Event Loop]
            C[Request Handlers]
        end
        
        subgraph "Application"
            D[FastAPI App]
            E[Route Handlers]
            F[Business Logic]
        end
        
        subgraph "Logging"
            G[Structlog Logger]
            H[Processors]
            I[Renderers]
        end
    end
    
    J[HTTP Requests] --> B
    B --> C
    C --> E
    E --> F
    
    D --> E
    
    F -.logs.-> G
    E -.logs.-> G
    
    G --> H
    H --> I
    I --> K[stdout]
    
    style B fill:#c8e6c9
    style D fill:#e3f2fd
    style G fill:#ffccbc
```

## Performance Considerations

### Processing Performance

```mermaid
graph TB
    A[Input: N PDF Files] --> B{Processing Strategy}
    
    B -->|Sequential| C[Process File 1]
    C --> D[Process File 2]
    D --> E[Process File N]
    E --> F["Time: O(n)"]
    
    B -->|Parallel| G[Process All in Pool]
    G --> H["Time: O(n/cores)"]
    
    I[CLI Mode] -.uses.-> G
    J[API Mode] -.uses.-> C
    
    style F fill:#ffccbc
    style H fill:#c8e6c9
```
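
The two strategies differ only in how the per-file step is scheduled. The sketch below uses `concurrent.futures.ProcessPoolExecutor` for the parallel path; whether the CLI actually uses this executor is an assumption, and `process_file` is a hypothetical stand-in for the real PDF-to-transactions step.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Optional


def process_file(path: Path) -> list[str]:
    """Placeholder for the per-file PDF -> transactions step (hypothetical)."""
    return [f"parsed:{path.name}"]


def process_sequential(paths: list[Path]) -> list[str]:
    """API-style processing: one file after another, O(n) wall time."""
    rows: list[str] = []
    for path in paths:
        rows.extend(process_file(path))
    return rows


def process_parallel(paths: list[Path], workers: Optional[int] = None) -> list[str]:
    """CLI-style processing: spread files across worker processes."""
    # Executor.map preserves input order, so results line up with `paths`.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [row for rows in pool.map(process_file, paths) for row in rows]
```

On platforms that start workers with `spawn`, call `process_parallel` from under an `if __name__ == "__main__":` guard. The O(n/cores) figure is an upper bound on the speedup; process startup and result pickling add overhead for small batches.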

### Memory Usage

```mermaid
graph LR
    A[File Upload] -->|10MB max per file| B[Memory Buffer]
    B --> C[Temp File System]
    C --> D[PDF Processing]
    D --> E[In-Memory Text]
    E --> F[Transaction List]
    F --> G[CSV Generation]
    G --> H[Stream Response]
    
    C -.cleanup.-> I[Auto-delete]
    
    style B fill:#ffccbc
    style E fill:#fff9c4
    style H fill:#c8e6c9
```
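
The size check and auto-delete steps can be sketched with the standard library's `tempfile` module. This is an illustration, not the project's code: `process_upload` and `handler` are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumed reading of the 10MB per-file limit


def process_upload(filename: str, payload: bytes, handler: Callable[[Path], object]) -> object:
    """Write an upload to an isolated temp directory, process it, then clean up."""
    if len(payload) > MAX_FILE_BYTES:
        raise ValueError(f"{filename} exceeds {MAX_FILE_BYTES} bytes")
    with tempfile.TemporaryDirectory() as tmp:
        # Keep only the base name so a client-supplied path cannot escape tmp.
        path = Path(tmp) / Path(filename).name
        path.write_bytes(payload)
        return handler(path)
    # The directory and its contents are removed when the with-block exits,
    # including when the handler raises.
```

Scoping the temporary directory to a `with` block ties cleanup to the request lifetime, so failed conversions do not leave statement files on disk.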

## Security Architecture

### Security Layers

```mermaid
graph TB
    subgraph "Input Security"
        A[File Type Validation]
        B[File Size Limits]
        C[Password Validation]
    end
    
    subgraph "Processing Security"
        D[Temporary File Isolation]
        E[Safe Text Extraction]
        F[Pattern Validation]
    end
    
    subgraph "Output Security"
        G[No Sensitive Logging]
        H[Clean Temp Files]
        I[Secure Response]
    end
    
    J[Client Request] --> A
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H --> I
    I --> K[Client Response]
    
    style A fill:#ffccbc
    style D fill:#fff9c4
    style G fill:#ffccbc
```

### Container Security

```mermaid
graph TB
    subgraph "Container Security"
        A[Non-root User]
        B[Read-only Filesystem]
        C[Minimal Base Image]
        D[No Secrets in Image]
    end
    
    subgraph "Runtime Security"
        E[Resource Limits]
        F[Network Policies]
        G[Security Context]
    end
    
    A --> H[Security Posture]
    B --> H
    C --> H
    D --> H
    E --> H
    F --> H
    G --> H
    
    style H fill:#c8e6c9
    style A fill:#ffccbc
    style D fill:#ffccbc
```

## Monitoring and Observability

### Observability Stack

```mermaid
graph TB
    subgraph "Application"
        A[FastAPI App]
        B[Structured Logs]
        C[Health Endpoint]
    end
    
    subgraph "Collection"
        D[Log Collector]
        E[Metrics Scraper]
        F[Health Checker]
    end
    
    subgraph "Storage & Analysis"
        G[Log Aggregator]
        H[Metrics Store]
        I[Monitoring Dashboard]
    end
    
    A --> B
    A --> C
    B --> D
    C --> F
    
    D --> G
    F --> E
    E --> H
    
    G --> I
    H --> I
    
    style B fill:#ffccbc
    style G fill:#fff9c4
    style I fill:#c8e6c9
```

## Scalability

### Horizontal Scaling

```mermaid
graph TB
    LB[Load Balancer] --> I1[Instance 1]
    LB --> I2[Instance 2]
    LB --> I3[Instance 3]
    LB --> I4[Instance N]
    
    I1 --> L[Centralized Logging]
    I2 --> L
    I3 --> L
    I4 --> L
    
    style LB fill:#e3f2fd
    style I1 fill:#c8e6c9
    style I2 fill:#c8e6c9
    style I3 fill:#c8e6c9
    style I4 fill:#c8e6c9
    style L fill:#ffccbc
```

### Resource Requirements

| Component | CPU | Memory | Storage | Notes |
|-----------|-----|--------|---------|-------|
| API Server | 0.5-1 core | 512 MB-1 GB | Minimal | Mostly I/O bound |
| PDF Processing | 1-2 cores | 1-2 GB | Temp files | CPU intensive |
| Logging | Minimal | Minimal | Log volume | Structured JSON |

## Future Enhancements

```mermaid
graph TB
    A[Current State] --> B[Planned Enhancements]
    
    B --> C[Async PDF Processing]
    B --> D[Queue-based Architecture]
    B --> E[Distributed Tracing]
    B --> F[Caching Layer]
    B --> G[Webhooks for Completion]
    
    C --> H[Better Concurrency]
    D --> H
    E --> I[Better Observability]
    F --> J[Improved Performance]
    G --> K[Integration Options]
    
    style A fill:#e3f2fd
    style H fill:#c8e6c9
    style I fill:#ffccbc
    style J fill:#c8e6c9
    style K fill:#fff9c4
```

<!-- markdownlint-enable MD013 -->
