Metadata-Version: 2.4
Name: pymtcnn
Version: 1.1.0
Summary: High-performance cross-platform MTCNN face detection with CUDA and Apple Neural Engine support
Home-page: https://github.com/johnwilsoniv/pymtcnn
Author: SplitFace
License: CC BY-NC 4.0
Project-URL: Homepage, https://github.com/johnwilsoniv/pymtcnn
Project-URL: Documentation, https://github.com/johnwilsoniv/pymtcnn/blob/main/README.md
Project-URL: Repository, https://github.com/johnwilsoniv/pymtcnn
Project-URL: Bug Tracker, https://github.com/johnwilsoniv/pymtcnn/issues
Keywords: face detection,mtcnn,coreml,apple neural engine,computer vision,deep learning
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: MacOS
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: opencv-python>=4.5.0
Provides-Extra: coreml
Requires-Dist: coremltools>=7.0; extra == "coreml"
Provides-Extra: onnx
Requires-Dist: onnxruntime>=1.16.0; extra == "onnx"
Provides-Extra: onnx-gpu
Requires-Dist: onnxruntime-gpu>=1.16.0; extra == "onnx-gpu"
Provides-Extra: all
Requires-Dist: coremltools>=7.0; extra == "all"
Requires-Dist: onnxruntime-gpu>=1.16.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pillow>=9.0.0; extra == "dev"
Requires-Dist: matplotlib>=3.5.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# PyMTCNN

High-performance **cross-platform** MTCNN face detection with CUDA and Apple Neural Engine support.

## Overview

PyMTCNN is a pure Python implementation of MTCNN (Multi-task Cascaded Convolutional Networks) with multi-backend support for optimal performance across different hardware platforms. It achieves **175.7x speedup** over baseline Python implementations while maintaining 95% IoU accuracy.

### Key Features

- **Cross-Platform**: Works on Windows, Linux, and macOS
- **Multi-Backend**: Auto-selects best backend (CoreML, CUDA, or CPU)
- **High Performance**:
  - Apple Silicon (CoreML): 34.26 FPS
  - NVIDIA GPUs (CUDA): 50+ FPS
  - CPU fallback: 5-10 FPS
- **Accurate**: 95% IoU agreement with C++ OpenFace baseline
- **Easy to Use**: Simple, unified Python API
- **Hardware Accelerated**: Leverages Apple Neural Engine or NVIDIA CUDA
- **Flexible**: Single-frame or batch processing modes
- **Production Ready**: Optimized for real-time video analysis

### Performance

| Backend | Hardware | FPS | ms/frame |
|---------|----------|-----|----------|
| CoreML | Apple M1/M2/M3 | 34.26 | 29.2 |
| ONNX+CUDA | NVIDIA RTX 3090 | 50+ | <20 |
| ONNX+CPU | Intel/AMD CPU | 5-10 | 100-200 |

**Speedup**: 175.7x faster than baseline Python implementation

## Requirements

- **Python**: 3.8 or later
- **OS**: macOS, Windows, or Linux
- **Hardware** (one of):
  - Apple Silicon (M1, M2, M3) for CoreML
  - NVIDIA GPU with CUDA for GPU acceleration
  - Any CPU for CPU fallback

## Installation

### From PyPI (Recommended)

Choose the installation that matches your hardware:

#### macOS with Apple Silicon
```bash
pip install pymtcnn[coreml]
```

#### NVIDIA GPU (CUDA)
```bash
pip install pymtcnn[onnx-gpu]
```

#### CPU only
```bash
pip install pymtcnn[onnx]
```

#### All backends (development)
```bash
pip install pymtcnn[all]
```

### From Source

```bash
git clone https://github.com/johnwilsoniv/pymtcnn.git
cd pymtcnn
pip install -e .[coreml]  # or [onnx-gpu] or [onnx]
```

## Quick Start

### Auto-Backend Selection (Recommended)

PyMTCNN automatically selects the best available backend:

```python
import cv2
from pymtcnn import MTCNN

# Auto-select best backend (CoreML on Mac, CUDA on NVIDIA, CPU fallback)
detector = MTCNN(verbose=True)  # Shows which backend was selected

# Load image
img = cv2.imread("image.jpg")

# Detect faces
bboxes, landmarks = detector.detect(img)

# Process results
print(f"Detected {len(bboxes)} faces")
for i, bbox in enumerate(bboxes):
    x, y, w, h, conf = bbox
    print(f"Face {i+1}: ({x:.0f}, {y:.0f}) {w:.0f}×{h:.0f} (confidence: {conf:.3f})")
```

### Force Specific Backend

```python
from pymtcnn import MTCNN

# Force CoreML (Apple Neural Engine)
detector = MTCNN(backend='coreml')

# Force CUDA (NVIDIA GPU)
detector = MTCNN(backend='cuda')

# Force CPU
detector = MTCNN(backend='cpu')
```

### Batch Video Processing

```python
import cv2
from pymtcnn import MTCNN

# Initialize detector
detector = MTCNN()

# Load video frames
cap = cv2.VideoCapture("video.mp4")
frames = []
for _ in range(4):  # Process 4 frames at a time
    ret, frame = cap.read()
    if ret:
        frames.append(frame)

# Batch detection (cross-frame batching for maximum throughput)
results = detector.detect_batch(frames)

# Process results
for i, (bboxes, landmarks) in enumerate(results):
    print(f"Frame {i+1}: {len(bboxes)} faces detected")
```

### Advanced: Direct Backend Access

For advanced users who need backend-specific features:

```python
from pymtcnn import CoreMLMTCNN, ONNXMTCNN

# Use CoreML directly
coreml_detector = CoreMLMTCNN(verbose=True)

# Use ONNX directly with specific provider
onnx_detector = ONNXMTCNN(provider='cuda', verbose=True)
```

## API Reference

### `CoreMLMTCNN`

Main face detector class.

#### Constructor

```python
CoreMLMTCNN(
    min_face_size=60,
    thresholds=[0.6, 0.7, 0.7],
    factor=0.709,
    coreml_dir=None,
    verbose=False
)
```

**Parameters:**

- `min_face_size` (int): Minimum face size in pixels. Default: 60
- `thresholds` (list): Detection thresholds for [PNet, RNet, ONet]. Default: [0.6, 0.7, 0.7]
- `factor` (float): Image pyramid scale factor. Default: 0.709
- `coreml_dir` (str): Path to CoreML models directory. Default: bundled models
- `verbose` (bool): Enable verbose logging. Default: False

#### Methods

##### `detect(image)`

Detect faces in a single image using within-frame batching.

**Parameters:**
- `image` (numpy.ndarray): Input image (BGR format, H×W×3)

**Returns:**
- `bboxes` (numpy.ndarray): Bounding boxes (N×5), format: [x, y, w, h, confidence]
- `landmarks` (numpy.ndarray): Facial landmarks (N×5×2), 5 points per face: left eye, right eye, nose, left mouth, right mouth

**Performance:** 31.88 FPS (31.4 ms/frame)

##### `detect_batch(frames)`

Detect faces in multiple frames using cross-frame batching.

**Parameters:**
- `frames` (list): List of images (each BGR format, H×W×3)

**Returns:**
- `results` (list): List of (bboxes, landmarks) tuples, one per frame

**Performance:** 34.26 FPS (29.2 ms/frame) with batch_size=4

**Recommended batch size:** 4 frames for optimal throughput

## Performance Guide

### When to Use Each Method

- **`detect()`**: Use for real-time per-frame processing, webcam feeds, or when you need lowest latency
- **`detect_batch()`**: Use for offline batch video processing, maximum throughput, or when processing multiple frames simultaneously

### Optimization Tips

1. **Batch Size**: Use 4 frames for optimal throughput
   - Larger batches (8, 16) are slower due to overhead

2. **Frame Resolution**: Performance tested on 1920×1080
   - Lower resolution → faster processing
   - Higher resolution → more candidates, may require batch splitting

3. **Min Face Size**: Increase `min_face_size` for better performance
   - Default: 60 pixels
   - 80-100 pixels: 1.2-1.5x faster (may miss smaller faces)

## Examples

See the `examples/` directory for complete examples:

- `single_frame_detection.py`: Basic single-frame face detection
- `batch_processing.py`: Batch video processing
- `s1_integration_example.py`: Integration with S1 video pipeline

## Accuracy

PyMTCNN maintains high accuracy while achieving exceptional performance:

- **Mean IoU**: 95% vs C++ OpenFace baseline
- **Detection Agreement**: 100% (same faces detected)
- **Validation**: Tested on 30 frames from real-world patient videos

## Architecture

PyMTCNN uses a three-stage cascade architecture:

1. **PNet** (Proposal Network): Fast candidate generation using image pyramid
2. **RNet** (Refinement Network): Candidate refinement with batching
3. **ONet** (Output Network): Final bbox regression and landmark prediction

All networks are converted to CoreML FP32 format with flexible batch dimensions (1-50) for optimal ANE utilization.

## Optimization Journey

PyMTCNN achieved a **175.7x speedup** through multiple optimization phases:

| Phase | Implementation | FPS | Speedup | Status |
|-------|---------------|-----|---------|--------|
| Baseline | Pure Python CNN | 0.195 | 1.0x | ✅ |
| Phase 1 | Vectorized NumPy | 0.910 | 4.7x | ✅ |
| Phase 2 | ONNX Runtime CPU | 5.870 | 30.1x | ✅ |
| Phase 3 | CoreML + ANE | 13.56 | 69.5x | ✅ |
| Phase 4 | Within-Frame Batching | 31.88 | 163.5x | ✅ |
| Phase 5 | Cross-Frame Batching | **34.26** | **175.7x** | ✅ |

See `docs/OPTIMIZATION_JOURNEY.md` for the complete story.

## License

This project is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.

**You are free to:**
- Share: Copy and redistribute the material
- Adapt: Remix, transform, and build upon the material

**Under the following terms:**
- Attribution: You must give appropriate credit
- NonCommercial: You may not use the material for commercial purposes

See [LICENSE](LICENSE) for full terms.

## Citation

If you use PyMTCNN in your research, please cite:

```bibtex
@software{pymtcnn2025,
  title={PyMTCNN: High-Performance MTCNN Face Detection for Apple Silicon},
  author={SplitFace},
  year={2025},
  url={https://github.com/your-org/PyMTCNN}
}
```

## Acknowledgments

- Original MTCNN paper: Zhang et al., "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks"
- C++ OpenFace implementation: Tadas Baltrušaitis et al.
- Apple Neural Engine optimization insights from the CoreML community

## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/your-org/PyMTCNN).
