Metadata-Version: 2.4
Name: cleanframes
Version: 0.3.9
Summary: A professional tool for cleaning duplicate or near-duplicate image frames using perceptual hashing and embeddings.
Home-page: https://github.com/abdullahalmutairi/cleanframes
Author: Abdullah Almutairi
Author-email: abdullah@example.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.1
Requires-Dist: torchvision>=0.15.2
Requires-Dist: torchaudio>=2.0.2
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: tqdm
Requires-Dist: pillow
Requires-Dist: timm>=0.9.12
Requires-Dist: transformers>=4.41.0
Requires-Dist: open_clip_torch>=2.23.0
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: tabulate
Requires-Dist: imagehash
Requires-Dist: onnxruntime-gpu>=1.17.0
Requires-Dist: hdbscan
Requires-Dist: rich
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# CleanFrames

CleanFrames is a Python library for cleaning and summarizing folders of image frames using embedding models and clustering. It embeds each frame, groups similar frames, removes duplicates and near-duplicates, and saves the cleaned and removed images alongside the original dataset. Embeddings and reports are cached so that subsequent runs are faster.

## Key Features

- Processes folders of image frames instead of videos.
- Supports multiple embedding models to represent frames.
- Offers several clustering methods to group similar frames.
- Caches embeddings and reports to optimize performance.
- Saves cleaned and removed images beside the original dataset.
- Provides visualization tools to inspect clusters and frame pairs.
- Generates text-only console reports summarizing cleaning results.

## Installation

To install CleanFrames, clone the repository and install the required dependencies:

```bash
git clone <repository-url>
cd cleanframes
pip install -r requirements.txt
```

## Usage

### Basic Example

```python
from cleanframes import CleanFrame

# Initialize with folder path, embedding model, clustering method, and caching enabled
cf = CleanFrame(
    path='path/to/frames_folder',
    model='clip-ViT-B-32',
    cluster='kmeans',
    cache=True,
    verbose=True
)

# Run the full cleaning pipeline: embedding, clustering, cleaning
cf.run()

# Generate a text-only console report of the cleaning results
cf.report()

# Visualize clusters of frames
cf.visualize_clusters()
```

### Optimized Workflow Example

```python
from cleanframes import CleanFrame

# Initialize with a larger CLIP model and DBSCAN clustering
cf = CleanFrame(
    path='path/to/frames_folder',
    model='clip-ViT-L-14',
    cluster='dbscan',
    cache=True,
    verbose=True
)

# Run the cleaning process
cf.run()

# Print cleaning report
cf.report()

# Visualize clusters of frames
cf.visualize_clusters()
```

## Caching and Outputs

- With `cache=True`, embeddings and cleaning reports are stored in the cache folder, so reruns on the same dataset skip work that has already been done.
- Cleaned and removed images are saved beside the original frames in the dataset folder, so both sets can be inspected or reused.
- Skipping recomputation matters most on large datasets, where embedding every frame dominates the run time.
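
The caching idea above can be illustrated with a minimal sketch. It is not CleanFrames' actual cache format: the function names `file_digest` and `cached_embedding`, the JSON storage, and the content-hash key are all assumptions chosen for illustration, and `fake_embed` stands in for a real embedding model.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's bytes, used as the cache key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cached_embedding(image_path: Path, cache_dir: Path, compute):
    """Return the embedding for image_path, calling compute only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_digest(image_path)}.json"
    if cache_file.exists():
        # Cache hit: reuse the stored vector instead of running the model again.
        return json.loads(cache_file.read_text())
    embedding = compute(image_path)
    cache_file.write_text(json.dumps(embedding))
    return embedding


# Hypothetical stand-in for a real embedding model; records each call.
calls = []


def fake_embed(path: Path):
    calls.append(path.name)
    return [float(len(path.read_bytes())), 1.0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    frame = root / "frame_0001.jpg"
    frame.write_bytes(b"not a real image")
    first = cached_embedding(frame, root / "cache", fake_embed)
    second = cached_embedding(frame, root / "cache", fake_embed)

print(len(calls))  # the model ran once; the second call was served from the cache
```

Keying the cache on file contents rather than file names means a renamed frame still hits the cache, while an edited frame is recomputed.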

## Supported Embedding Models

CleanFrames supports multiple embedding models for frame representation, including but not limited to:

- CLIP models such as `clip-ViT-B-32` and `clip-ViT-L-14`
- Additional models can be integrated as needed.
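
To show why embeddings are useful for finding near-duplicates, here is a minimal sketch that compares embedding vectors by cosine similarity. The function names, the `0.95` threshold, and the toy three-dimensional vectors are assumptions for illustration only. Real CLIP embeddings have hundreds of dimensions, and CleanFrames may compare them differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(embeddings, threshold=0.95):
    """Return name pairs whose embeddings are at least `threshold` similar."""
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(embeddings[first], embeddings[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Toy embeddings: frames 001 and 002 point in nearly the same direction.
toy = {
    "frame_001.jpg": [1.0, 0.0, 0.0],
    "frame_002.jpg": [0.99, 0.05, 0.0],
    "frame_003.jpg": [0.0, 1.0, 0.0],
}
pairs = near_duplicate_pairs(toy)
print(pairs)
```

The pairwise loop is quadratic in the number of frames, which is one reason to cluster embeddings instead of comparing every pair on large datasets.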

## Clustering Methods

Available clustering algorithms include:

- KMeans clustering (`cluster='kmeans'`)
- DBSCAN clustering (`cluster='dbscan'`)
- Other clustering methods can be added or customized.
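
Once frames have cluster labels, one plausible way to turn clusters into a cleaned set is to keep a single representative per cluster. The sketch below keeps the frame closest to each cluster centroid and treats the label `-1` as noise to be kept, following scikit-learn's DBSCAN convention. The function name `split_by_cluster` and this selection rule are assumptions, not a description of CleanFrames' internals.

```python
import math
from collections import defaultdict


def split_by_cluster(names, embeddings, labels, noise_label=-1):
    """Keep one frame per cluster (nearest to the centroid); keep all noise frames."""
    clusters = defaultdict(list)
    kept = []
    for idx, label in enumerate(labels):
        if label == noise_label:
            kept.append(names[idx])  # unclustered frames have no duplicates
        else:
            clusters[label].append(idx)

    removed = []
    for members in clusters.values():
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        # The member nearest the centroid represents the whole cluster.
        best = min(members, key=lambda i: math.dist(embeddings[i], centroid))
        kept.append(names[best])
        removed.extend(names[i] for i in members if i != best)
    return sorted(kept), sorted(removed)


names = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.0], [5.0, 5.0], [9.0, 1.0]]
labels = [0, 0, 0, 1, -1]  # e.g. produced by KMeans or DBSCAN
kept, removed = split_by_cluster(names, embeddings, labels)
print(kept, removed)
```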

## Visualization

CleanFrames provides visualization tools for inspecting clustering results and pairs of similar frames. Use them to verify the cleaning quality and to see how frames were grouped.

## Reporting

After cleaning, CleanFrames generates a concise text-only console report summarizing:

- Number of frames processed
- Number of frames removed
- Number of frames retained

Together, these counts show how much of the dataset the cleaning pass removed.
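
As a rough illustration of how the three counts relate, the sketch below formats a text-only summary. The function name `format_report`, the layout, and the example numbers are hypothetical; the actual output of `cf.report()` may look different.

```python
def format_report(total: int, removed: int) -> str:
    """Build a plain-text summary; retained is derived as total minus removed."""
    if total < 0 or removed < 0 or removed > total:
        raise ValueError("counts must satisfy 0 <= removed <= total")
    retained = total - removed
    pct = (removed / total * 100) if total else 0.0
    lines = [
        f"Frames processed: {total}",
        f"Frames removed:   {removed} ({pct:.1f}%)",
        f"Frames retained:  {retained}",
    ]
    return "\n".join(lines)


# Made-up numbers, purely for illustration.
report = format_report(1200, 300)
print(report)
```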

---

For more detailed information and advanced usage, please refer to the source code and examples provided in the repository.
