Metadata-Version: 2.4
Name: cleanframes
Version: 0.3.3
Summary: A professional tool for cleaning duplicate or near-duplicate image frames using perceptual hashing and embeddings.
Home-page: https://github.com/abdullahalmutairi/cleanframes
Author: Abdullah Almutairi
Author-email: abdullah@example.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.1
Requires-Dist: torchvision>=0.15.2
Requires-Dist: torchaudio>=2.0.2
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: tqdm
Requires-Dist: pillow
Requires-Dist: timm>=0.9.12
Requires-Dist: transformers>=4.41.0
Requires-Dist: open_clip_torch>=2.23.0
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: tabulate
Requires-Dist: imagehash
Requires-Dist: onnxruntime-gpu>=1.17.0
Requires-Dist: hdbscan
Requires-Dist: rich
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# CleanFrames

CleanFrames is a Python library for cleaning and summarizing video frames. It represents each frame with an embedding model (the package summary also mentions perceptual hashing), groups similar frames with clustering algorithms, removes duplicates and near-duplicates, and generates concise reports and visualizations.
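The package summary lists perceptual hashing alongside embeddings. As a rough illustration of that idea, and not CleanFrames' own code, the sketch below computes an average hash ("aHash") of a grayscale frame stored as a NumPy array and compares two hashes by Hamming distance. Block averaging stands in for the image resize that libraries such as `imagehash` perform. `average_hash` and `hamming_distance` are hypothetical helpers.

```python
import numpy as np


def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Hypothetical helper (not the CleanFrames API): boolean aHash of a grayscale frame."""
    h, w = gray.shape
    bh, bw = h // hash_size, w // hash_size
    if bh == 0 or bw == 0:
        raise ValueError("frame is smaller than the hash grid")
    # Crop so both dimensions split evenly into hash_size x hash_size blocks.
    cropped = gray[: bh * hash_size, : bw * hash_size].astype(np.float64)
    # Downscale by averaging each block (a simple stand-in for resizing).
    small = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # Each bit records whether its block is brighter than the mean block.
    return small > small.mean()


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of bits that differ between two hashes of the same shape."""
    return int(np.count_nonzero(a != b))
```

A distance of 0 means identical hashes, and small distances suggest near-duplicates. With the `imagehash` dependency you would normally call `imagehash.average_hash(PIL.Image.open(path))` instead, and subtracting two `ImageHash` objects returns their Hamming distance. Which hash and which threshold CleanFrames actually uses is not stated here.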

## Key Features

- Support for multiple embedding models to represent frames.
- Various clustering methods to group similar frames.
- Caching mechanisms to optimize performance.
- Visualization tools to inspect clusters and embeddings.
- Cleaning functions to remove redundant frames.
- Reporting capabilities to summarize cleaning results.
- Two main classes: `CleanFrame` for standard processing and `CleanFrame_optimized` for enhanced performance.

## Installation

To install CleanFrames from source, clone the repository and install the package with pip. This also installs the dependencies declared in the package metadata, including `torch`, `open_clip_torch`, `transformers`, `imagehash`, `hdbscan`, and `onnxruntime-gpu` (the last of these targets NVIDIA GPUs with CUDA).

```bash
git clone <repository-url>
cd cleanframes
pip install .
```

## Usage

### Using `CleanFrame`

```python
from cleanframes import CleanFrame

# Initialize with video path and parameters
cf = CleanFrame(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-B-32',
    clustering_method='kmeans',
    cache_folder='cache/',
    verbose=True
)

# Load video frames
cf.load_frames()

# Generate embeddings for frames
cf.embed_frames()

# Cluster embeddings to group similar frames
cf.cluster_frames()

# Clean frames by removing duplicates or near-duplicates
cleaned_frames = cf.clean_frames()

# Generate report of cleaning
cf.report()

# Visualize clusters or embeddings
cf.visualize()
```

### Using `CleanFrame_optimized`

```python
from cleanframes import CleanFrame_optimized

# Initialize with video path and parameters
cf_opt = CleanFrame_optimized(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-L-14',
    clustering_method='dbscan',
    cache_folder='cache_optimized/',
    verbose=True
)

# Load video frames with optimized method
cf_opt.load_frames()

# Generate embeddings using optimized pipeline
cf_opt.embed_frames()

# Cluster embeddings
cf_opt.cluster_frames()

# Clean frames
cleaned_frames_opt = cf_opt.clean_frames()

# Generate report
cf_opt.report()

# Visualize results
cf_opt.visualize()
```

## Supported Embedding Models

CleanFrames supports various embedding models to convert video frames into numerical representations, including but not limited to:

- CLIP models such as `clip-ViT-B-32` and `clip-ViT-L-14`
- Other models can be integrated as needed.
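Once frames are embedded, each frame is just a vector, and a common way to decide whether two frames are near-duplicates is the cosine similarity of their embeddings. The sketch below assumes the embeddings (for example from a CLIP image encoder) are already stored as rows of a NumPy array. The helper names and the 0.95 threshold are illustrative assumptions, not CleanFrames settings.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def near_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.95):
    """Hypothetical helper: (i, j, similarity) for frame pairs with cosine similarity >= threshold."""
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    sims = unit @ unit.T
    # Look only above the diagonal so each pair is reported once.
    rows, cols = np.triu_indices(len(unit), k=1)
    mask = sims[rows, cols] >= threshold
    return [(int(i), int(j), float(sims[i, j])) for i, j in zip(rows[mask], cols[mask])]
```

Normalizing first means the whole similarity matrix is a single matrix product, which stays cheap for the few thousand frames a typical clip produces.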

## Clustering Methods

The library provides different clustering algorithms to group similar frames:

- KMeans clustering
- DBSCAN clustering
- Other clustering methods can be added or customized.
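DBSCAN suits near-duplicate removal because it does not need the number of clusters in advance and labels isolated frames as noise. The sketch below is a minimal, self-contained DBSCAN over cosine distance, written to show how the algorithm groups frame embeddings; it is not CleanFrames' implementation, and `dbscan_cosine` plus its `eps` and `min_samples` defaults are illustrative assumptions.

```python
from collections import deque

import numpy as np


def dbscan_cosine(embeddings: np.ndarray, eps: float = 0.05, min_samples: int = 2) -> np.ndarray:
    """Hypothetical minimal DBSCAN over cosine distance; returns one label per row, -1 = noise."""
    x = np.asarray(embeddings, dtype=np.float64)
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    dist = 1.0 - x @ x.T  # pairwise cosine distance
    # Each neighborhood includes the point itself, as in scikit-learn's DBSCAN.
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    is_core = np.array([len(n) >= min_samples for n in neighbors])

    labels = np.full(len(x), -1, dtype=int)
    cluster_id = 0
    for start in range(len(x)):
        if labels[start] != -1 or not is_core[start]:
            continue
        # Breadth-first expansion from an unassigned core point.
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            point = queue.popleft()
            if not is_core[point]:
                continue  # border points join the cluster but do not expand it
            for nb in neighbors[point]:
                if labels[nb] == -1:
                    labels[nb] = cluster_id
                    queue.append(nb)
        cluster_id += 1
    return labels
```

In practice you would reach for scikit-learn (a declared dependency): `sklearn.cluster.DBSCAN(eps=0.05, min_samples=2, metric="cosine").fit_predict(embeddings)` for DBSCAN, or `sklearn.cluster.KMeans(n_clusters=k, n_init=10)` when the number of groups is known.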

## Caching

To improve performance, CleanFrames supports caching of intermediate results such as extracted frames and computed embeddings. Users can specify a cache folder where these results are stored and reused.
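The README does not describe the cache layout. One simple scheme, assumed here for illustration, stores each embedding matrix as a `.npy` file whose name is derived from the video path and model name, so switching models never reuses stale vectors. `cached_embeddings` and its `compute` callback are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable

import numpy as np


def cached_embeddings(
    video_path: str,
    model_name: str,
    cache_folder: str,
    compute: Callable[[], np.ndarray],
) -> np.ndarray:
    """Hypothetical helper: load embeddings from cache_folder, or compute and save them."""
    # Key on both inputs so each (video, model) pair gets its own cache file.
    key = hashlib.sha256(f"{video_path}|{model_name}".encode("utf-8")).hexdigest()[:16]
    cache_file = Path(cache_folder) / f"embeddings_{key}.npy"
    if cache_file.exists():
        return np.load(cache_file)
    embeddings = np.asarray(compute())
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_file, embeddings)
    return embeddings
```

Keying on the path alone will not notice if the video file is replaced; adding the file's modification time or a content hash to the key would make the cache more robust.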

## Visualization

CleanFrames includes visualization tools to help users inspect the clustering results and embedding distributions. This aids in understanding the cleaning process and verifying the quality of frame grouping.
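A typical way to inspect embedding distributions is to project the high-dimensional vectors to 2-D and color each point by its cluster label. The sketch below uses PCA via NumPy's SVD and a Matplotlib scatter plot; it is an assumption about what such a view might look like, not a description of what `visualize()` actually draws, and both function names are hypothetical.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: project embeddings onto their first two principal components."""
    x = np.asarray(embeddings, dtype=np.float64)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def plot_clusters(embeddings: np.ndarray, labels: np.ndarray, out_path: str = "clusters.png") -> None:
    """Hypothetical helper: scatter the 2-D projection, one color per label, saved to out_path."""
    import matplotlib.pyplot as plt  # imported lazily so pca_2d works without Matplotlib

    coords = pca_2d(embeddings)
    fig, ax = plt.subplots(figsize=(5, 4))
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=20)
    ax.legend(*scatter.legend_elements(), title="cluster")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(out_path, dpi=120, bbox_inches="tight")
    plt.close(fig)
```

PCA is linear and fast; nonlinear projections such as t-SNE (available in scikit-learn as `sklearn.manifold.TSNE`) often separate clusters more visibly at a higher computational cost.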

## Cleaning and Reporting

The cleaning functions remove redundant frames based on clustering results and embedding similarity. After cleaning, a report summarizes how many frames were processed, removed, and retained, which shows how effective the cleaning was.
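The README does not say which frame CleanFrames keeps from each cluster. One reasonable policy, assumed here, keeps the frame closest to its cluster centroid and keeps every noise frame (label `-1`) because it has no near-duplicates. The sketch also builds the processed/removed/retained counts the report describes; `keep_representatives` is a hypothetical helper.

```python
import numpy as np


def keep_representatives(embeddings: np.ndarray, labels: np.ndarray):
    """Hypothetical helper: keep one frame per cluster (closest to centroid) plus all noise frames."""
    x = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # Frames labeled -1 were not grouped with anything, so they are kept as unique.
    kept = [int(i) for i in np.flatnonzero(labels == -1)]
    for label in np.unique(labels[labels != -1]):
        members = np.flatnonzero(labels == label)
        centroid = x[members].mean(axis=0)
        closest = members[np.argmin(np.linalg.norm(x[members] - centroid, axis=1))]
        kept.append(int(closest))
    kept.sort()
    report = {
        "processed": len(x),
        "retained": len(kept),
        "removed": len(x) - len(kept),
    }
    return kept, report
```

The report dictionary can be printed as a table with `tabulate` (already a dependency), for example `tabulate(report.items(), headers=["metric", "count"])`.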

---

For more detailed information and advanced usage, please refer to the source code and examples provided in the repository.
