# CleanFrames

CleanFrames is a Python library for cleaning and summarizing video frames using embedding models and clustering. It provides tools to process video frames, remove duplicates and near-duplicates, and generate concise reports and visualizations of the result.

## Key Features

- Support for multiple embedding models to represent frames.
- Various clustering methods to group similar frames.
- Caching mechanisms to optimize performance.
- Visualization tools to inspect clusters and embeddings.
- Cleaning functions to remove redundant frames.
- Reporting capabilities to summarize cleaning results.
- Two main classes: `CleanFrame` for standard processing and `CleanFrame_optimized` for enhanced performance.

## Installation

To install CleanFrames, clone the repository and install the required dependencies. The exact commands depend on your setup; the following is a typical sequence:

```bash
git clone <repository-url>
cd cleanframes
pip install -r requirements.txt
```

## Usage

### Using `CleanFrame`

```python
from cleanframes import CleanFrame

# Initialize with video path and parameters
cf = CleanFrame(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-B-32',
    clustering_method='kmeans',
    cache_folder='cache/',
    verbose=True
)

# Load video frames
cf.load_frames()

# Generate embeddings for frames
cf.embed_frames()

# Cluster embeddings to group similar frames
cf.cluster_frames()

# Clean frames by removing duplicates or near-duplicates
cleaned_frames = cf.clean_frames()

# Generate report of cleaning
cf.report()

# Visualize clusters or embeddings
cf.visualize()
```

### Using `CleanFrame_optimized`

```python
from cleanframes import CleanFrame_optimized

# Initialize with video path and parameters
cf_opt = CleanFrame_optimized(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-L-14',
    clustering_method='dbscan',
    cache_folder='cache_optimized/',
    verbose=True
)

# Load video frames with optimized method
cf_opt.load_frames()

# Generate embeddings using optimized pipeline
cf_opt.embed_frames()

# Cluster embeddings
cf_opt.cluster_frames()

# Clean frames
cleaned_frames_opt = cf_opt.clean_frames()

# Generate report
cf_opt.report()

# Visualize results
cf_opt.visualize()
```

## Supported Embedding Models

CleanFrames supports various embedding models to convert video frames into numerical representations, including but not limited to:

- CLIP models such as `clip-ViT-B-32` and `clip-ViT-L-14`
- Other models can be integrated as needed.

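Once frames are represented as vectors, near-duplicates can be found by comparing those vectors. The sketch below uses cosine similarity, a common choice for CLIP-style embeddings; whether CleanFrames uses this exact measure is an assumption, and the function name and toy vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real CLIP vectors have hundreds of dimensions.
frame_a = [1.0, 0.0, 0.0]
frame_b = [0.9, 0.1, 0.0]  # points in almost the same direction as frame_a
frame_c = [0.0, 1.0, 0.0]  # orthogonal to frame_a

print(cosine_similarity(frame_a, frame_b))  # close to 1: likely near-duplicates
print(cosine_similarity(frame_a, frame_c))  # 0: unrelated content
```

A similarity close to 1 suggests two frames show nearly the same content; the threshold at which frames count as duplicates is a tuning decision.
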
## Clustering Methods

The library provides different clustering algorithms to group similar frames:

- KMeans clustering
- DBSCAN clustering
- Other clustering methods can be added or customized.

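To illustrate how density-based clustering groups similar frames, here is a minimal pure-Python DBSCAN sketch. It is not the library's implementation (in practice a library implementation such as scikit-learn's `DBSCAN` would normally be used); the function name, the `eps` and `min_samples` values, and the 2-D points standing in for embeddings are all assumptions.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise, clusters are 0, 1, ..."""
    n = len(points)
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * n

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

# Two tight groups of "frames" and one isolated frame.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
labels = dbscan(points, eps=0.3, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Unlike KMeans, DBSCAN does not need the number of clusters in advance and marks isolated frames as noise, which can suit videos where the number of distinct scenes is unknown.
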
## Caching

To improve performance, CleanFrames supports caching of intermediate results such as extracted frames and computed embeddings. Users can specify a cache folder where these results are stored and reused.

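The caching idea can be sketched as follows. This is a minimal illustration, not the library's actual cache layout: the helper names, the JSON file format, and the choice to key the cache on the video path and model name are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cache_path(cache_folder, video_path, model_name):
    """Build a cache file path keyed by video path and model name."""
    key = hashlib.sha256(f"{video_path}|{model_name}".encode("utf-8")).hexdigest()
    return Path(cache_folder) / f"embeddings_{key[:16]}.json"

def load_or_compute(cache_folder, video_path, model_name, compute):
    """Return cached embeddings if present, otherwise compute and store them."""
    path = cache_path(cache_folder, video_path, model_name)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result

calls = []

def fake_embed():
    # Stand-in for an expensive embedding step; records each invocation.
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as cache_folder:
    first = load_or_compute(cache_folder, "path/to/video.mp4", "clip-ViT-B-32", fake_embed)
    second = load_or_compute(cache_folder, "path/to/video.mp4", "clip-ViT-B-32", fake_embed)

print(len(calls))  # 1: the second call was served from the cache
```

Keying on the path alone does not detect a video file that changes in place; hashing the file contents or including its modification time would address that at some extra cost.
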
## Visualization

CleanFrames includes visualization tools to help users inspect the clustering results and embedding distributions. This aids in understanding the cleaning process and verifying the quality of frame grouping.

## Cleaning and Reporting

The cleaning functions remove redundant frames based on clustering results and embedding similarity. After cleaning, a report summarizes how many frames were processed, removed, and retained, which shows how much redundancy the cleaning step eliminated.

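As a rough sketch of cluster-based cleaning and reporting: keep one representative frame per cluster, keep every frame labeled as noise (`-1`), and count the result. The function names, the "first frame wins" rule, and the report keys are assumptions for illustration, not the behavior of `clean_frames()` or `report()`.

```python
def clean_by_cluster(labels):
    """Keep the first frame index of each cluster and every noise frame (-1)."""
    seen = set()
    kept = []
    for index, label in enumerate(labels):
        if label == -1:
            kept.append(index)  # noise frames have no duplicates to drop
        elif label not in seen:
            kept.append(index)  # first frame seen for this cluster
            seen.add(label)
    return kept

def summarize(total, kept):
    """Count processed, retained, and removed frames."""
    retained = len(kept)
    return {"processed": total, "retained": retained, "removed": total - retained}

# One cluster label per frame, in frame order.
labels = [0, 0, 0, 1, 1, -1, 1]
kept = clean_by_cluster(labels)
report = summarize(len(labels), kept)
print(kept)    # [0, 3, 5]
print(report)  # {'processed': 7, 'retained': 3, 'removed': 4}
```
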
---

For more detailed information and advanced usage, please refer to the source code and examples provided in the repository.
