Metadata-Version: 2.4
Name: ai-sub
Version: 0.0.7
Summary: AI-Powered Subtitle Generation with Translation
Author: FlippFuzz
Project-URL: Homepage, https://github.com/FlippFuzz/ai-sub
Project-URL: Bug Tracker, https://github.com/FlippFuzz/ai-sub/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pysubs2
Requires-Dist: google-genai
Requires-Dist: static-ffmpeg
Requires-Dist: pymediainfo
Requires-Dist: pydantic
Requires-Dist: retrying
Dynamic: license-file

# AI Sub: AI-Powered Subtitle Generation with Translation

[![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
[![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)

---
## Project Overview
AI Sub uses AI (currently Google Gemini) to generate English and Japanese subtitles for videos, translating between the two languages as needed.
It is primarily designed for and tested on Hololive concert/cover videos, but may work on other content as well.

---

## Showcase

Here's an example of subtitles generated by AI Sub:

[![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.png)](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/42h4ydJS3zk.srt)

For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/README.md).

---

## Pros and cons of using Gemini as the AI model

### Pros:
*   **Multimodal Context:** Gemini can analyze the video itself, including on-screen text, which gives it richer context and more accurate subtitles than audio-only transcription.
*   **Cloud-Based Processing:** All processing runs on Google's infrastructure, so no local GPU or heavy compute is required on your machine.

### Cons:
*   **Timestamp Precision:** Subtitle timestamps may be offset by a few seconds.
*   **Network Usage:** Entire video files are uploaded to Google's servers, which consumes bandwidth.

---

## How AI Sub Works
*   **Video Segmentation:** The input video is first split into 180-second segments (configurable via the `--split_seconds` flag).
*   **Concurrent Processing:** Each segment is sent to the AI model (Google Gemini) for subtitle generation. The number of concurrent worker threads can be adjusted with the `--num_processing_threads` flag.
*   **Subtitle Compilation:** The generated subtitle parts are then merged into a single, final subtitle file.
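The segmentation step can be sketched as follows. `segment_bounds` is a hypothetical helper written for illustration only; it is not part of ai-sub's API.

```python
def segment_bounds(duration_s: float, split_seconds: int = 180) -> list[tuple[float, float]]:
    """Return (start, end) pairs that cover the full video duration."""
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last segment may be shorter than split_seconds.
        end = min(start + split_seconds, duration_s)
        bounds.append((start, end))
        start = end
    return bounds

# A 400-second video becomes three segments: 0-180, 180-360, 360-400.
print(segment_bounds(400))
```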

---

## Getting Started: A Quick Guide

### 1. Obtain Your Google Gemini API Key
Follow these simple steps to acquire your API key:
1.  Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
2.  Click "Create API Key."
3.  Copy and securely store your API key. **Never disclose your API key publicly.**

### 2. Set Up Your Python Environment (Python 3.10+ Required)
Create and activate a Python virtual environment, then install the package:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
pip install --upgrade ai-sub
```

### 3. Execute the Script
Run the application with your video file:
```bash
ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
```
**Note**: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.

---

## Known Limitations

1.  **Timestamp Accuracy:** Subtitle timestamps may be inaccurate. This is an inherent characteristic of the Gemini model.
    *   In practice, shorter video segments tend to yield more accurate timestamps.
    *   Requesting second-level timestamps from the model yields more accurate results than requesting millisecond-level ones, so the current implementation requests second-level timestamps.

2.  **AI Hallucinations:** Like all current AI models, Gemini may occasionally produce hallucinated or otherwise inaccurate output.

If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.

---

## Re-processing Specific Video Segments
Intermediate files generated during processing are stored in a temporary directory, which defaults to `tmp_<input_file_name>` but can be overridden with the `--temp_dir` CLI flag.
You can inspect the `part_XXX.json` files in this directory to review the AI's results for individual segments.
To re-process a specific segment, delete its corresponding `part_XXX.json` file.
On the next run, the script re-processes only the segments whose `part_XXX.json` file is missing.

