Metadata-Version: 2.4
Name: google-tunix
Version: 0.1.0.dev0
Summary: A lightweight JAX-native LLM post-training framework.
Author-email: Tunix Developers <tunix-dev@google.com>
License-Expression: Apache-2.0
Project-URL: Source, https://github.com/google/tunix
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: datasets
Requires-Dist: gcsfs
Requires-Dist: grain
Requires-Dist: huggingface_hub
Requires-Dist: jax
Requires-Dist: jax[tpu]
Requires-Dist: jaxtyping
Requires-Dist: kagglehub
Requires-Dist: omegaconf
Requires-Dist: qwix
Requires-Dist: sentencepiece
Requires-Dist: tensorboardX
Requires-Dist: tensorflow_datasets
Requires-Dist: tqdm
Requires-Dist: transformers
Requires-Dist: python-dotenv
Provides-Extra: docs
Requires-Dist: sphinx>=8.2.3; extra == "docs"
Requires-Dist: sphinx-book-theme>=1.1.4; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Requires-Dist: ipython>=8.8.0; extra == "docs"
Requires-Dist: myst-nb>=1.3.0; extra == "docs"
Requires-Dist: matplotlib>=3.10.0; extra == "docs"
Requires-Dist: sphinx-gallery>=0.19.0; extra == "docs"
Requires-Dist: sphinx-collections>=0.0.1; extra == "docs"
Requires-Dist: sphinx_contributors; extra == "docs"
Provides-Extra: prod
Requires-Dist: flax>=0.12.0; extra == "prod"
Provides-Extra: dev
Requires-Dist: flax>=0.11.2; extra == "dev"
Dynamic: license-file

# Tunix: A JAX-native LLM Post-Training Library

**Tunix (Tune-in-JAX)** is a JAX-based library designed to streamline the
post-training of Large Language Models. It provides efficient and scalable
support for:

- **Supervised Fine-Tuning**
- **Reinforcement Learning (RL)**
- **Knowledge Distillation**

Tunix leverages the power of JAX for accelerated computation and seamless
integration with the JAX-based modeling framework
[Flax NNX](https://flax.readthedocs.io/en/latest/nnx_basics.html).

**Current Status: Early Development**

Tunix is in early development. We're actively working to expand its
capabilities, improve its usability, and boost its performance. Stay tuned for
upcoming updates and new features!

## Key Features & Highlights

Tunix is still under development, but here's a glimpse of the current features:

- **Supervised Fine-Tuning:**
  - Full-Weight Fine-Tuning
  - Parameter-Efficient Fine-Tuning (PEFT) with LoRA/QLoRA Layers
- **Reinforcement Learning (RL):**
  - Proximal Policy Optimization (PPO)
  - Group Relative Policy Optimization (GRPO)
  - Token-level Group Sequence Policy Optimization (GSPO-token)
- **Preference Fine-Tuning:**
  - Preference alignment with Direct Preference Optimization (DPO)
- **Knowledge Distillation:**
  - Logit Strategy: A classic approach where the student learns to match the
    teacher's output probability distribution.
  - Attention Transfer & Projection Strategies: Methods to align the attention
    mechanisms between the student and teacher models.
  - Feature Pooling & Projection Strategies: General techniques for matching
    intermediate feature representations, even between models of different
    architectures.
- **Modularity:**
  - Components are designed to be reusable and composable
  - Easy to customize and extend
- **Efficiency:**
  - Native support for common model sharding strategies such as DP, FSDP and TP
  - Designed for distributed training on accelerators (TPU)
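
As a rough illustration of the logit strategy listed above, the sketch below
computes a temperature-softened KL divergence between teacher and student
distributions in plain JAX. It is not Tunix's API: the function name, its
signature, and the temperature default are assumptions made for this example
only.

```python
import jax
import jax.numpy as jnp


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Hypothetical helper for illustration; not part of Tunix.
    """
    # Soften both distributions with the same temperature.
    teacher_log_probs = jax.nn.log_softmax(teacher_logits / temperature, axis=-1)
    student_log_probs = jax.nn.log_softmax(student_logits / temperature, axis=-1)
    teacher_probs = jnp.exp(teacher_log_probs)
    # Per-token KL divergence, summed over the vocabulary axis.
    kl = jnp.sum(teacher_probs * (teacher_log_probs - student_log_probs), axis=-1)
    # Mean over batch and sequence positions. The T^2 factor is a common
    # convention that keeps gradient scale comparable across temperatures.
    return jnp.mean(kl) * temperature**2


# Toy inputs with shape (batch=2, sequence=3, vocab=5).
key_s, key_t = jax.random.split(jax.random.PRNGKey(0))
student = jax.random.normal(key_s, (2, 3, 5))
teacher = jax.random.normal(key_t, (2, 3, 5))

loss = logit_distillation_loss(student, teacher)
# The loss vanishes when the student already matches the teacher.
zero = logit_distillation_loss(teacher, teacher)
# Gradients flow to the student logits only (argument 0).
grads = jax.grad(logit_distillation_loss)(student, teacher)
```

In a real training loop the student logits would come from a model forward
pass, and the teacher logits would typically be wrapped in
`jax.lax.stop_gradient` so that only the student is updated.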

## Upcoming

- **Agentic RL Training:**
  - Async Rollout
  - Multi-turn & multi-step support
  - Tool usage
- **Advanced Algorithms:**
  - Additional state-of-the-art RL and distillation algorithms
- **Scalability:**
  - Multi-host distributed training
  - Optimized rollout with vLLM
- **User Guides:**
  - More advanced RL recipes

## Installation

You can install Tunix in several ways:

1. From PyPI (recommended):

```sh
pip install "google-tunix[prod]"
```

2. Directly from GitHub (latest main branch):

```sh
pip install git+https://github.com/google/tunix
```

3. From source (editable install), if you plan to modify the codebase and run
   it in development mode:

```sh
git clone https://github.com/google/tunix.git
cd tunix
pip install -e ".[dev]"
```
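
To confirm which version ended up in your environment, a small check with
the standard library can help. This is a minimal sketch that assumes the
distribution is registered as `google-tunix` (the name in the package
metadata); `installed_version` is a hypothetical helper written for this
example, not something Tunix provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="google-tunix"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string, or a short notice when nothing is installed.
print(installed_version() or "google-tunix is not installed in this environment")
```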

## Getting Started

To get started, see the detailed examples and tutorials below:

- [PEFT Gemma with QLoRA](https://github.com/google/tunix/blob/main/examples/qlora_demo.ipynb)
- [Training Gemma on grade school Math problems using GRPO](https://github.com/google/tunix/blob/main/examples/grpo_demo.ipynb)
- [Logit Distillation using Gemma models](https://github.com/google/tunix/blob/main/examples/logit_distillation.ipynb)

To set up a Jupyter notebook on a single-host GCP TPU VM, please refer to the
[setup script](https://github.com/google/tunix/blob/main/scripts/setup_notebook_tpu_single_host.sh).

We plan to provide clear, concise documentation and more examples in the near
future.

## Contributing and Feedback

We welcome contributions! As Tunix is in early development, the contribution
process is still being formalized. A rough draft of the contribution process is
available [here](https://github.com/google/tunix/blob/main/CONTRIBUTING.md). In
the meantime, you can make feature requests, report issues and ask questions in
our
[Tunix GitHub discussion forum](https://github.com/google/tunix/discussions).

## Collaborations and Partnerships

[GRL](https://github.com/lmgame-org/GRL/blob/tunix_integration_dev/README.md)
(Game Reinforcement Learning), developed by
[Hao AI Lab](https://hao-ai-lab.github.io/) from UCSD, is an open-source
framework for post-training large language models through multi-turn RL on
challenging games. In collaboration with Tunix, GRL integrates seamless TPU
support, letting users quickly run scalable, reproducible RL experiments (like
PPO rollouts on Qwen2.5-0.5B-Instruct) on TPU v4 meshes with
[minimal setup](https://github.com/lmgame-org/GRL/blob/tunix_integration_dev/README.md#5-launch-the-quick-test-defaults-to-qwen2505b-supports-4-tpu-v4-with-mesh-22).
This partnership empowers the community to push LLM capabilities further,
combining Tunix’s optimized TPU runtime with GRL’s flexible game RL pipeline for
cutting-edge research and easy reproducibility.

## Stay Tuned!

Thank you for your interest in Tunix. We're working hard to bring you a powerful
and efficient library for LLM post-training. Please follow our progress and
check back for updates!

## Acknowledgements

Thank you to all our wonderful contributors!

[![Contributors](https://contrib.rocks/image?repo=google/tunix)](https://github.com/google/tunix/graphs/contributors)
