Metadata-Version: 2.4
Name: google-tunix
Version: 0.1.4
Summary: A lightweight JAX-native LLM post-training framework.
Author-email: Tunix Developers <tunix-dev@google.com>
License: Apache-2.0
Project-URL: Source, https://github.com/google/tunix
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: datasets
Requires-Dist: flax>=0.11.1
Requires-Dist: fsspec
Requires-Dist: google-metrax>=0.2.3
Requires-Dist: grain
Requires-Dist: huggingface_hub
Requires-Dist: jaxtyping
Requires-Dist: jinja2
Requires-Dist: kagglehub
Requires-Dist: numba
Requires-Dist: omegaconf
Requires-Dist: pylatexenc
Requires-Dist: python-dotenv
Requires-Dist: qwix
Requires-Dist: sentencepiece
Requires-Dist: sympy
Requires-Dist: tensorflow_datasets
Requires-Dist: tqdm
Requires-Dist: transformers
Requires-Dist: hf_transfer
Provides-Extra: docs
Requires-Dist: sphinx>=8.2.3; extra == "docs"
Requires-Dist: sphinx-book-theme>=1.1.4; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Requires-Dist: ipython>=8.8.0; extra == "docs"
Requires-Dist: myst-nb>=1.3.0; extra == "docs"
Requires-Dist: matplotlib>=3.10.0; extra == "docs"
Requires-Dist: sphinx-gallery>=0.19.0; extra == "docs"
Requires-Dist: sphinx-collections>=0.0.1; extra == "docs"
Requires-Dist: sphinx_contributors; extra == "docs"
Provides-Extra: prod
Requires-Dist: jax[tpu]!=0.7.2,>=0.6.0; extra == "prod"
Provides-Extra: dev
Dynamic: license-file

# Tunix: A JAX-native LLM Post-Training Library

<div align="left">

<a href="https://tunix.readthedocs.io/en/latest/index.html"><img src="https://img.shields.io/badge/documentation-blue"></a>

</div>

**Tunix (Tune-in-JAX)** is a JAX-based library designed to streamline the
post-training of Large Language Models. It provides efficient and scalable
support for:

- **Supervised Fine-Tuning**
- **Reinforcement Learning (RL)**
- **Knowledge Distillation**

Tunix leverages the power of JAX for accelerated computation and integrates
seamlessly with the JAX-based modeling framework
[Flax NNX](https://flax.readthedocs.io/en/latest/nnx_basics.html).

**Current Status: Early Development**

Tunix is in early development. We're actively working to expand its
capabilities and to improve its usability and performance. Stay tuned for
upcoming updates and new features!

## Key Features & Highlights

Tunix is still under development. Here's a glimpse of the current features:

- **Supervised Fine-Tuning:**
  - Full Weights Fine-Tuning
  - Parameter-Efficient Fine-Tuning (PEFT) with LoRA/Q-LoRA Layers
- **Reinforcement Learning (RL):**
  - Proximal Policy Optimization (PPO)
  - Group Relative Policy Optimization (GRPO)
  - Token-level Group Sequence Policy Optimization (GSPO-token)
- **Preference Fine-Tuning:**
  - Preference alignment with Direct Preference Optimization (DPO)
- **Knowledge Distillation:**
  - Logit Strategy: A classic approach where the student learns to match the
    teacher's output probability distribution.
  - Attention Transfer & Projection Strategies: Methods to align the attention
    mechanisms between the student and teacher models.
  - Feature Pooling & Projection Strategies: General techniques for matching
    intermediate feature representations, even between models of different
    architectures.
- **Modularity:**
  - Components are designed to be reusable and composable
  - Easy to customize and extend
- **Efficiency:**
  - Native support for common model sharding strategies such as data
    parallelism (DP), fully sharded data parallelism (FSDP) and tensor
    parallelism (TP)
  - Designed for distributed training on accelerators (TPU)
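
As an illustration of the logit strategy listed above, here is a minimal
sketch of a distillation loss for a single token position. It is not Tunix's
API: the function names and the `temperature` parameter are assumptions made
for this example, and it uses plain Python to stay dependency-free. A real
training loop would vectorize the same computation with `jax.numpy` (for
example via `jax.nn.log_softmax`).

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Hypothetical helper for illustration only. The loss is zero when the
    student's distribution matches the teacher's and positive otherwise.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
```

Minimizing this quantity with respect to the student's parameters pushes the
student's output distribution toward the teacher's. A higher temperature
softens both distributions, which gives more weight to the teacher's
lower-probability tokens.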

## Upcoming

- **Agentic RL Training:**
  - Async Rollout
  - Multi-turn & multi-step support
  - Tool usage
- **Advanced Algorithms:**
  - Additional state-of-the-art RL and distillation algorithms
- **Scalability:**
  - Multi-host distributed training
  - Optimized rollout with vLLM or SGLang-Jax
- **User Guides:**
  - More advanced RL recipes

## Installation

- You can install Tunix in several ways:

1. From PyPI (recommended):

```sh
pip install "google-tunix[prod]"
```

2. Directly from GitHub (latest main branch):

```sh
pip install git+https://github.com/google/tunix
```

3. From source (editable install), if you plan to modify the codebase and run it
   in development mode. The vLLM version with tpu-inference support has not
   been released yet, so if you'd like to use vLLM, either follow the
   [manual installation instructions](https://docs.vllm.ai/projects/tpu/en/latest/getting_started/installation/)
   or download the Docker image (`vllm/vllm-tpu:v0.11.1`), then run
   `pip install tpu-inference` for the TPU backend:

```sh
git clone https://github.com/google/tunix.git
cd tunix
pip install -e ".[dev]"

# Then install vLLM and tpu-inference
```

- Using Tunix with SGLang-Jax rollout:

1. Install Tunix using one of the methods above.
1. Then install SGLang-Jax:

```sh
git clone git@github.com:sgl-project/sglang-jax.git
cd sglang-jax/python
pip install -e .
```
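
After installing, a quick sanity check can confirm that the package is visible
to your interpreter. The sketch below uses only the standard library. The
helper names are made up for this example; the distribution name
`google-tunix` and the Python 3.11 minimum are taken from the package
metadata.

```python
import sys
from importlib import metadata


def installed_version(dist_name="google-tunix"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(min_version=(3, 11)):
    """Check the running interpreter against the package's minimum Python version."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("google-tunix version:", installed_version() or "not installed")
```

If the second line reports `not installed`, check that `pip` and `python`
point to the same environment.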

## Getting Started

To get started, see our detailed examples and tutorials:

- [PEFT Gemma with QLoRA](https://github.com/google/tunix/blob/main/examples/qlora_gemma.ipynb)
- [Training Gemma on grade school Math problems using GRPO](https://github.com/google/tunix/blob/main/examples/grpo_gemma.ipynb)
- [Logit Distillation using Gemma models](https://github.com/google/tunix/blob/main/examples/logit_distillation.ipynb)
- [Training Llama3 or Qwen2 using GRPO and SGLang-Jax rollout](https://github.com/google/tunix/blob/main/scripts/grpo_demo_sglang_jax_rollout.py)

To set up a Jupyter notebook on a single-host GCP TPU VM, please refer to the
[setup script](https://github.com/google/tunix/blob/main/scripts/setup_notebook_tpu_single_host.sh).

We plan to provide clear, concise documentation and more examples in the near
future.

## Contributing and Feedback

We welcome contributions! As Tunix is in early development, the contribution
process is still being formalized. A rough draft of the contribution process is
available [here](https://github.com/google/tunix/blob/main/CONTRIBUTING.md). In
the meantime, you can make feature requests, report issues and ask questions in
our
[Tunix GitHub discussion forum](https://github.com/google/tunix/discussions).

## Collaborations and Partnership

[GRL](https://github.com/lmgame-org/GRL/blob/tunix_integration_dev/README.md)
(Game Reinforcement Learning), developed by
[Hao AI Lab](https://hao-ai-lab.github.io/) from UCSD, is an open-source
framework for post-training large language models through multi-turn RL on
challenging games. In collaboration with Tunix, GRL integrates seamless TPU
support, letting users quickly run scalable, reproducible RL experiments (like
PPO rollouts on Qwen2.5-0.5B-Instruct) on TPU v4 meshes with
[minimal setup](https://github.com/lmgame-org/GRL/blob/tunix_integration_dev/README.md#5-launch-the-quick-test-defaults-to-qwen2505b-supports-4-tpu-v4-with-mesh-22).
This partnership empowers the community to push LLM capabilities further,
combining Tunix’s optimized TPU runtime with GRL’s flexible game RL pipeline for
cutting-edge research and easy reproducibility.

## Stay Tuned!

Thank you for your interest in Tunix. We're working hard to bring you a powerful
and efficient library for LLM post-training. Please follow our progress and
check back for updates!

## Citing Tunix

```bibtex
@misc{tunix2025,
  title={Tunix},
  author={Bao, Tianshu and Wang, Lance and Sharma, Abheesht and Shin, Jiwon and
  Yan, Ann and Tan, Sizhi and Gao, Haoyu and Ha, Jen and Chai, Lin and
  Liu, Dangyi and Iyer, Rakesh and Sahu, Mridul and others},
  year={2025},
  howpublished={\url{https://github.com/google/tunix}},
}
```

## Acknowledgements

Thank you to all our wonderful contributors!

[![Contributors](https://contrib.rocks/image?repo=google/tunix)](https://github.com/google/tunix/graphs/contributors)
