Metadata-Version: 2.4
Name: optimum-onnx
Version: 0.0.1
Summary: Optimum ONNX is an interface between the Hugging Face libraries and ONNX / ONNX Runtime
Author-email: "HuggingFace Inc. Special Ops Team" <hardware@huggingface.co>
License: Apache-2.0
Project-URL: Homepage, https://github.com/huggingface/optimum-onnx
Keywords: transformers,quantization,inference,onnx,onnxruntime
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: optimum~=2.0.0
Requires-Dist: transformers<4.56.0,>=4.36.0
Requires-Dist: onnx
Provides-Extra: onnxruntime
Requires-Dist: onnxruntime>=1.18.0; extra == "onnxruntime"
Provides-Extra: onnxruntime-gpu
Requires-Dist: onnxruntime-gpu>=1.18.0; extra == "onnxruntime-gpu"
Provides-Extra: tests
Requires-Dist: accelerate>=0.26.0; extra == "tests"
Requires-Dist: datasets; extra == "tests"
Requires-Dist: einops; extra == "tests"
Requires-Dist: hf_xet; extra == "tests"
Requires-Dist: parameterized; extra == "tests"
Requires-Dist: Pillow; extra == "tests"
Requires-Dist: pytest-xdist; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: safetensors; extra == "tests"
Requires-Dist: scipy; extra == "tests"
Requires-Dist: sentencepiece; extra == "tests"
Requires-Dist: timm; extra == "tests"
Requires-Dist: onnxslim>=0.1.60; extra == "tests"
Requires-Dist: rjieba; extra == "tests"
Requires-Dist: sacremoses; extra == "tests"
Provides-Extra: quality
Requires-Dist: ruff==0.12.3; extra == "quality"
Dynamic: license-file

<div align="center">

# 🤗 Optimum ONNX

**Export your Hugging Face models to ONNX**

[Documentation](https://huggingface.co/docs/optimum/index) | [ONNX](https://onnx.ai/) | [Hub](https://huggingface.co/onnx)

</div>


### Installation

Before you begin, install the required libraries by running:

```bash
pip install "optimum-onnx[onnxruntime] @ git+https://github.com/huggingface/optimum-onnx.git"
```

To use the [GPU version of ONNX Runtime](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#cuda-execution-provider), make sure the CUDA and cuDNN [requirements](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements) are satisfied, then install the additional dependencies by running:

```bash
pip install "optimum-onnx[onnxruntime-gpu] @ git+https://github.com/huggingface/optimum-onnx.git"
```

To avoid conflicts between `onnxruntime` and `onnxruntime-gpu`, make sure the `onnxruntime` package is not installed: run `pip uninstall onnxruntime` before installing Optimum ONNX.
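
To see which ONNX Runtime builds are present before installing, a small check along these lines may help. It is only a sketch that relies on the standard library's `importlib.metadata`; the helper name `installed_onnxruntime_packages` is illustrative and not part of Optimum ONNX.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_onnxruntime_packages():
    """Return {distribution name: version} for each ONNX Runtime build found."""
    found = {}
    for name in ("onnxruntime", "onnxruntime-gpu"):
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


packages = installed_onnxruntime_packages()
if len(packages) > 1:
    # Both builds side by side is the situation described above.
    print("Both builds are installed; consider uninstalling one:", packages)
else:
    print("Installed ONNX Runtime builds:", packages)
```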

### ONNX export

🤗 Transformers, Diffusers, Timm and Sentence Transformers models can be exported to the [ONNX](https://onnx.ai/) format with a single command:

```bash
optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx_llama/
```

The model can also be optimized and quantized with `onnxruntime`.

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
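
If the export needs to run from a Python script, one option is to invoke the same `optimum-cli` command through `subprocess`. The sketch below assumes `optimum-cli` is on the `PATH`; the helper names `build_export_command` and `export_to_onnx` are hypothetical and only wrap the command shown above.

```python
import shutil
import subprocess


def build_export_command(model_id, output_dir):
    """Build the argument list for `optimum-cli export onnx`."""
    return ["optimum-cli", "export", "onnx", "--model", model_id, output_dir]


def export_to_onnx(model_id, output_dir):
    """Run the export, raising if the CLI is missing or the command fails."""
    if shutil.which("optimum-cli") is None:
        raise RuntimeError("optimum-cli was not found on PATH; install optimum-onnx first")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_export_command(model_id, output_dir), check=True)
```

Calling `export_to_onnx("meta-llama/Llama-3.2-1B", "onnx_llama/")` would then run the same export as the shell command.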

#### Inference

Once the model is exported to the ONNX format, the provided Python classes let you run it with [ONNX Runtime](https://onnxruntime.ai/) as the backend:


```diff
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  result = pipe("He never went out without a book under his arm")
```

For more details on how to run ONNX models with the `ORTModelForXXX` classes, see the [documentation](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
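
As a complement to the pipeline example, the model can also be driven through `generate` directly. This is a minimal sketch, not a reference implementation: the helper name `generate_with_onnx` is hypothetical, and it assumes `model_dir` (for example the `onnx_llama/` directory from the export step) contains both the exported ONNX model and the tokenizer files.

```python
def generate_with_onnx(model_dir, prompt, max_new_tokens=20):
    """Generate a continuation of `prompt` with an ONNX model stored in `model_dir`."""
    # Imports are kept local so that defining the helper does not require
    # the libraries to be installed.
    from optimum.onnxruntime import ORTModelForCausalLM
    from transformers import AutoTokenizer

    # Assumption: the tokenizer files sit next to the ONNX model.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = ORTModelForCausalLM.from_pretrained(model_dir)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For instance, `generate_with_onnx("onnx_llama/", "He never went out without a book under his arm")` would return the prompt followed by the generated text.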
