Metadata-Version: 2.1
Name: llama-index
Version: 0.9.25a1
Summary: Interface between LLMs and your data
Home-page: https://llamaindex.ai
License: MIT
Keywords: LLM,NLP,RAG,data,devtools,index,retrieval
Author: Jerry Liu
Author-email: jerry@llamaindex.ai
Maintainer: Andrei Fajardo
Maintainer-email: andrei@runllama.ai
Requires-Python: >=3.8.1,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: gradientai
Provides-Extra: langchain
Provides-Extra: local-models
Provides-Extra: postgres
Provides-Extra: query-tools
Requires-Dist: SQLAlchemy[asyncio] (>=1.4.49)
Requires-Dist: aiohttp (>=3.8.6,<4.0.0)
Requires-Dist: asyncpg (>=0.28.0,<0.29.0) ; extra == "postgres"
Requires-Dist: beautifulsoup4 (>=4.12.2,<5.0.0)
Requires-Dist: dataclasses-json
Requires-Dist: deprecated (>=1.2.9.3)
Requires-Dist: fsspec (>=2023.5.0)
Requires-Dist: gradientai (>=1.4.0) ; extra == "gradientai"
Requires-Dist: guidance (>=0.0.64,<0.0.65) ; extra == "query-tools"
Requires-Dist: httpx
Requires-Dist: jsonpath-ng (>=1.6.0,<2.0.0) ; extra == "query-tools"
Requires-Dist: langchain (>=0.0.303) ; extra == "langchain"
Requires-Dist: llamaindex-py-client (>=0.1.6)
Requires-Dist: lm-format-enforcer (>=0.4.3,<0.5.0) ; extra == "query-tools"
Requires-Dist: nest-asyncio (>=1.5.8,<2.0.0)
Requires-Dist: nltk (>=3.8.1,<4.0.0)
Requires-Dist: numpy
Requires-Dist: openai (>=1.1.0)
Requires-Dist: optimum[onnxruntime] (>=1.13.2,<2.0.0) ; extra == "local-models"
Requires-Dist: pandas
Requires-Dist: pgvector (>=0.1.0,<0.2.0) ; extra == "postgres"
Requires-Dist: psycopg-binary (>=3.1.12,<4.0.0) ; extra == "postgres"
Requires-Dist: rank-bm25 (>=0.2.2,<0.3.0) ; extra == "query-tools"
Requires-Dist: requests (>=2.31.0)
Requires-Dist: scikit-learn ; extra == "query-tools"
Requires-Dist: sentencepiece (>=0.1.99,<0.2.0) ; extra == "local-models"
Requires-Dist: spacy (>=3.7.1,<4.0.0) ; extra == "query-tools"
Requires-Dist: tenacity (>=8.2.0,<9.0.0)
Requires-Dist: tiktoken (>=0.3.3)
Requires-Dist: transformers[torch] (>=4.34.0,<5.0.0) ; extra == "local-models"
Requires-Dist: typing-extensions (>=4.5.0)
Requires-Dist: typing-inspect (>=0.8.0)
Requires-Dist: unique-names-generator (>=1.0.2,<2.0.0)
Project-URL: Documentation, https://docs.llamaindex.ai/en/stable/
Project-URL: Repository, https://github.com/run-llama/llama_index
Description-Content-Type: text/markdown

# 🗂️ LlamaIndex 🦙

[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-index)](https://pypi.org/project/llama-index/)
[![GitHub contributors](https://img.shields.io/github/contributors/jerryjliu/llama_index)](https://github.com/jerryjliu/llama_index/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

LlamaIndex (GPT Index) is a data framework for your LLM application.

PyPI:

- LlamaIndex: https://pypi.org/project/llama-index/.
- GPT Index (duplicate): https://pypi.org/project/gpt-index/.

LlamaIndex.TS (Typescript/Javascript): https://github.com/run-llama/LlamaIndexTS.

Documentation: https://docs.llamaindex.ai/en/stable/.

Twitter: https://twitter.com/llama_index.

Discord: https://discord.gg/dGcwcsnxhU.

### Ecosystem

- LlamaHub (community library of data loaders): https://llamahub.ai
- LlamaLab (cutting-edge AGI projects using LlamaIndex): https://github.com/run-llama/llama-lab

## 🚀 Overview

**NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

### Context

- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- How do we best augment LLMs with our own private data?

We need a comprehensive toolkit to help perform this data augmentation for LLMs.

### Proposed Solution

That's where **LlamaIndex** comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:

- Offers **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
- Provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.
- Provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
- Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).

LlamaIndex provides tools for both beginner and advanced users. Our high-level API lets beginners ingest and query their data in 5 lines of code. Our lower-level APIs let advanced users customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules) to fit their needs.
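
As a small illustration of those lower-level knobs, the sketch below (assuming an `index` built as in the Example Usage section of this README) adjusts how many chunks are retrieved per query:

```python
# Assumes `index` is a VectorStoreIndex built as shown in the
# Example Usage section; similarity_top_k controls how many
# chunks are retrieved for each query.
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("YOUR_QUESTION")  # retrieved chunks with scores

# The same knob is also exposed on the high-level query engine.
query_engine = index.as_query_engine(similarity_top_k=5)
```

This is the usual progression: start with `as_query_engine()` defaults, then drop down to the retriever when you need to inspect or post-process the retrieved context yourself.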

## 💡 Contributing

Interested in contributing? See our [Contribution Guide](CONTRIBUTING.md) for more details.

## 📄 Documentation

Full documentation can be found here: https://docs.llamaindex.ai/en/latest/.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

## 💻 Example Usage

```bash
pip install llama-index
```
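
Optional integrations can be pulled in via extras; the names below come from this package's `Provides-Extra` metadata:

```bash
# local model support (transformers, sentencepiece, optimum)
pip install "llama-index[local-models]"

# Postgres / pgvector vector store support
pip install "llama-index[postgres]"
```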

Examples are in the `examples` folder. Indices are in the `indices` folder.

To build a simple vector store index using OpenAI:

```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
```

To build a simple vector store index using a non-OpenAI LLM, e.g. Llama 2 hosted on [Replicate](https://replicate.com/) (where you can easily create a free trial API token):

```python
import os

os.environ["REPLICATE_API_TOKEN"] = "YOUR_REPLICATE_API_TOKEN"

from llama_index.llms import Replicate

llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
llm = Replicate(
    model=llama2_7b_chat,
    temperature=0.01,
    additional_kwargs={"top_p": 1, "max_new_tokens": 300},
)

# set tokenizer to match LLM
from llama_index import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)

from llama_index.embeddings import HuggingFaceEmbedding
from llama_index import ServiceContext

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model
)

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
```

To query:

```python
query_engine = index.as_query_engine()
response = query_engine.query("YOUR_QUESTION")
print(response)
```

By default, data is stored in-memory.
To persist to disk (under `./storage`):

```python
index.storage_context.persist()
```
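
The persist location can also be overridden (the directory name below is just an example):

```python
# Persist to a custom directory instead of the default ./storage;
# pass the same persist_dir when rebuilding the StorageContext later.
index.storage_context.persist(persist_dir="./my_storage")
```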

To reload from disk:

```python
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# load index
index = load_index_from_storage(storage_context)
```
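
If the index was built with a custom `ServiceContext` (as in the Replicate example above), supply the same context when reloading; otherwise LlamaIndex falls back to its default (OpenAI) settings. A sketch, assuming `service_context` is the one used at build time:

```python
from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
# service_context must match the LLM/embedding setup used at build time
index = load_index_from_storage(storage_context, service_context=service_context)
```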

## 🔧 Dependencies

The main third-party package requirements are `tiktoken` and `openai`; `langchain` is available as an optional extra.

All requirements are specified in the `pyproject.toml` file.
To run the package locally without building the wheel, simply run:

```bash
pip install poetry
poetry install --with dev
```

## 📖 Citation

Reference to cite if you use LlamaIndex in a paper:

```bibtex
@software{Liu_LlamaIndex_2022,
  author = {Liu, Jerry},
  doi = {10.5281/zenodo.1234},
  month = {11},
  title = {{LlamaIndex}},
  url = {https://github.com/jerryjliu/llama_index},
  year = {2022}
}
```

