Metadata-Version: 2.1
Name: pandasai
Version: 1.3b1
Summary: PandasAI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational.
License: MIT
Author: Gabriele Venturi
Requires-Python: >=3.9, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*, !=3.7.*, !=3.8.*
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: connectors
Provides-Extra: excel
Provides-Extra: ggplot
Provides-Extra: google-ai
Provides-Extra: google-sheets
Provides-Extra: langchain
Provides-Extra: numpy
Provides-Extra: plotly
Provides-Extra: polars
Provides-Extra: scikit-learn
Provides-Extra: seaborn
Provides-Extra: statsmodels
Provides-Extra: streamlit
Provides-Extra: text-generation
Provides-Extra: yfinance
Requires-Dist: astor (>=0.8.1,<0.9.0)
Requires-Dist: beautifulsoup4 (>=4.12.2,<5.0.0) ; extra == "google-sheets"
Requires-Dist: duckdb (>=0.8.1,<0.9.0)
Requires-Dist: ggplot (>=0.11.5,<0.12.0) ; extra == "ggplot"
Requires-Dist: google-cloud-aiplatform (>=1.26.1,<2.0.0) ; extra == "google-ai"
Requires-Dist: google-generativeai (>=0.1.0rc2,<0.2.0) ; extra == "google-ai"
Requires-Dist: ipython (>=8.13.1,<9.0.0)
Requires-Dist: kaleido (==0.2.0) ; extra == "plotly"
Requires-Dist: langchain (>=0.0.199,<0.0.200) ; extra == "langchain"
Requires-Dist: matplotlib (>=3.7.1,<4.0.0)
Requires-Dist: numpy (>=1.17,<2.0) ; extra == "numpy"
Requires-Dist: openai (>=0.27.5,<0.28.0)
Requires-Dist: openpyxl (>=3.0.7,<4.0.0) ; extra == "excel"
Requires-Dist: pandas (==1.5.3)
Requires-Dist: plotly (>=5.15.0,<6.0.0) ; extra == "plotly"
Requires-Dist: polars (>=0.18.15,<0.19.0) ; extra == "polars"
Requires-Dist: psycopg2 (>=2.9.7,<3.0.0) ; extra == "connectors"
Requires-Dist: pydantic (>=1,<2)
Requires-Dist: pymysql (>=1.1.0,<2.0.0) ; extra == "connectors"
Requires-Dist: python-dotenv (>=1.0.0,<2.0.0)
Requires-Dist: scikit-learn (>=1.2.2,<2.0.0) ; extra == "scikit-learn"
Requires-Dist: scipy (>=1.9.0,<2.0.0)
Requires-Dist: seaborn (>=0.12.2,<0.13.0) ; extra == "seaborn"
Requires-Dist: snowflake-sqlalchemy (>=1.5.0,<2.0.0) ; extra == "connectors"
Requires-Dist: sqlalchemy (>=1.4.49,<2.0.0)
Requires-Dist: sqlalchemy-databricks (>=0.2.0,<0.3.0) ; extra == "connectors"
Requires-Dist: statsmodels (>=0.14.0,<0.15.0) ; extra == "statsmodels"
Requires-Dist: streamlit (>=1.23.1,<2.0.0) ; extra == "streamlit"
Requires-Dist: text-generation (>=0.6.0) ; extra == "text-generation"
Requires-Dist: yfinance (>=0.2.28,<0.3.0) ; extra == "yfinance"
Description-Content-Type: text/markdown

# PandasAI 🐼

[![Release](https://img.shields.io/pypi/v/pandasai?label=Release&style=flat-square)](https://pypi.org/project/pandasai/)
[![CI](https://github.com/gventuri/pandas-ai/actions/workflows/ci.yml/badge.svg)](https://github.com/gventuri/pandas-ai/actions/workflows/ci.yml/badge.svg)
[![CD](https://github.com/gventuri/pandas-ai/actions/workflows/cd.yml/badge.svg)](https://github.com/gventuri/pandas-ai/actions/workflows/cd.yml/badge.svg)
[![Coverage](https://codecov.io/gh/gventuri/pandas-ai/branch/main/graph/badge.svg)](https://codecov.io/gh/gventuri/pandas-ai)
[![Documentation Status](https://readthedocs.org/projects/pandas-ai/badge/?version=latest)](https://pandas-ai.readthedocs.io/en/latest/?badge=latest)
[![Discord](https://dcbadge.vercel.app/api/server/kF7FqH2FwS?style=flat&compact=true)](https://discord.gg/kF7FqH2FwS)
[![Downloads](https://static.pepy.tech/badge/pandasai)](https://pepy.tech/project/pandasai) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing)

PandasAI is a Python library that adds Generative AI capabilities to [pandas](https://github.com/pandas-dev/pandas), the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.

<!-- Add images/pandas-ai.png -->

![PandasAI](images/pandas-ai.png?raw=true)

## 🔧 Quick install

```bash
pip install pandasai
```

## 🔍 Demo

Try out PandasAI in your browser:

[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing)

## 📖 Documentation

The documentation for PandasAI can be found [here](https://pandas-ai.readthedocs.io/en/latest/).

## 💻 Usage

> Disclaimer: GDP data comes from [this source](https://ourworldindata.org/grapher/gross-domestic-product?tab=table), published in World Development Indicators - World Bank (2022.05.26) and compiled from National accounts data - World Bank / OECD. It refers to the year 2020. Happiness indexes were extracted from [the World Happiness Report](https://ftnnews.com/images/stories/documents/2020/WHR20.pdf). Another useful [link](https://data.world/makeovermonday/2020w19-world-happiness-report-2020).

PandasAI makes pandas conversational: you ask questions of your data in natural language, and it generates and runs the corresponding Python code on your DataFrame.

### Queries

You can ask PandasAI questions such as finding all the rows in a DataFrame where the value of a column is greater than 5, and it will return a DataFrame containing only those rows. The example below asks for the 5 happiest countries in a small sample DataFrame:

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM
llm = OpenAI(api_token="YOUR_API_TOKEN")

# Wrap the DataFrame so it can be queried in natural language
df = SmartDataframe(df, config={"llm": llm})
df.chat('Which are the 5 happiest countries?')
```

The above code will return the following:

```
6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object
```

Of course, you can also ask PandasAI to perform more complex queries. For example, you can ask PandasAI to find the sum of the GDPs of the 2 unhappiest countries:

```python
df.chat('What is the sum of the GDPs of the 2 unhappiest countries?')
```

The above code will return the following:

```
19012600725504
```
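
For comparison, both answers above can be reproduced with plain pandas. This is only a sketch of equivalent logic, assuming `nlargest` and `nsmallest` on the `happiness_index` column; the code PandasAI actually generates for these prompts may differ.

```python
import pandas as pd

# Same sample data as in the example above, as a plain pandas DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# "Which are the 5 happiest countries?"
happiest = df.nlargest(5, "happiness_index")["country"]

# "What is the sum of the GDPs of the 2 unhappiest countries?"
unhappiest_gdp_sum = df.nsmallest(2, "happiness_index")["gdp"].sum()

print(happiest)
print(unhappiest_gdp_sum)
```

Checking a generated answer against a hand-written query like this is a quick way to validate results on small datasets.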

### Charts

You can also ask PandasAI to draw a graph:

```python
df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
```

![Chart](images/histogram-chart.png?raw=true)

You can save any charts generated by PandasAI by setting the `save_charts` parameter to `True` in the `PandasAI` constructor, for example `PandasAI(llm, save_charts=True)`. Charts are saved in `./pandasai/exports/charts`.

### Multiple DataFrames

You can also pass multiple DataFrames to PandasAI and ask questions that relate them to each other.

```python
import pandas as pd
from pandasai import SmartDatalake
from pandasai.llm import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


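# No api_token is passed here; this assumes the key is available in the environment (e.g. OPENAI_API_KEY)
llm = OpenAI()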
dl = SmartDatalake([employees_df, salaries_df], config={"llm": llm})
dl.chat("Who gets paid the most?")
```

The above code will return the following:

```
Oh, Olivia gets paid the most.
```
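
To answer this question, the two tables have to be related through their shared `EmployeeID` column. A minimal plain-pandas sketch of that logic follows, assuming an inner join and `idxmax`; it shows the kind of operation involved, not the code PandasAI necessarily generates.

```python
import pandas as pd

employees_df = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
})

salaries_df = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
})

# Join the two tables on their shared key
merged = employees_df.merge(salaries_df, on='EmployeeID')

# Pick the name on the row with the highest salary
top_earner = merged.loc[merged['Salary'].idxmax(), 'Name']
print(top_earner)
```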

You can find more examples in the [examples](examples) directory.

### ⚡️ Shortcuts

PandasAI also provides a number of shortcuts (beta) that make it easier to ask questions of your data. For example, you can ask PandasAI to `clean_data`, `impute_missing_values`, `generate_features`, `plot_histogram`, and many more.

```python
# Clean data
df.clean_data()

# Impute missing values
df.impute_missing_values()

# Generate features
df.generate_features()

# Plot histogram
df.plot_histogram(column="gdp")
```

Learn more about the shortcuts [here](https://pandas-ai.readthedocs.io/en/latest/shortcuts/).

## 🔒 Privacy & Security

To generate the Python code to run, PandasAI takes the DataFrame head, randomizes it (using random generation for sensitive data and shuffling for non-sensitive data), and sends only that randomized head to the LLM.

To further enforce privacy, you can instantiate PandasAI with `enforce_privacy = True`, which sends only the column names, not the head, to the LLM.
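
The two privacy modes can be pictured with the sketch below. It is illustrative only and is not PandasAI's implementation: the helpers `anonymize_head` and `build_llm_context`, the explicit `sensitive_columns` argument, and the random-string replacement are all assumptions made for the example.

```python
import random
import string

import pandas as pd


def anonymize_head(df, sensitive_columns, n=5, seed=None):
    """Hypothetical helper: return a randomized copy of the first n rows."""
    rng = random.Random(seed)
    head = df.head(n).copy()
    for column in head.columns:
        if column in sensitive_columns:
            # Sensitive data: replace each value with a random string of the same length
            head[column] = [
                "".join(rng.choices(string.ascii_lowercase, k=len(str(value))))
                for value in head[column]
            ]
        else:
            # Non-sensitive data: keep the real values but shuffle their order
            values = head[column].tolist()
            rng.shuffle(values)
            head[column] = values
    return head


def build_llm_context(df, sensitive_columns=(), enforce_privacy=False):
    """Hypothetical helper: return what would be shared with the LLM."""
    if enforce_privacy:
        # Share only the column names, never any values
        return {"columns": list(df.columns)}
    head = anonymize_head(df, sensitive_columns, seed=0)
    return {"columns": list(df.columns), "head": head.to_dict(orient="list")}


df = pd.DataFrame({
    "email": ["ann@example.com", "bob@example.com", "cy@example.com"],
    "age": [31, 45, 27],
})

private_context = build_llm_context(df, sensitive_columns=["email"], enforce_privacy=True)
default_context = build_llm_context(df, sensitive_columns=["email"])
print(private_context)
print(default_context)
```

In this sketch the default mode still exposes real values of non-sensitive columns, only in a different order, while the strict mode exposes nothing but the schema.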

## 🤝 Contributing

Contributions are welcome! Please check out the todos below, and feel free to open a pull request.
For more information, please see the [contributing guidelines](CONTRIBUTING.md).

After setting up the virtual environment, please remember to install the `pre-commit` hooks to comply with our standards:

```bash
pre-commit install
```

## Contributors

[![Contributors](https://contrib.rocks/image?repo=gventuri/pandas-ai)](https://github.com/gventuri/pandas-ai/graphs/contributors)

## 📜 License

PandasAI is licensed under the MIT License. See the LICENSE file for more details.

## Acknowledgements

- This project is based on the [pandas](https://github.com/pandas-dev/pandas) library by independent contributors, but it's in no way affiliated with the pandas project.
- This project is meant to be used as a tool for data exploration and analysis, and it's not meant to be used for production purposes. Please use it responsibly.

