Metadata-Version: 2.4
Name: datacompy
Version: 0.17.1
Summary: Dataframe comparison in Python
Author: Ian Robertson, Dan Coates
Author-email: Faisal Dosani <faisal.dosani@capitalone.com>
Maintainer-email: Faisal Dosani <faisal.dosani@capitalone.com>, Jacob Dawang <jacob.dawang@capitalone.com>, Raymond Haffar <raymond.haffar@capitalone.com>
License: Apache Software License
Project-URL: Homepage, https://github.com/capitalone/datacompy
Project-URL: Documentation, https://capitalone.github.io/datacompy/
Project-URL: Repository, https://github.com/capitalone/datacompy.git
Project-URL: Bug Tracker, https://github.com/capitalone/datacompy/issues
Project-URL: Source Code, https://github.com/capitalone/datacompy
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas<=2.3.1,>=0.25.0
Requires-Dist: numpy<=2.2.6,>=1.22.0
Requires-Dist: ordered-set<=4.1.0,>=4.0.2
Requires-Dist: polars[pandas]<=1.31.0,>=0.20.4
Requires-Dist: Jinja2>=3.0.0
Provides-Extra: fugue
Requires-Dist: fugue[dask,duckdb,ray]<=0.9.1,>=0.8.7; extra == "fugue"
Provides-Extra: spark
Requires-Dist: pyspark[connect]<=3.5.6,>=3.1.1; python_version < "3.11" and extra == "spark"
Requires-Dist: pyspark[connect]<=3.5.6,>=3.4; python_version >= "3.11" and extra == "spark"
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python; extra == "snowflake"
Requires-Dist: snowflake-snowpark-python; extra == "snowflake"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: furo; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Provides-Extra: tests-spark
Requires-Dist: pytest; extra == "tests-spark"
Requires-Dist: pytest-cov; extra == "tests-spark"
Requires-Dist: pytest-spark; extra == "tests-spark"
Provides-Extra: tests-snowflake
Requires-Dist: snowflake-snowpark-python[localtest]; extra == "tests-snowflake"
Provides-Extra: qa
Requires-Dist: pre-commit; extra == "qa"
Requires-Dist: ruff==0.5.7; extra == "qa"
Requires-Dist: mypy; extra == "qa"
Requires-Dist: pandas-stubs; extra == "qa"
Provides-Extra: build
Requires-Dist: build; extra == "build"
Requires-Dist: twine; extra == "build"
Requires-Dist: wheel; extra == "build"
Provides-Extra: edgetest
Requires-Dist: edgetest; extra == "edgetest"
Requires-Dist: edgetest-conda; extra == "edgetest"
Provides-Extra: dev-no-snowflake
Requires-Dist: datacompy[fugue]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[spark]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[docs]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[tests]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[tests-spark]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[qa]; extra == "dev-no-snowflake"
Requires-Dist: datacompy[build]; extra == "dev-no-snowflake"
Provides-Extra: dev
Requires-Dist: datacompy[fugue]; extra == "dev"
Requires-Dist: datacompy[spark]; extra == "dev"
Requires-Dist: datacompy[snowflake]; extra == "dev"
Requires-Dist: datacompy[docs]; extra == "dev"
Requires-Dist: datacompy[tests]; extra == "dev"
Requires-Dist: datacompy[tests-spark]; extra == "dev"
Requires-Dist: datacompy[tests-snowflake]; extra == "dev"
Requires-Dist: datacompy[qa]; extra == "dev"
Requires-Dist: datacompy[build]; extra == "dev"
Dynamic: license-file

# DataComPy

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/datacompy)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![PyPI version](https://badge.fury.io/py/datacompy.svg)](https://badge.fury.io/py/datacompy)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/datacompy/badges/version.svg)](https://anaconda.org/conda-forge/datacompy)
![PyPI - Downloads](https://img.shields.io/pypi/dm/datacompy)


DataComPy is a package to compare two DataFrames (or tables) such as Pandas, Spark, Polars, and
even Snowflake. Originally it was created to be something of a replacement
for SAS's ``PROC COMPARE`` for Pandas DataFrames with some more functionality than
just ``Pandas.DataFrame.equals(Pandas.DataFrame)`` (in that it prints out some stats,
and lets you tweak how accurate matches have to be). Supported types include:

- Pandas
- Polars
- Spark
- Snowflake (via snowpark)
- Dask (via Fugue)
- DuckDB (via Fugue)


## Quick Installation

```shell
pip install datacompy
```

or

```shell
conda install datacompy
```

### Installing extras

If you would like to use Spark or any other backends please make sure you install via extras:

```shell
pip install datacompy[spark]
pip install datacompy[fugue]
pip install datacompy[snowflake]

```

### LegacySparkCompare and SparkPandasCompare removal

Starting with v0.17.0, both `LegacySparkCompare` and `SparkPandasCompare` have been removed.


#### Supported versions and dependencies

Different versions of Spark, Pandas, and Python interact differently. Below is a matrix of what we test with.
With the move to Pandas on Spark API and compatability issues with Pandas 2+ we will for the mean time note support Pandas 2
with the Pandas on Spark implementation. Spark plans to support Pandas 2 in [Spark 4](https://issues.apache.org/jira/browse/SPARK-44101)


|             | Spark 3.2.4 | Spark 3.3.4 | Spark 3.4.2 | Spark 3.5.1 |
|-------------|-------------|-------------|-------------|-------------|
| Python 3.10 | ✅           | ✅           | ✅           | ✅           |
| Python 3.11 | ❌           | ❌           | ✅           | ✅           |
| Python 3.12 | ❌           | ❌           | ❌           | ❌           |


|                        | Pandas < 1.5.3 | Pandas >=2.0.0 |
|------------------------|----------------|----------------|
| ``Compare``            | ✅              | ✅              |
| ``SparkSQLCompare``    | ✅              | ✅              |
| Fugue                  | ✅              | ✅              |



> [!NOTE]
> At the current time Python `3.12` is not supported by Spark and also Ray within Fugue.
> If you are using Python `3.12` and above, please note that not all functioanlity will be supported.
> Pandas and Polars support should work fine and are tested.

## Supported backends

- Pandas: ([See documentation](https://capitalone.github.io/datacompy/pandas_usage.html))
- Spark: ([See documentation](https://capitalone.github.io/datacompy/spark_usage.html))
- Polars: ([See documentation](https://capitalone.github.io/datacompy/polars_usage.html))
- Snowflake/Snowpark: ([See documentation](https://capitalone.github.io/datacompy/snowflake_usage.html))
- Fugue is a Python library that provides a unified interface for data processing on Pandas, DuckDB, Polars, Arrow,
  Spark, Dask, Ray, and many other backends. DataComPy integrates with Fugue to provide a simple way to compare data
  across these backends. Please note that Fugue will use the Pandas (Native) logic at its lowest level
  ([See documentation](https://capitalone.github.io/datacompy/fugue_usage.html))

## Contributors

We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to
sign the [Contributor License Agreement (CLA)](https://cla-assistant.io/capitalone/datacompy).

This project adheres to the [Open Source Code of Conduct](https://developer.capitalone.com/resources/code-of-conduct/).
By participating, you are expected to honor this code.


## Roadmap

Roadmap details can be found [here](https://github.com/capitalone/datacompy/blob/develop/ROADMAP.rst)
