Metadata-Version: 2.1
Name: fairness-datasets
Version: 0.2.2
Summary: PyTorch dataset wrapper for the
Keywords: PyTorch,Dataset,Fairness,Adult,Census Income,Law School
Author-email: David Boetius <david.boetius@uni-konstanz.de>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Dist: torch >=1.9, <3.0
Requires-Dist: numpy >=1.20, <2.0
Requires-Dist: pandas >=2.1, <3.0
Requires-Dist: xlrd >=2.0.1, <3.0
Requires-Dist: requests >=2.25, <3.0
Requires-Dist: flit==3.9.0 ; extra == "develop"
Requires-Dist: black==23.7.0 ; extra == "develop"
Requires-Dist: pytest >=7.4, <8.0 ; extra == "test"
Requires-Dist: nox==2023.4.22 ; extra == "test"
Project-URL: Bug Tracker, https://github.com/cherrywoods/fairness-datasets/issues
Project-URL: Homepage, https://github.com/cherrywoods/fairness-datasets
Project-URL: Repository, https://github.com/cherrywoods/fairness-datasets.git
Provides-Extra: develop
Provides-Extra: test

# fairness-datasets
[![PyPI version](https://badge.fury.io/py/fairness-datasets.svg)](https://badge.fury.io/py/fairness-datasets)

PyTorch dataset wrappers for the several popular datasets from 
fair machine learning research.

The following datasets are wrapped:
 - [Adult (Census Income)](https://archive.ics.uci.edu/dataset/2/adult).
 - [Default](https://archive.ics.uci.edu/dataset/350/default+of+credit+card+clients)
 - [Law School](https://eric.ed.gov/?id=ED469370) (data from [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study))
 - [SouthGerman](https://archive.ics.uci.edu/dataset/573/south+german+credit+update)

## Installation
```shell
pip install fairness-datasets
```

## Basic Usage
```python
from fairnessdatasets import Adult

# load (if necessary, download) the Adult training dataset 
train_set = Adult(root="datasets", download=True)
# load the test set
test_set = Adult(root="datasets", train=False, download=True)

inputs, target = train_set[0]  # retrieve the first sample of the training set

# iterate over the training set
for inputs, target in iter(train_set):
    ...  # Do something with a single sample

# use a PyTorch data loader
from torch.utils.data import DataLoader

loader = DataLoader(test_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for inputs, targets in iter(loader):
        ...  # Do something with a batch of samples
```
You can use `Adult(..., raw=True)` to turn off the one-hot encoding
and z-score normalization applied by the `Adult` class by default.

The remaining dataset classes can be used in the same way as `Adult`.
However, these datasets don't come with a fixed train/test split, 
so that the dataset instances always contain all data.
To create a train/test split, use
```python
from fairnessdatasets import Default
from torch.utils.data import random_split

dataset = Default(root="datasets", download=True)

rng = torch.Generator().manual_seed(42)  # for reproducible results
train_set, test_set = random_split(dataset, [0.7, 0.3], generator=rng)
```

## Advanced Usage

Turn off status messages while downloading the dataset:
```python
Adult(root=..., output_fn=None)
```

Use the `logging` module for logging status messages while downloading the
dataset instead of placing the status messages on `sys.stdout`.
```python
import logging

Adult(root=..., output_fn=logging.info)
```

