Metadata-Version: 2.4
Name: tubular
Version: 1.4.7
Summary: Package to perform pre processing steps for machine learning models
Author-email: Allianz UK Data Science Team <datasciencepackages@allianz.co.uk>
License: BSD 3-Clause License
        
        Copyright (c) 2021, Liverpool Victoria General Insurance Group.
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Project-URL: Documentation, https://tubular.readthedocs.io/en/latest/index.html
Project-URL: Repository, https://github.com/azukds/tubular
Project-URL: Issues, https://github.com/azukds/tubular/issues
Project-URL: Changelog, https://github.com/azukds/tubular/CHANGELOG.md
Keywords: data science,feature engineering,data transforms,pipeline,sklearn,machine learning,ML,DS
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: BSD License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: narwhals>=1.31.0
Requires-Dist: polars<1.32.0
Requires-Dist: beartype>=0.19.0
Requires-Dist: typing-extensions>=4.5.0
Provides-Extra: dev
Requires-Dist: test-aide>=0.1.0; extra == "dev"
Requires-Dist: pytest>=5.4.1; extra == "dev"
Requires-Dist: pytest-mock>=3.5.1; extra == "dev"
Requires-Dist: pyarrow>=17.0.0; extra == "dev"
Requires-Dist: pytest-cov<=2.10.1; extra == "dev"
Requires-Dist: pre-commit<=6.1.1; extra == "dev"
Requires-Dist: ruff==0.2.2; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="https://github.com/azukds/tubular/raw/main/logo.png">
</p>

Tubular pre-processing for machine learning!

----

![PyPI](https://img.shields.io/pypi/v/tubular?color=success&style=flat)
![Read the Docs](https://img.shields.io/readthedocs/tubular)
![GitHub](https://img.shields.io/github/license/azukds/tubular)
![GitHub last commit](https://img.shields.io/github/last-commit/azukds/tubular)
![GitHub issues](https://img.shields.io/github/issues/azukds/tubular)
![Build](https://github.com/azukds/tubular/actions/workflows/python-package.yml/badge.svg?branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/azukds/tubular/HEAD?labpath=examples)

`tubular` implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with [scikit-learn](https://scikit-learn.org/) [Pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html). Each has a `transform` method to apply the pre-processing step to data and a `fit` method to learn the relevant information from the data, if applicable.

The transformers in `tubular` work with data in [pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
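The `fit`/`transform` contract described above can be illustrated with a minimal sketch. It uses a hypothetical scikit-learn-compatible transformer (`QuantileCapper` is an illustrative name, not part of `tubular`). In `fit` it learns per-column quantile bounds from a pandas DataFrame, and in `transform` it clips values to those bounds. It is then used inside a `Pipeline`. This relies only on pandas and scikit-learn and is not how `tubular` implements its own transformers.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class QuantileCapper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: caps columns at quantiles learned in fit."""

    def __init__(self, columns, lower=0.05, upper=0.95):
        self.columns = columns
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        # Learn the capping bounds from the training data only.
        self.bounds_ = {
            col: (X[col].quantile(self.lower), X[col].quantile(self.upper))
            for col in self.columns
        }
        return self

    def transform(self, X):
        # Apply the learned bounds to a copy, leaving the input untouched.
        X = X.copy()
        for col, (low, high) in self.bounds_.items():
            X[col] = X[col].clip(lower=low, upper=high)
        return X


# Toy data: column "a" is capped, column "b" is left alone.
X = pd.DataFrame({"a": range(101), "b": range(101)})
pipe = Pipeline([("capper", QuantileCapper(columns=["a"], lower=0.1, upper=0.9))])
X_capped = pipe.fit(X).transform(X)
```

Keeping the learned state in `fit` (the trailing-underscore `bounds_` attribute follows scikit-learn convention) means the same bounds are reused when the fitted pipeline later transforms new data.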

There are a variety of transformers to assist with:

- capping
- dates
- imputation
- mapping
- categorical encoding
- numeric operations

Here is a simple example of applying capping to two columns:

```python
from tubular.capping import CappingTransformer
import pandas as pd
from sklearn.datasets import fetch_california_housing

# load the california housing dataset
cali = fetch_california_housing()
X = pd.DataFrame(cali['data'], columns=cali['feature_names'])

# initialise a capping transformer for 2 columns
capper = CappingTransformer(capping_values={'AveOccup': [0, 10], 'HouseAge': [0, 50]})

# transform the data
X_capped = capper.transform(X)
```
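For intuition, capping with fixed `capping_values` can be thought of as replacing values below the lower bound with that bound and values above the upper bound with the upper bound. The plain-pandas sketch below approximates that behaviour with `Series.clip`. It assumes the transformer acts like a clip, and it is not `tubular`'s implementation. It uses a small made-up DataFrame so it runs without downloading the California housing data.

```python
import pandas as pd

# Same shape as the capping_values argument above: column -> [lower, upper].
capping_values = {"AveOccup": [0, 10], "HouseAge": [0, 50]}

# Small made-up frame standing in for the California housing data.
X = pd.DataFrame({"AveOccup": [2.5, 12.0, 1243.3], "HouseAge": [41.0, 52.0, 15.0]})

X_capped = X.copy()
for col, (lower, upper) in capping_values.items():
    # Values outside [lower, upper] are replaced by the nearest bound.
    X_capped[col] = X_capped[col].clip(lower=lower, upper=upper)

print(X_capped)
```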

## Installation

The easiest way to get `tubular` is directly from [PyPI](https://pypi.org/project/tubular/):

`pip install tubular`
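After installing, one way to confirm which version of `tubular` is present is the standard-library `importlib.metadata` module (available on every Python version `tubular` supports). The helper below is a small sketch. The function name `installed_version` is our own and is not part of `tubular`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version("tubular"))
```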

## Documentation

The documentation for `tubular` can be found on [readthedocs](https://tubular.readthedocs.io/en/latest/).

Instructions for building the docs locally can be found in [docs/README](https://github.com/azukds/tubular/blob/main/docs/README.md).

## Examples

To help you get started, there are example notebooks in the [examples](https://github.com/azukds/tubular/tree/main/examples) folder of the repo showing how to use each transformer.

To open the example notebooks in [Binder](https://mybinder.org/), follow [this link](https://mybinder.org/v2/gh/azukds/tubular/HEAD?labpath=examples) or click the `launch binder` badge above, then use the directory button in the left sidebar to navigate to a specific notebook.

## Issues

For bugs and feature requests please open an [issue](https://github.com/azukds/tubular/issues).

## Build and test

This project uses [pytest](https://docs.pytest.org/en/stable/) as its test framework. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

```shell
git clone https://github.com/azukds/tubular.git
cd tubular
```

Next, install `tubular` and the development dependencies:

```shell
pip install . -r requirements-dev.txt
```

Finally, run the test suite with `pytest`:

```shell
pytest
```

## Contribute

`tubular` is under active development, and we're super excited if you're interested in contributing!

See the [CONTRIBUTING](https://github.com/azukds/tubular/blob/main/CONTRIBUTING.rst) file for the full details of our working practices.
