{% extends "layout.html" %} {% set title = 'skrub: Machine Learning with dataframes' %} {%- block extrahead %} {{ super() }} {# Add here landing-page specific stuff that goes in the header (eg css) #} {%- endblock extrahead %} {% block docs_navbar %} {{ super() }} {# We add the full-width banner below the navbar, as the div there is still full-width (unlike the article) #}

skrub

  • scikit-learn compatible
  • Database preprocessing
  • Heterogeneous pandas and Polars dataframes (numeric, categorical, dates, text, ...)
{% endblock docs_navbar %} {% block docs_main %}

Machine learning with dataframes

skrub is a Python library that eases preprocessing and feature engineering for tabular machine learning.
It connects database tables directly to machine learning.

Effortless Tabular Learning

Build strong baseline scikit-learn pipelines effortlessly with TableVectorizer and tabular_pipeline.

{% include "demo_tabular_pipeline.html" %}

Interactive Data Exploration

Explore your dataframes interactively with TableReport.

{% include "demo_table_report_code.html" %}

Try it on your dataset →

Click anywhere on the table

{% include "demo_table_report_generated.html" %}

Powerful Feature Engineering

Encode text and high-cardinality categorical data with StringEncoder, TextEncoder, GapEncoder, or MinHashEncoder, and extract features from dates with DatetimeEncoder.

{% include "demo_gap_encoder.html" %}

Tune arbitrary data wrangling

Inspect it, apply it to new data

Chain arbitrary operations to prepare, transform, and assemble multiple tables for machine learning, then tune the full pipeline, inspect it, or apply it to new data.

Works with any computational or dataframe engine.

{% include "../generated_for_index/pipeline.svg" %}

Discover the skrub DataOps →

Given two input dataframes, products_df and baskets_df (expand for the full code): {% include "../generated_for_index/code_block_0.html" %}
{% include "../generated_for_index/products.html" %}
{% include "../generated_for_index/baskets.html" %}
{% include "../generated_for_index/code_block_2.html" %}
{% include "../generated_for_index/parallel_coordinates.html" %}

Our Community

The skrub project is powered by a worldwide community of contributors. Here we display 30 randomly selected contributors.

Try it yourself!

Ready to write less code and get more insights? Dive into skrub now and be part of an emerging community!

{% endblock docs_main %} {%- block footer %} {%- endblock footer %}