Metadata-Version: 2.4
Name: utilsds
Version: 1.1.8
Summary: Solution for DS Team
Author: DS Team
Author-email: ds@sts.pl
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.2.2
Requires-Dist: numpy<=2.1.0,>=1.26.4
Requires-Dist: scikit-learn>=1.4.2
Requires-Dist: seaborn>=0.13.2
Requires-Dist: matplotlib>=3.9.0
Requires-Dist: google-cloud-bigquery>=3.22.0
Requires-Dist: google-cloud-bigquery-storage>=2.0.0
Requires-Dist: google-cloud-storage>=2.16.0
Requires-Dist: google-cloud-aiplatform>=1.51.0
Requires-Dist: scipy>=1.13.0
Requires-Dist: hyperopt>=0.2.7
Requires-Dist: tqdm>=4.66.4
Requires-Dist: xgboost>=1.7.6
Requires-Dist: lightgbm>=4.0.0
Requires-Dist: yellowbrick>=1.5
Requires-Dist: cloudpickle>=2.3.0
Requires-Dist: db-dtypes>=1.4.0
Requires-Dist: pygments>=2.19.1
Requires-Dist: shap>=0.41.0
Requires-Dist: numba>=0.61.0
Requires-Dist: jinja2>=3.1.3
Requires-Dist: setuptools>=75.8.0
Requires-Dist: evidently<0.6.7,>=0.4.39
Requires-Dist: ipython<=8.31.0
Requires-Dist: duckdb>=1.2.1
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# utilsds

`utilsds` is a library of classes and functions for data science projects, grouped into the following modules:

- **algorithm**:
  - `Algorithm`: Base class for fitting, training, and getting hyperparameters of machine learning models.

- **data_ops**:
  - `DataOperations`: Handle data operations locally and with Google Cloud services (BigQuery and Cloud Storage).
  - BigQuery operations:
    - `load_bq_data`: Load data from tables, views, and SQL files.
    - `save_bq_view`, `save_bq_table`: Save views and tables.
    - `load_bq_procedure`: Execute stored procedures.
    - `load_bq_details`: Get table/view details and schema.
    - `delete_bq_data`: Delete data with safety confirmations.
    - `dry_run`: Perform dry runs to estimate query costs.
  - Cloud Storage operations:
    - `save_gcs_bucket`: Create buckets.
    - `save_gcs_file`, `load_gcs_file`: Save and load files (.pkl, .json, .csv, .html, .sql).
  - Local file operations:
    - `save_local_file`, `load_local_file`: Save and load files (.pkl, .json, .csv, .html, .sql).

- **data_processing**:
  - `SkewnessTransformer`: Transform skewed data using various methods (IHS, neglog, Yeo-Johnson, quantile).
  - `NullReplacer`: Replace null values in specified columns with configurable strategies.
  - `ColumnDropper`: Drop specified columns from a DataFrame.
  - `OutliersCleaner`: Clean outliers by clipping values outside specified percentile ranges.
  - `CategoricalMapper`: Map values in categorical columns according to a specified mapping scheme.
  - `NumericalMapper`: Convert numerical columns to categorical by binning.
  - `Encoder`: One-hot encode categorical columns in the data.
  - `Normalizer`: Normalize numerical columns using a provided scaler.

- **data_split**:
  - `train_test_validation_split`: Split data into training, testing, and validation sets.
  - `resample_X_y`: Resample the training data and target column.

- **ds_statistics**:
  - `test_kruskal_wallis`: Perform the Kruskal-Wallis statistical test.
  - `test_agosto_pearsona`: Test for normality using the D'Agostino-Pearson test.

- **evaluate**:
  - `ModelEvaluator`: Evaluate models and generate diagnostic plots.
  - `ShapExplainer`: Explain model predictions using SHAP values.

- **experiments**:
  - `VertexExperiment`: Manage experiments with Vertex AI.

- **hyperopt**:
  - `Hyperopt`: Optimize hyperparameters using Hyperopt.

- **metrics**:
  - `Metrics`: Calculate metrics for both classification and regression models.

- **modeling**:
  - `Modeling`: Manage modeling, metrics, and logging with Vertex AI.

- **Supervised**:
  - `LazyClassifier`: A classifier that automatically trains and evaluates multiple models.
  - `LazyRegressor`: A regressor that automatically trains and evaluates multiple models.
  - `get_card_split`: Split categorical columns into low- and high-cardinality groups.
  - `adjusted_rsquared`: Calculate adjusted R-squared for regression models.

- **visualization**:
  - `MetricsPlot`: Compare metrics for different parameter values.
  - `Radar`: Create radar plots for visualizing data.
  - `cluster_characteristics`: Analyze cluster characteristics.
  - `comparison_density`: Compare density distributions.
  - `elbow_visualisation`: Visualize the elbow method for clustering.
  - `describe_clusters_metrics`: Describe metrics for clusters.
  - `category_null_variables`: Visualize null values in categorical variables.
  - `normal_distr_plots`: Generate normal distribution plots.
  - `distplot_limitations`: Visualize limitations of distplot.
  - `boxplot_limitations`: Visualize limitations of boxplot.
  - `violinplot_limitations`: Visualize limitations of violinplot.
  - `countplot_limitations`: Visualize limitations of countplot.
  - `categorical_variable_perc`: Visualize the percentage share of each category in categorical variables.
  - `spearman_correlation`: Visualize Spearman correlation.
  - `calculate_crammers_v`: Calculate Cramér's V.

- **what_if_streamlit**:
  - `ShapSaver`: Save SHAP explainer components for lazy loading in what-if analysis.
  - `ColumnMetadataGenerator`: Generate column metadata from a DataFrame or CSV file.

- **monitoring**:
  - `mapping`: Create an Evidently column mapping from a configuration file.
  - `test_data`: Test data for issues using Evidently test suites.
  - `check_data_drift`: Check data for drift using Evidently metrics.
  - `send_email_with_table`: Send email notifications with HTML tables for monitoring alerts.
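
The IHS and neglog options listed for `SkewnessTransformer` are standard closed-form transforms for skewed data that may contain zeros or negative values. As a minimal sketch (not the `utilsds` implementation), their textbook definitions look like this in plain Python; the helper names `ihs` and `neglog` and the `theta` scale parameter are illustrative assumptions.

```python
import math


def ihs(x: float, theta: float = 1.0) -> float:
    """Inverse hyperbolic sine transform: asinh(theta * x) / theta.

    Behaves like a log for large |x| but is defined at 0 and for negatives.
    `theta` is an assumed scale parameter (theta = 1 gives plain asinh).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    return math.asinh(theta * x) / theta


def neglog(x: float) -> float:
    """Sign-preserving log transform: sign(x) * log(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)


# Both transforms compress large magnitudes symmetrically around zero.
values = [-1000.0, -10.0, 0.0, 10.0, 1000.0]
print([round(ihs(v), 3) for v in values])
print([round(neglog(v), 3) for v in values])
```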
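
`OutliersCleaner` clips values that fall outside a percentile range. The sketch below shows percentile clipping using only the standard library; `clip_percentiles` and its integer `lower`/`upper` percentile arguments are hypothetical and do not describe the `utilsds` API. Percentiles here use linear interpolation between order statistics (`statistics.quantiles(..., method="inclusive")`, the same rule as NumPy's default), which is an assumption about how the library computes them.

```python
import statistics


def clip_percentiles(values, lower=1, upper=99):
    """Clip values to the [lower, upper] percentile range (integer percentiles).

    Hypothetical helper: percentiles are linearly interpolated between
    order statistics via statistics.quantiles(method="inclusive").
    """
    if not 1 <= lower < upper <= 99:
        raise ValueError("expected 1 <= lower < upper <= 99")
    # 99 cut points: cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [min(max(v, low), high) for v in values]


clipped = clip_percentiles(list(range(101)), lower=5, upper=95)
print(clipped[:3], clipped[-3:])  # [5.0, 5.0, 5.0] [95.0, 95.0, 95.0]
```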
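
`NumericalMapper` turns numerical columns into categories by binning. A minimal sketch of the idea with `bisect`, assuming sorted bin edges and left-closed, right-open bins; `bin_values` and its `edges`/`labels` arguments are hypothetical names, not the library's interface.

```python
import bisect


def bin_values(values, edges, labels):
    """Map each number to a label using bins [edges[i], edges[i + 1]).

    Hypothetical helper: len(labels) must equal len(edges) + 1, so values
    below edges[0] get labels[0] and values >= edges[-1] get labels[-1].
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right counts how many edges are <= v, which is the bin index.
    return [labels[bisect.bisect_right(edges, v)] for v in values]


print(bin_values([3, 17, 45, 70], edges=[18, 65], labels=["child", "adult", "senior"]))
# ['child', 'child', 'adult', 'senior']
```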
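
`train_test_validation_split` produces three disjoint sets. The sketch below shows one common way to do that (shuffle once with a fixed seed, then slice), assuming `test_size` and `validation_size` are fractions of the whole dataset. `three_way_split` is a hypothetical helper; the actual `utilsds` function may take different arguments, stratify, or return the sets in a different order.

```python
import random


def three_way_split(rows, test_size=0.2, validation_size=0.1, seed=42):
    """Shuffle rows and split them into (train, test, validation) lists.

    Hypothetical helper: sizes are fractions of the full dataset and the
    remainder goes to the training set.
    """
    if test_size < 0 or validation_size < 0 or test_size + validation_size >= 1:
        raise ValueError("test_size + validation_size must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_size)
    n_val = int(len(shuffled) * validation_size)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, test, validation


train, test, validation = three_way_split(range(100), test_size=0.2, validation_size=0.1)
print(len(train), len(test), len(validation))  # 70 20 10
```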
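
`adjusted_rsquared` refers to the standard adjusted R² statistic, which penalizes R² for the number of predictors p given n samples: adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1). A minimal sketch of that formula follows; the function and argument names are assumptions and are not taken from the library.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# R^2 = 0.8 with 101 samples and 10 predictors.
print(round(adjusted_r2(0.8, n_samples=101, n_features=10), 4))  # 0.7778
```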
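
`calculate_crammers_v` computes Cramér's V, a measure of association between two categorical variables derived from the chi-squared statistic: V = sqrt(χ² / (n · (min(r, c) - 1))) for an r × c contingency table with n observations. The pure-Python sketch below assumes a table of raw counts with no empty rows or columns and applies no bias correction; `cramers_v` is a hypothetical name, and whether `utilsds` applies a bias correction is not stated in this README.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of observed counts.

    Hypothetical helper, no bias correction:
    V = sqrt(chi2 / (n * (min(r, c) - 1))).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(row_totals), len(col_totals))
    if k < 2 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2x2 table with no empty rows/columns")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(round(cramers_v([[8, 2], [3, 7]]), 3))  # 0.503
```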
