Metadata-Version: 2.1
Name: nkululeko
Version: 0.68.3
Summary: Machine learning audio prediction experiments based on templates
Home-page: https://github.com/felixbur/nkululeko
Author: Felix Burkhardt
Author-email: fxburk@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: audeer
Requires-Dist: audformat
Requires-Dist: audinterface
Requires-Dist: audiofile
Requires-Dist: audiomentations
Requires-Dist: audonnx
Requires-Dist: datasets
Requires-Dist: imageio
Requires-Dist: laion-clap
Requires-Dist: matplotlib
Requires-Dist: numpy
Requires-Dist: opensmile
Requires-Dist: pandas
Requires-Dist: praat-parselmouth
Requires-Dist: pylatex
Requires-Dist: scikit_learn
Requires-Dist: scipy
Requires-Dist: seaborn
Requires-Dist: sounddevice
Requires-Dist: tensorflow
Requires-Dist: tensorflow_hub
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: transformers
Requires-Dist: umap-learn
Requires-Dist: xgboost

- [Overview](#overview)
  - [Confusion matrix](#confusion-matrix)
  - [Epoch progression](#epoch-progression)
  - [Feature importance](#feature-importance)
  - [Feature distribution](#feature-distribution)
  - [t-SNE plots](#t-sne-plots)
  - [Data distribution](#data-distribution)
  - [Bias checking](#bias-checking)
- [Documentation](#documentation)
- [Installation](#installation)
- [Usage](#usage)
  - [Initialization file](#initialization-file)
  - [Hello World example](#hello-world-example)
  - [Features](#features)
- [License](#license)

 
## Overview
A project to detect speaker characteristics by machine learning experiments with a high-level interface.

The idea is to have a framework (based on e.g. sklearn and torch) that can be used to rapidly and automatically analyse and investigate audio data.

* NEW: Nkululeko now automatically generates PDF reports [sample for EmoDB](meta/images/emodb_report.pdf)
* The latest features can be seen in [the ini-file](./ini_file.md) options that are used to control Nkululeko
* Below is a [Hello World example](#helloworld) that should get you set up quickly, also on [Google Colab](https://colab.research.google.com/drive/1GYNBd5cdZQ1QC3Jm58qoeMaJg3UuPhjw?usp=sharing#scrollTo=4G_SjuF9xeQf), and [with Kaggle](https://www.kaggle.com/felixburk/nkululeko-hello-world-example)
* [Here's a blog post on how to set up nkululeko on your computer.](http://blog.syntheticspeech.de/2021/08/30/how-to-set-up-your-first-nkululeko-project/)
* [Here is a slack channel to discuss issues related to nkululeko](https://join.slack.com/t/nkululekoworkspace/shared_invite/zt-1wtvbxtwz-P5YoRJq8whxKSee86ebhJg). Please click the link if interested in contributing.
* [Here's a slide presentation about nkululeko](docs/nkululeko.pdf)
* [Here's a video presentation about nkululeko](https://www.youtube.com/playlist?list=PLRceVavtxLg0y2jiLmpnUfiMtfvkK912D)
* [Here's the 2022 LREC article on nkululeko](http://felix.syntheticspeech.de/publications/Nkululeko_LREC.pdf)

Here are some examples of typical output:

### Confusion matrix
By default, Nkululeko displays results as a confusion matrix. For regression, the continuous values are binned into categories first.

<img src="meta/images/conf_mat.png" width="500px"/>
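
The binning idea can be illustrated with a minimal sketch in plain Python. This is not Nkululeko's implementation: the bin edges, the label names and the helper functions `bin_values` and `confusion_matrix` are hypothetical and only show how continuous values might be turned into categories before counting agreements.

```python
from bisect import bisect_right


def bin_values(values, edges, labels):
    """Map each continuous value to the label of the bin it falls into."""
    return [labels[bisect_right(edges, v)] for v in values]


def confusion_matrix(truth, pred, labels):
    """Count (truth, prediction) pairs; rows are truth, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical bins: two edges define three categories.
labels = ["low", "mid", "high"]
edges = [0.33, 0.66]

truth = bin_values([0.1, 0.5, 0.9, 0.7], edges, labels)
pred = bin_values([0.2, 0.7, 0.8, 0.4], edges, labels)
matrix = confusion_matrix(truth, pred, labels)

for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
```

Correct predictions end up on the diagonal of the matrix, and every off-diagonal cell counts one kind of confusion.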

### Epoch progression
The point when overfitting starts can sometimes be seen by looking at the results per epoch:

<img src="meta/images/epoch_progression.png" width="500px"/>

### Feature importance
Using the *explore* interface, Nkululeko analyses the importance of acoustic features:
 
<img src="meta/images/feat_importance.png" width="500px"/>

### Feature distribution
Nkululeko can also show the distribution of specific features per category:

<img src="meta/images/feat_dist.png" width="500px"/>

### t-SNE plots
A t-SNE plot can give you an estimate of whether your acoustic features are useful at all:

<img src="meta/images/tsne.png" width="500px"/>

### Data distribution
Sometimes you only want to take a look at your data:

<img src="meta/images/data_plot.png" width="500px"/>

### Bias checking
In some cases you might wonder whether there is bias in your data. You can try to detect this with automatically estimated speech properties, by visualizing the correlation between target labels and predicted labels.

<img src="meta/images/emotion-pesq.png" width="500px"/>

## Documentation
The documentation, along with extended information on installation, usage, the INI file format, and examples, can be found at [nkululeko.readthedocs.io](https://nkululeko.readthedocs.io).

## Installation

Create and activate a virtual Python environment and simply run
```
pip install nkululeko
```
We excluded some packages from the automatic installation because they might depend on your computer and some of them are only needed in special cases. So if the error
```
module x not found
```
appears, please try
```
pip install x
```
For many packages you will need the missing torch package.
If you don't have a GPU (which is probably true if you don't know what that is), please use
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
Otherwise, you can use the default:
```
pip install torch torchvision torchaudio
```
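
If you are unsure which optional packages are missing, or whether your torch installation can see a GPU, a small check like the following may help. It is only a sketch: the helper names `missing_modules` and `torch_status` are made up for this example, and the list of module names passed in is an arbitrary illustration, not an official list of Nkululeko's optional dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def torch_status():
    """Describe whether torch is installed and whether it can use a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily, only if it is actually available

    if torch.cuda.is_available():
        return "torch is installed and a CUDA GPU is available"
    return "torch is installed, CPU only"


# Example module names; adapt the list to whatever "module x not found" reports.
print("missing:", missing_modules(["torch", "torchvision", "torchaudio"]))
print(torch_status())
```

Each name reported as missing can then be installed with `pip install`, as described above.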

Some functionalities require extra packages to be installed, which we didn't include automatically:
* the SQUIM model needs a special torch version:
  ```
  pip uninstall -y torch torchvision torchaudio
  pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
  ```
* the spotlight adapter needs spotlight:
  ```
  pip install renumics-spotlight sliceguard 
  ```


Some examples for *ini*-files (which you use to control nkululeko) are in the [tests folder](https://github.com/felixbur/nkululeko/tree/main/tests).


## Usage
Basically, you specify your experiment in an "ini" file (e.g. *experiment.ini*) and then call one of the Nkululeko interfaces to run the experiment like this:
  * ```python -m nkululeko.nkululeko --config experiment.ini```

A basic configuration looks like this:
```
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
[PLOT]
combine_per_speaker = mode
```
Read the [Hello World example](#hello-world-example) for initial usage with the Emo-DB dataset.
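
To get a feeling for the file format, the configuration above can be read with Python's standard `configparser` module. This is a sketch of the format only and makes no claim about how Nkululeko parses the file internally. The helper `read_list` is hypothetical; it assumes that list-valued options are written as Python-style list literals, as in the example.

```python
import ast
import configparser

# A shortened copy of the basic configuration shown above.
EXAMPLE_INI = """
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
"""


def read_list(config, section, option):
    """Interpret an option that is written as a Python-style list literal."""
    value = ast.literal_eval(config[section][option])
    if not isinstance(value, list):
        raise ValueError(f"[{section}] {option} is not a list: {value!r}")
    return value


config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)

databases = read_list(config, "DATA", "databases")
labels = read_list(config, "DATA", "labels")
model_type = config["MODEL"]["type"]

print("databases:", databases)
print("labels:", labels)
print("model type:", model_type)
```

Plain values such as `svm` come back as strings, while bracketed values need the extra `ast.literal_eval` step to become Python lists.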

Here is an overview of the interfaces:
* **nkululeko.nkululeko**: do machine learning experiments combining features and learners
* **nkululeko.demo**: demo the current best model on the command line
* **nkululeko.test**: predict a series of files with the current best model
* **nkululeko.explore**: perform data exploration
* **nkululeko.augment**: augment the current training data
* **nkululeko.predict**: predict features like SNR, MOS, arousal/valence, age/gender, with DNN models
* **nkululeko.segment**: segment a database based on VAD (voice activity detection)
* **nkululeko.resample**: check on all sampling rates and change to 16kHz 
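
If you prefer to start an interface from a Python script rather than from the shell, the command line can be assembled as sketched below. The helper `build_command` is hypothetical. The sketch also assumes that every interface listed above accepts the same `--config` option, which this document only shows for `nkululeko.nkululeko`.

```python
import sys

# Interface names taken from the overview above.
INTERFACES = (
    "nkululeko", "demo", "test", "explore",
    "augment", "predict", "segment", "resample",
)


def build_command(interface, config_path):
    """Build the argument list for `python -m nkululeko.<interface> --config <file>`."""
    if interface not in INTERFACES:
        raise ValueError(f"unknown interface: {interface}")
    # sys.executable makes sure the interpreter of the active environment is used.
    return [sys.executable, "-m", f"nkululeko.{interface}", "--config", config_path]


command = build_command("nkululeko", "experiment.ini")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to actually run the experiment.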

There's my [blog](http://blog.syntheticspeech.de/?s=nkululeko) with tutorials:
* [Introduction](http://blog.syntheticspeech.de/2021/08/04/machine-learning-experiment-framework/)
* [Nkululeko FAQ](http://blog.syntheticspeech.de/2022/07/07/nkululeko-faq/)
* [How to set up your first nkululeko project](http://blog.syntheticspeech.de/2021/08/30/how-to-set-up-your-first-nkululeko-project/)
* [Setting up a base nkululeko experiment](http://blog.syntheticspeech.de/2021/10/05/setting-up-a-base-nkululeko-experiment/)
* [How to import a database](http://blog.syntheticspeech.de/2022/01/27/nkululeko-how-to-import-a-database/) 
* [Comparing classifiers and features](http://blog.syntheticspeech.de/2021/10/05/nkululeko-comparing-classifiers-and-features/)
* [Use Praat features](http://blog.syntheticspeech.de/2022/06/27/how-to-use-selected-features-from-praat-with-nkululeko/)
* [Combine feature sets](http://blog.syntheticspeech.de/2022/06/30/how-to-combine-feature-sets-with-nkululeko/)
* [Classifying continuous variables](http://blog.syntheticspeech.de/2022/01/26/nkululeko-classifying-continuous-variables/) 
* [Try out / demo a trained model](http://blog.syntheticspeech.de/2022/01/24/nkululeko-try-out-demo-a-trained-model/) 
* [Perform cross database experiments](http://blog.syntheticspeech.de/2021/10/05/nkululeko-perform-cross-database-experiments/)
* [Meta parameter optimization](http://blog.syntheticspeech.de/2021/09/03/perform-optimization-with-nkululeko/)
* [How to set up wav2vec embedding](http://blog.syntheticspeech.de/2021/12/03/how-to-set-up-wav2vec-embedding-for-nkululeko/)
* [How to soft-label a database](http://blog.syntheticspeech.de/2022/01/24/how-to-soft-label-a-database-with-nkululeko/) 
* [Re-generate the progressing confusion matrix animation with a different framerate](demos/plot_faster_anim.py)
* [How to limit/filter a dataset](http://blog.syntheticspeech.de/2022/02/22/how-to-limit-a-dataset-with-nkululeko/)
* [Specifying database disk location](http://blog.syntheticspeech.de/2022/02/21/specifying-database-disk-location-with-nkululeko/) 
* [Add dropout with MLP models](http://blog.syntheticspeech.de/2022/02/25/adding-dropout-to-mlp-models-with-nkululeko/)
* [Do cross-validation](http://blog.syntheticspeech.de/2022/03/23/how-to-do-cross-validation-with-nkululeko/)
* [Combine predictions per speaker](http://blog.syntheticspeech.de/2022/03/24/how-to-combine-predictions-per-speaker-with-nkululeko/)
* [Run multiple experiments in one go](http://blog.syntheticspeech.de/2022/03/28/how-to-run-multiple-experiments-in-one-go-with-nkululeko/)
* [Compare several MLP layer layouts with each other](http://blog.syntheticspeech.de/2022/04/11/how-to-compare-several-mlp-layer-layouts-with-each-other/)
* [Import features from outside the software](http://blog.syntheticspeech.de/2022/10/18/how-to-import-features-from-outside-the-nkululeko-software/)
* [Explore feature importance](http://blog.syntheticspeech.de/2023/02/20/nkululeko-show-feature-importance/)
* [Plot distributions for feature values](http://blog.syntheticspeech.de/2023/02/16/nkululeko-how-to-plot-distributions-of-feature-values/)
* [Show feature importance](http://blog.syntheticspeech.de/2023/02/20/nkululeko-show-feature-importance/)
* [Augment the training set](http://blog.syntheticspeech.de/2023/03/13/nkululeko-how-to-augment-the-training-set/)
* [Visualize clusters of acoustic features](http://blog.syntheticspeech.de/2023/04/20/nkululeko-visualize-clusters-of-your-acoustic-features/)
* [Visualize your data distribution](http://blog.syntheticspeech.de/2023/05/11/nkululeko-how-to-visualize-your-data-distribution/)
* [Check your dataset](http://blog.syntheticspeech.de/2023/07/11/nkululeko-check-your-dataset/) 
* [Segmenting a database](http://blog.syntheticspeech.de/2023/07/14/nkululeko-segmenting-a-database/)
* [Predict new labels for your data from public models and check bias](http://blog.syntheticspeech.de/2023/08/16/nkululeko-how-to-predict-labels-for-your-data-from-existing-models-and-check-them/)
* [Resample](http://blog.syntheticspeech.de/2023/08/31/how-to-fix-different-sampling-rates-in-a-dataset-with-nkululeko/)
* [Get some statistics on correlation and effect-size](http://blog.syntheticspeech.de/2023/09/05/nkululeko-get-some-statistics-on-correlation-and-effect-size/)
* [Generate a latex / pdf report](http://blog.syntheticspeech.de/2023/09/26/nkululeko-generate-a-latex-pdf-report/) 
* [Inspect your data with Spotlight](http://blog.syntheticspeech.de/2023/10/31/nkululeko-inspect-your-data-with-spotlight/)
* [Automatically stratify your split sets](http://blog.syntheticspeech.de/2023/11/07/nkululeko-automatically-stratify-your-split-sets/)
  
The framework is targeted at the speech domain and supports experiments where different classifiers are combined with different feature extractors.

Here's a rough UML-like sketch of the framework (and [here's the real one done with pyreverse](meta/images/classes.png)).
![sketch](meta/images/class_diagram.png)


Currently, the following linear classifiers are implemented (integrated from sklearn):
* SVM, SVR, XGB, XGR, Tree, Tree_regressor, KNN, KNN_regressor, NaiveBayes, GMM
  and the following ANNs
* MLP, CNN (tbd)

Here's [an animation that shows the progress of classification done with nkululeko](https://youtu.be/6Y0M382GjvM)

### Initialization file
You could 
* use a generic main python file (like my_experiment.py), 
* adapt the path to your nkululeko src 
* and then adapt an .ini file (again fitting at least the paths to src and data)
  
Here's [an overview of the ini-file options](./ini_file.md)

### <a name="helloworld">Hello World example</a>
* NEW: [Here's a Google colab that runs this example out-of-the-box](https://colab.research.google.com/drive/1GYNBd5cdZQ1QC3Jm58qoeMaJg3UuPhjw?usp=sharing#scrollTo=4G_SjuF9xeQf), and here is the same [with Kaggle](https://www.kaggle.com/felixburk/nkululeko-hello-world-example)
* [I made a video to show you how to do this on Windows](https://www.youtube.com/playlist?list=PLRceVavtxLg0y2jiLmpnUfiMtfvkK912D)
* Set up Python on your computer, version >= 3.8
* Open a terminal/commandline/console window
* Test python by typing ```python```, python should start with version >3 (NOT 2!). You can leave the Python Interpreter by typing *exit()*
* Create a folder on your computer for this example, let's call it `nkulu_work`
* Get a copy of the [Berlin emodb in audformat](https://zenodo.org/records/7447302/files/emodb.zip?download=1) and unpack inside the folder you just created (`nkulu_work`)
* Make sure the folder is called "emodb" and contains the database files directly (not nested inside a second folder)
* Also, in the `nkulu_work` folder: 
  * Create a Python environment
    * ```python -m venv venv```
  * Then, activate it:
    * under Linux / mac
      * ```source venv/bin/activate```
    * under Windows
      * ```venv\Scripts\activate.bat```
    * if that worked, you should see a ```(venv)``` in front of your prompt
  * Install the required packages in your environment
    * ```pip install nkululeko```
    * Repeat until all error messages have vanished (or fix them, or try to ignore them)...
* Now you should have two folders in your *nkulu_work* folder:
  * *emodb* and *venv*
* Download a copy of the file [exp_emodb.ini](meta/demos/exp_emodb.ini) to the current working directory (```nkulu_work```)
* Run the demo
  * ```python -m nkululeko.nkululeko --config exp_emodb.ini```
* Find the results in the newly created folder exp_emodb 
  * Inspect ```exp_emodb/images/run_0/emodb_xgb_os_0_000_cnf.png```
  * This is the main result of your experiment: a confusion matrix for the emodb emotional categories
* Inspect and play around with the [demo configuration file](meta/demos/exp_emodb.ini) that defined your experiment, then re-run.
* There are many ways to experiment with different classifiers and acoustic feature sets, [all described here](https://github.com/felixbur/nkululeko/blob/main/ini_file.md)
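
Before running the demo, you can check that your working folder looks as described in the steps above. The following sketch uses a hypothetical helper, `missing_items`, and simply tests for the two folders and the configuration file named in this example.

```python
from pathlib import Path

# Items the Hello World steps expect inside the working folder.
EXPECTED = {"emodb": "dir", "venv": "dir", "exp_emodb.ini": "file"}


def missing_items(work_dir):
    """Return the expected folders or files that are absent from work_dir."""
    work_dir = Path(work_dir)
    missing = []
    for name, kind in EXPECTED.items():
        path = work_dir / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run this from inside the nkulu_work folder.
    problems = missing_items(".")
    print("all set" if not problems else f"missing: {problems}")
```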
  
### Features
* Classifiers: Naive Bayes, KNN, Tree, XGBoost, SVM, MLP
* Feature extractors: Praat, Opensmile, openXBOW BoAW, TRILL embeddings, Wav2vec2 embeddings, audModel embeddings, ...
* Feature scaling
* Label encoding
* Binning (continuous to categorical)
* Online demo interface for trained models 

## License
Nkululeko can be used under the [MIT license](https://choosealicense.com/licenses/mit/).
If you use it, please mention the Nkululeko paper:

F. Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller: Nkululeko: A Tool For Rapid Speaker Characteristics Detection, Proc. LREC, 2022


```
@inproceedings{Burkhardt:lrec2022,
   title = {Nkululeko: A Tool For Rapid Speaker Characteristics Detection},
   author = {Felix Burkhardt and Johannes Wagner and Hagen Wierstorf and Florian Eyben and Björn Schuller},
   isbn = {9791095546726},
   booktitle = {2022 Language Resources and Evaluation Conference, LREC 2022},
   keywords = {machine learning,speaker characteristics,tools},
   pages = {1925-1932},
   publisher = {European Language Resources Association (ELRA)},
   year = {2022},
}
```

Changelog
=========

Version 0.68.3
--------------
* Feinberg Praat scripts ignore error and log filename

Version 0.68.2
--------------
* column names in datasets are now configurable

Version 0.68.1
--------------
* added error message on file to praat extraction

Version 0.68.0
--------------
* added stratification framework for split balancing

Version 0.67.0
--------------
* added first version of spotlight integration

Version 0.66.13
---------------
* small changes related to github worker

Version 0.66.12
---------------
* fixed bug that prevented Praat features from being selected
  
Version 0.66.11
---------------
* removed torch from automatic install. depends on cpu/gpu machine

Version 0.66.10
---------------
* Removed print statements from feats_wav2vec2

Version 0.66.9
--------------
* Version that should install without requiring opensmile, which does not seem to be supported on all Apple processors (ARM CPUs such as the Apple M1)

Version 0.66.8
--------------
* forgot __init__.py in reporting module

Version 0.66.7
--------------
* minor changes to experiment class

Version 0.66.6
--------------
* minor cosmetics

Version 0.66.5
--------------
* Latex report now with images

Version 0.66.4
--------------
* Pypi version mixup

Version 0.66.3
--------------
* made path to PDF output relative to experiment root

Version 0.66.2
--------------
* enabled data paths with quotes
* enabled missing category labels
* used tqdm for progress display

Version 0.66.1
--------------
* start on the latex report framework

Version 0.66.0
--------------
* added speechbrain speakerID embeddings 
  
Version 0.65.9
--------------
* added a filter that ensures that the labels have the same size as the features

Version 0.65.8
--------------
* changed default behaviour of resampler to "keep original files"

Version 0.65.7
--------------
* more databases and force wav while resampling

Version 0.65.6
--------------
* minor catch for seaborn in plots

Version 0.65.5
--------------
* added fill_na in plot effect size

Version 0.65.4
--------------
* added datasets to distribution
* changes in wav2vec2

Version 0.65.3
--------------
* various bugfixes

Version 0.65.2
--------------
* fixed bug in dataset.csv that prevented correct paths for relative files
* fixed bug in export module concerning new file directory

Version 0.65.1
--------------
* small enhancements with transformer features

Version 0.65.0
--------------
* introduced export module

Version 0.64.4
--------------
* added num_speakers for reloaded data
* re-formatted all with black

Version 0.64.3
--------------
* added number of speakers shown after data load

Version 0.64.2
--------------
* added __init__.py for submodules

Version 0.64.1
--------------
* fix error on csv

Version 0.64.0
--------------
* added bin_reals
* added statistics for effect size and correlation to plots

Version 0.63.4
--------------
* fixed bug in split selection

Version 0.63.3
--------------
* Introduced data.audio_path


Version 0.63.2
--------------
* re-introduced min and max_length for silero segmentation

Version 0.63.1
--------------
* fixed bug in resample

Version 0.63.0
--------------
* added wavlm model
* added error on filename for models

Version 0.62.1
--------------
* added min and max_length for silero segmentation

Version 0.62.0
--------------
* fixed segment silero bug
* added all Wav2vec2 models
* added resampler module
* added error on file for embeddings

Version 0.61.0
--------------
* added HUBERT embeddings
  
Version 0.60.0
--------------
* some bugfixes
* new package structure
* fixed wav2vec2 bugs
* removed "cross_data" strategy 


Version 0.59.1
--------------
* bugfix, after fresh install, it seems some libraries have changed
* added no_warnings
* changed print() to util.debug()
* added progress to opensmile extract
  
Version 0.59.0
--------------
* introduced SQUIM features
* added SDR predict
* added STOI predict

Version 0.58.0
--------------
* added dominance predict
* added MOS predict 
* added PESQ predict 

Version 0.57.0
--------------
* renamed autopredict to predict
* added arousal autopredict
* added valence autopredict 


Version 0.56.0
--------------
* added autopredict module
* added snr as feature extractor
* added gender autopredict
* added age autopredict
* added snr autopredict

Version 0.55.1
--------------
* changed error message in plot class

Version 0.55.0
--------------
* added segmentation module

Version 0.54.0
--------------
* added audeering public age and gender model embeddings and age and gender predictions

Version 0.53.0
--------------
* added file checks: size in bytes and voice activity detection with silero

Version 0.52.1
--------------
* bugfix: min/max duration_of_sample was not working

Version 0.52.0
--------------
* added flexible value distribution plots

Version 0.51.0
--------------
* added datafilter

Version 0.50.1
--------------
* added caller information for debug and error messages in Util

Version 0.50.0
--------------
* removed loso and added pre-selected logo (leave-one-group-out), aka folds

Version 0.49.1
--------------
* bugfix: samples selection for augmentation didn't work

Version 0.49.0
--------------
* added random-splicing

Version 0.48.1
--------------
* bugfix: database object was not loaded when dataframe was reused

Version 0.48.0
--------------
* enabled specific feature selection for praat and opensmile features

Version 0.47.1
--------------
* enabled feature storage format csv for opensmile features

Version 0.47.0
--------------
* added praat speech rate features

Version 0.46.0
--------------
* added warnings for non-existent parameters
* added sample selection for scatter plotting

Version 0.45.4
--------------
* added version attribute to setup.cfg

Version 0.45.4
--------------
* added __version__ attribute


Version 0.44.1
--------------
* bugfixing: feature importance: https://github.com/felixbur/nkululeko/issues/23
* bugfixing: loading csv database with filewise index https://github.com/felixbur/nkululeko/issues/24 

Version 0.45.2
--------------
* bugfix: sample_selection in EXPL was required wrongly

Version 0.45.2
--------------
* added sample_selection for sample distribution plots

Version 0.45.1
--------------
* fixed dataframe.append bug

Version 0.45.0
--------------
* added auddim as features
* added FEATS store_format
* added device use to feat_audmodel

Version 0.44.1
--------------
* bugfixes

Version 0.44.0
--------------
* added scatter functions: tsne, pca, umap

Version 0.43.7
--------------
* added clap features

Version 0.43.6
--------------
* small bugs


Version 0.43.5
--------------
* because of difficulties with numba and audiomentations, audiomentations is now imported only when augmenting

Version 0.43.4
--------------
* added error when experiment type and predictor don't match

Version 0.43.3
--------------
* fixed further bugs and added augmentation to the test runs

Version 0.43.2
--------------
* fixed a bug when running continuous variable as classification problem

Version 0.43.1
--------------
* fixed test_runs

Version 0.43.0
--------------
* added augmentation module based on audiomentations

Version 0.42.0
--------------
* age labels should now be detected in databases

Version 0.41.0
--------------
* added feature tree plot

Version 0.40.1
--------------
* fixed a bug: additional test database was not label encoded

Version 0.40.0
--------------
* added EXPL section and first functionality
* added test module (for test databases)

Version 0.39.0
--------------
* added feature distribution plots
* added plot format

Version 0.38.3
--------------
* added demo mode with list argument

Version 0.38.2
--------------
* fixed a bug concerned with "no_reuse" evaluation

Version 0.38.1
--------------
* demo mode with file argument

Version 0.38.0
--------------
* fixed demo mode

Version 0.37.2
--------------
* mainly replaced pd.append with pd.concat


Version 0.37.1
--------------
* fixed bug that prevented praat feature extraction from working

Version 0.37.0
--------------
* fixed bug: csv import did not detect multiindex

Version 0.36.3
--------------
* published as a pypi module

Version 0.36.0
--------------
* added entry nkululeko.py script


Version 0.35.0
--------------
* fixed bug that prevented scaling (normalization)

Version 0.34.2
--------------
* smaller bug fixed concerning the loss_string

Version 0.34.1
--------------
* smaller bug fixes and tried Soft_f1 loss


Version 0.34.0
--------------
* smaller bug fixes and debug outputs

Version 0.33.0
--------------
* added GMM as a model type

Version 0.32.0
--------------
* added audmodel embeddings as features

Version 0.31.0
--------------
* added models: tree and tree_reg
  
Version 0.30.0
--------------
* added models: bayes, knn and knn_reg

Version 0.29.2
--------------
* fixed hello world example


Version 0.29.1
--------------
* bug fix for 0.29


Version 0.29.0
--------------
* added a new FeatureExtractor class to import external data

Version 0.28.2
--------------
* removed some Pandas warnings
* added no_reuse function to database.load()

Version 0.28.1
--------------
* with database.value_counts show only the data that is actually used


Version 0.28.0
--------------
* made "label_data" configuration automatic and added "label_result"


Version 0.27.0
--------------
* added "label_data" configuration to label data with trained model (so now there can be train, dev and test set)

Version 0.26.1
--------------
* Fixed some bugs caused by the multitude of feature sets
* Added possibility to distinguish between absolute or relative paths in csv datasets

Version 0.26.0
--------------
* added the rename_speakers functionality to prevent identical speaker names in datasets

Version 0.25.1
--------------
* fixed bug that no features were chosen if not selected

Version 0.25.0
--------------
* made selectable features universal for feature sets

Version 0.24.0
--------------
* added multiple feature sets (will simply be concatenated)

Version 0.23.0
--------------
* added selectable features for Praat interface

Version 0.22.0
--------------
* added David R. Feinberg's Praat features, praise also to parselmouth

Version 0.21.0
--------------

* Revoked 0.20.0
* Added support for only_test = True, to enable later testing of trained models with new test data

Version 0.20.0
--------------

* implemented reuse of trained and saved models

Version 0.19.0
--------------

* added "max_duration_of_sample" for datasets


Version 0.18.6
--------------

* added support for learning and dropout rate as argument


Version 0.18.5
--------------

* added support for epoch number as argument
  
Version 0.18.4
--------------

* added support for ANN layers as arguments

Version 0.18.3
--------------

* added reuse of test and train file sets
* added parameter to scale continuous target values: target_divide_by


Version 0.18.2
--------------

* added preference of local dataset specs to global ones
  
Version 0.18.1
--------------

* added regression value display for confusion matrices

Version 0.18.0
--------------

* added leave one speaker group out

Version 0.17.2
--------------

* fixed scaler, added robust



Version 0.17.0
--------------

* Added minimum duration for test samples


Version 0.16.4
--------------

* Added possibility to combine predictions per speaker (with mean or mode function)

Version 0.16.3
--------------

* Added minimal sample length for databases


Version 0.16.2
--------------

* Added k-fold-cross-validation for linear classifiers

Version 0.16.1
--------------

* Added leave-one-speaker-out for linear classifiers


Version 0.16.0
--------------

* Added random sample splits

