Metadata-Version: 2.1
Name: asreview-hyperopt
Version: 0.1.4
Summary: Hyper parameter optimization extension for ASReview
Home-page: https://github.com/asreview/asreview-hyperopt
Author: Utrecht University
Author-email: asreview@uu.nl
License: UNKNOWN
Project-URL: Bug Reports, https://github.com/asreview/asreview-hyperopt/issues
Project-URL: Source, https://github.com/asreview/asreview-hyperopt
Description: ## ASReview-hyperopt
        
        ![Deploy and release](https://github.com/msdslab/asreview-hyperopt/workflows/Deploy%20and%20release/badge.svg)
        
        Hyper parameter optimization extension for 
        [ASReview](https://github.com/asreview/asreview). It uses the 
        [hyperopt](https://github.com/hyperopt/hyperopt) package to quickly optimize parameters
        of the different models. The hyper parameters and their sample spaces are defined in the
        [ASReview](https://github.com/asreview/asreview) package and are
        used automatically during optimization.
        
        ### Installation
        
        The easiest way to install the hyperopt extension is to use the command line:
        
        ``` bash
        pip install asreview-hyperopt
        ```
        
        After installing the extension, asreview should detect it automatically.
        To check this, run:
        
        ```bash
        asreview --help
        ```
        
        It should list three new entry points: `hyper-active`, `hyper-passive` and `hyper-cluster`.
        
        ### Basic usage
        
        The three entry points are used in a roughly similar fashion. The main difference between them is
        the types of models that have to be supplied:
        
        - hyper-cluster: feature_extraction
        - hyper-passive: model, balance\_strategy, feature\_extraction
        - hyper-active: model, balance\_strategy, query\_strategy, feature\_extraction
        
        
        To get help for an entry point, run for example:
        
        ```bash
        asreview hyper-active --help
        ```
        
        This prints the following options:
        
        ```bash
        usage: /Users/qubix/Library/Python/3.6/bin/asreview [-h] [-m MODEL]
                                                            [-q QUERY_STRATEGY]
                                                            [-b BALANCE_STRATEGY]
                                                            [-e FEATURE_EXTRACTION]
                                                            [-n N_ITER] [-d DATASETS]
                                                            [--mpi]
        
        optional arguments:
          -h, --help            show this help message and exit
          -m MODEL, --model MODEL
                                Prediction model for active learning.
          -q QUERY_STRATEGY, --query_strategy QUERY_STRATEGY
                                Query strategy for active learning.
          -b BALANCE_STRATEGY, --balance_strategy BALANCE_STRATEGY
                                Balance strategy for active learning.
          -e FEATURE_EXTRACTION, --feature_extraction FEATURE_EXTRACTION
                                Feature extraction method.
          -n N_ITER, --n_iter N_ITER
                                Number of iterations of Bayesian Optimization.
          -d DATASETS, --datasets DATASETS
                                Datasets to use in the hyper parameter optimization
                                Separate by commas to use multiple at the same time
                                [default: all].
          --mpi                 Use the mpi implementation.
        
        ```
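        
        The component lists and flags above can also be combined programmatically. The sketch below
        is only an illustration: `REQUIRED` and `build_command` are hypothetical helpers that are not
        part of the extension, and it assumes that `hyper-passive` and `hyper-cluster` accept the same
        long flags that the `hyper-active` help shows.
        
        ```python
        # Component types each entry point needs, as listed under "Basic usage".
        REQUIRED = {
            "hyper-cluster": ["feature_extraction"],
            "hyper-passive": ["model", "balance_strategy", "feature_extraction"],
            "hyper-active": ["model", "balance_strategy", "query_strategy",
                             "feature_extraction"],
        }
        
        
        def build_command(entry_point, n_iter=1, **components):
            """Return the argument list for one optimization run.
        
            Hypothetical helper: the flag names come from the hyper-active help
            and are assumed to be the same for the other two entry points.
            """
            required = REQUIRED[entry_point]
            missing = [name for name in required if name not in components]
            extra = [name for name in components if name not in required]
            if missing or extra:
                raise ValueError(
                    f"{entry_point}: missing {missing}, not applicable {extra}")
            command = ["asreview", entry_point]
            for name in required:
                command += [f"--{name}", components[name]]
            command += ["--n_iter", str(n_iter)]
            return command
        ```
        
        The returned list can be passed to `subprocess.run`. Supplying a component that an entry
        point does not use, such as a model for `hyper-cluster`, raises a `ValueError` in this sketch.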
        
        ### Data structure
        
        The extension will search for datasets in the `data` directory, relative to the current
        working directory, so put your datasets there.
        
        The output of the runs is stored in the `output` directory, again relative to the current
        working directory.
        
        An example of the resulting structure:
        
        ```bash
        output/
        ├── active_learning
        │   ├── nb_max_double_tfidf
        │   │   └── depression_hall_ace_ptsd_nagtegaal
        │   │       ├── best
        │   │       │   ├── ace
        │   │       │   ├── depression
        │   │       │   ├── hall
        │   │       │   ├── nagtegaal
        │   │       │   └── ptsd
        │   │       ├── current
        │   │       │   ├── ace
        │   │       │   ├── depression
        │   │       │   ├── hall
        │   │       │   ├── nagtegaal
        │   │       │   └── ptsd
        │   │       └── trials.pkl
        │   └── nb_max_random_double_tfidf
        │       └── nagtegaal
        │           ├── best
        │           │   └── nagtegaal
        │           ├── current
        │           │   └── nagtegaal
        │           └── trials.pkl
        ├── cluster
        │   └── doc2vec
        │       ├── ace
        │       │   ├── best
        │       │   │   └── ace
        │       │   ├── current
        │       │   │   └── ace
        │       │   └── trials.pkl
        │       ├── depression_hall_ace_ptsd_nagtegaal
        │       │   └── current
        │       │       ├── ace
        │       │       ├── depression
        │       │       ├── hall
        │       │       ├── nagtegaal
        │       │       └── ptsd
        │       └── nagtegaal
        │           └── current
        │               └── nagtegaal
        └── passive
            └── nb_double_tfidf
                └── depression
                    ├── best
                    │   └── depression
                    ├── current
                    │   └── depression
                    └── trials.pkl
        ```
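        
        A layout like the one above can be walked with a few lines of Python. This is a minimal
        sketch, not part of the extension: `find_trials` is a hypothetical helper, and it assumes
        that every `trials.pkl` sits exactly three levels below the output directory, as in the
        example tree.
        
        ```python
        from pathlib import Path
        
        
        def find_trials(output_dir="output"):
            """Yield (mode, models, datasets, path) for each trials.pkl found.
        
            Assumes the layout <mode>/<models>/<datasets>/trials.pkl shown in
            the example tree; files at any other depth are skipped.
            """
            root = Path(output_dir)
            if not root.is_dir():
                return
            for path in sorted(root.rglob("trials.pkl")):
                parts = path.relative_to(root).parts
                if len(parts) == 4:
                    mode, models, datasets, _ = parts
                    yield mode, models, datasets, path
        ```
        
        Iterating over `find_trials()` from the working directory gives one tuple per finished or
        running optimization, which is handy for locating the files to inspect.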
        
        The `trials.pkl` files store data on the trials that have been run.
        
        To list these trials, use the following command:
        
        ```bash
        asreview show $SOME_DIRECTORY/trials.pkl
        ```
        
        It should print a list of trials sorted by loss (lower is better). Every column name except
        the loss is prefixed with the kind of parameter it holds:
        
        - `mdl`: Model parameter
        - `bal`: Balance strategy parameter
        - `qry`: Query strategy parameter
        - `fex`: Feature extraction parameter
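        
        The prefixes make it easy to group columns by component. The sketch below assumes that the
        prefix is separated from the parameter name by an underscore; that separator, the helper
        `group_columns`, and any column names passed to it are assumptions made for illustration.
        
        ```python
        # Prefixes as documented above.
        PREFIXES = {
            "mdl": "model",
            "bal": "balance strategy",
            "qry": "query strategy",
            "fex": "feature extraction",
        }
        
        
        def group_columns(columns):
            """Group column names by the component their prefix refers to.
        
            Assumes names of the form <prefix>_<parameter>; anything else
            (for example the loss column) is collected under "other".
            """
            groups = {}
            for column in columns:
                prefix, sep, name = column.partition("_")
                if sep and prefix in PREFIXES:
                    groups.setdefault(PREFIXES[prefix], []).append(name)
                else:
                    groups.setdefault("other", []).append(column)
            return groups
        ```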
        
        ### Options
        
        The default number of iterations is 1, which you'll probably want to increase with `-n`. How many
        iterations are needed depends on the number of hyper parameters being optimized, but several
        hundred is probably a good estimate for most combinations to get reasonably close to the optimum.
        In all cases, use common sense: if the loss is still dropping quickly, run more iterations.
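        
        One way to make "is the loss still going down" concrete is to compare the best loss before
        and after the most recent trials. The function below is a hypothetical sketch, not something
        the extension provides; it expects the losses in the order the trials were run, and the
        window size and threshold are arbitrary assumptions.
        
        ```python
        def still_improving(losses, window=20, min_gain=1e-3):
            """Return True if the best loss improved during the last `window` trials.
        
            `losses` must be in trial order (lower is better). With too few
            trials to compare, the answer is True: keep iterating.
            """
            if len(losses) <= window:
                return True
            best_before = min(losses[:-window])
            best_now = min(losses)
            return best_before - best_now > min_gain
        ```
        
        If this returns `False` after a few hundred iterations, further runs are unlikely to change
        the best parameters by much.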
        
        The hyperopt extension has built-in support for MPI, which is used to run trials in parallel. On
        a local PC with an MPI implementation (like OpenMPI) installed, you could run on 4 cores with:
        
        ```bash
        mpirun -n 4 asreview hyper-active --mpi
        ```
        
        On supercomputers, you may need to replace `mpirun` with `srun`.
        
        
        ### Time measurements
        
        #### inactive
        
        - nb, tfidf, double, max -> 53 seconds
        - svm, tfidf, double, max -> 1940 seconds
        - rf, tfidf, double, max -> 80 seconds
        - logistic, tfidf, double, max -> 250 seconds /4
        - dense_nn, tfidf, double, max -> ?
        - dense_nn, doc2vec, double, max -> 2750 seconds /1, /2
        - svm, doc2vec, ...
        
Keywords: asreview plot hyperopt optimization
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
