Metadata-Version: 1.0
Name: recluse
Version: 0.2.1
Summary: Reproducible Experimentation for Computational Linguistics Use
Home-page: https://github.com/lamber/recluse
Author: L. Amber Wilcox-O'Hearn
Author-email: amber@cs.toronto.edu
License: COPYING
Description: Recluse
        
        Author: L. Amber Wilcox-O'Hearn
        
        Contact: amber@cs.toronto.edu
        
        Released under the GNU Affero General Public License; see the COPYING file for details.
        
        ==============
        Introduction
        ==============
        
        Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.
        
        This version contains:
        
        * utils, which has three functions:
        
          * open_with_unicode, for reading and writing Unicode in regular or compressed text files.
          * split_file_into_chunks, for splitting a file into smaller pieces. This is needed for tools that load all of their input into RAM, or that train on all the data when training on part of it would be enough.
          * partition_by_list, which works like a combination of the string methods partition and split: like partition, it keeps the separators; like split, it returns a list of all the pieces.
        
        * article_randomiser, which randomly but reproducibly divides a corpus into training, development, and test sets.
        * nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation.
          It is optimised for Wikipedia-style text.
        * vocabulary_generator and its helper class vocabulary_cutter, which wrap srilm to make unigram counts and then select the most frequent words.
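        
        To illustrate what "reproducibly randomly" dividing a corpus can mean, here is a minimal sketch. It is not the article_randomiser code: the function name split_articles, the set names, and the default proportions are all assumptions made for this example. The idea shown is only that a random number generator seeded with a fixed value makes the same assignments on every run.
        
        ```python
        import random
        
        def split_articles(articles, seed, proportions=(0.6, 0.2, 0.2)):
            """Assign each article to 'train', 'devel', or 'test'.
        
            Hypothetical helper: names and proportions are illustrative only.
            """
            # A private, seeded generator: the same seed gives the same split,
            # and the global random state is left untouched.
            rng = random.Random(seed)
            train_p, devel_p, _ = proportions
            sets = {"train": [], "devel": [], "test": []}
            for article in articles:
                r = rng.random()
                if r < train_p:
                    sets["train"].append(article)
                elif r < train_p + devel_p:
                    sets["devel"].append(article)
                else:
                    sets["test"].append(article)
            return sets
        ```
        
        Calling split_articles twice with the same articles and the same seed returns the same three sets, which is the property a reproducible experiment needs.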
        
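        The behaviour described for partition_by_list (keep the separators, return a list) can be sketched with the standard re module. This is an assumed illustration, not the function in utils: the name partition_keeping_separators, its signature, and the choice to drop empty pieces are hypothetical, and the sketch assumes at least one non-empty separator.
        
        ```python
        import re
        
        def partition_keeping_separators(text, separators):
            """Split text at any separator, keeping the separators in the result."""
            # Try longer separators first so one is never cut short by its own prefix.
            ordered = sorted(separators, key=len, reverse=True)
            # A capturing group makes re.split return the separators as well.
            pattern = "(" + "|".join(re.escape(s) for s in ordered) + ")"
            # Assumption: empty strings between adjacent separators are dropped.
            return [piece for piece in re.split(pattern, text) if piece]
        
        pieces = partition_keeping_separators("a, b", [",", " "])
        # pieces == ['a', ',', ' ', 'b']
        ```
        
        Because nothing but empty strings is discarded, joining the pieces gives back the original text.
        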
        
        
        
        
Platform: UNKNOWN
