Metadata-Version: 2.1
Name: lang2vec
Version: 0.1.3
Summary: Returns language vectors
Home-page: https://github.com/pypa/sampleproject
Author: Antonis Anastasopoulos, Patrick Littell, David Mortensen
Author-email: aanastas@cs.cmu.com
License: UNKNOWN
Description: lang2vec
        Author: Patrick Littell
        Last updated: July 15, 2016
        
        Usage: ./lang2vec (-m) (-f) (-r) <LANGUAGES>
        
        <LANGUAGES> is a space-separated string of ISO 639-3 codes (e.g., "deu eng fra").  Any two-letter ISO 639-1 codes will be mapped to their corresponding ISO 639-3 codes.
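        
        As an illustration of the two-letter mapping, a minimal Python sketch follows. The table ISO_639_1_TO_3 and the helper normalize_codes are hypothetical names covering only three codes; they are not part of lang2vec.
        
        ```python
        # Hypothetical three-entry excerpt of an ISO 639-1 -> ISO 639-3 table.
        ISO_639_1_TO_3 = {"de": "deu", "en": "eng", "fr": "fra"}
        
        def normalize_codes(languages):
            """Split a space-separated string and map two-letter codes to three-letter ones."""
            codes = []
            for code in languages.split():
                # Codes missing from the table (e.g. already ISO 639-3) pass through unchanged.
                codes.append(ISO_639_1_TO_3.get(code, code))
            return codes
        
        codes = normalize_codes("de eng fr")
        ```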
        
        <SETS> is a named feature set (e.g., syntax_wals or phonology_knn), or an elementwise union A|B of two feature sets, or a concatenation A+B of two feature sets.  So "id+syntax_wals|syntax_sswl" gives the id vector concatenated with the elementwise union of the WALS and SSWL syntax feature sets.
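        
        The way such an expression combines vectors can be sketched in Python. This is a toy illustration, not the lang2vec implementation: the functions union and evaluate are hypothetical, missing values are assumed to be represented as None, and the union is assumed to keep whichever value is present (or the larger one when both are). The only behavior taken from the description above is that "|" binds more tightly than "+".
        
        ```python
        from typing import Dict, List, Optional
        
        Vector = List[Optional[float]]  # None marks a missing value (an assumption)
        
        def union(a: Vector, b: Vector) -> Vector:
            """Elementwise union: keep the value that is present; take the max if both are."""
            if len(a) != len(b):
                raise ValueError("union requires vectors of equal length")
            out: Vector = []
            for x, y in zip(a, b):
                if x is None:
                    out.append(y)
                elif y is None:
                    out.append(x)
                else:
                    out.append(max(x, y))
            return out
        
        def evaluate(expr: str, sets: Dict[str, Vector]) -> Vector:
            """Evaluate an expression such as 'A+B|C', where '|' binds tighter than '+'."""
            result: Vector = []
            for term in expr.split("+"):
                names = term.split("|")
                merged = sets[names[0]]
                for name in names[1:]:
                    merged = union(merged, sets[name])
                result.extend(merged)  # concatenation appends the term's columns
            return result
        
        # Toy vectors for a single language; the values are made up.
        toy = {
            "id": [1.0, 0.0],
            "syntax_wals": [1.0, None, 0.0],
            "syntax_sswl": [None, 0.0, 1.0],
        }
        vec = evaluate("id+syntax_wals|syntax_sswl", toy)
        ```
        
        Here vec has five entries: the two id columns followed by the three columns of the merged syntax sets.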
        
        The named sets are:
        
            Sets from feature and inventory databases:
                "syntax_wals",
                "phonology_wals",
                "syntax_sswl",
                "syntax_ethnologue",
                "phonology_ethnologue",
                "inventory_ethnologue",
                "inventory_phoible_aa",
                "inventory_phoible_gm",
                "inventory_phoible_saphon",
                "inventory_phoible_spa",
                "inventory_phoible_ph",
                "inventory_phoible_ra",
                "inventory_phoible_upsid",
        
            Averages of sets:
                "syntax_average",
                "phonology_average",
                "inventory_average",
        
            KNN predictions of feature values:
                "syntax_knn",
                "phonology_knn",
                "inventory_knn",
        
            Membership in language families and subfamilies:
                "fam",
        
            Distance from fixed points on Earth's surface:
                "geo",
                
            One-hot identity vector:
                "id",
            
            
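        To make the idea behind the "geo" set concrete, here is a hedged Python sketch of a vector of great-circle distances from one location to a list of fixed points. The function names, the haversine formula, the kilometre unit, and the example points are all assumptions for illustration; the actual fixed points and any scaling used by lang2vec are not stated here.
        
        ```python
        import math
        
        def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
            """Haversine distance in kilometres between two (latitude, longitude) points."""
            phi1, phi2 = math.radians(lat1), math.radians(lat2)
            dphi = phi2 - phi1
            dlmb = math.radians(lon2 - lon1)
            a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
            # Clamp to 1.0 to guard against rounding just above the asin domain.
            return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
        
        def geo_vector(lat, lon, fixed_points):
            """One distance per fixed point, in the order the points are given."""
            return [great_circle_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in fixed_points]
        
        # Hypothetical fixed points: the location itself and its antipode on the equator.
        distances = geo_vector(0.0, 0.0, [(0.0, 0.0), (0.0, 180.0)])
        ```
        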
        OPTIONS:
        
        -m, --minimal: Suppress columns that contain only zeros, only ones, or only nulls.
        -f, --fields: Display field names as the first row.
        -r, --random: Randomize the values (for example, as a control).
        
        The "minimal" transformation is applied after any union or concatenation.  (Otherwise, sets in the same group, such as the syntax_* sets, would lose different columns and no longer have the same dimensionality for comparison.)  The "random" transformation is applied after the "minimal" transformation.  (So if you run an experiment with a minimized set and use a randomized set as a control, the randomized set will have the same dimensionality as the minimized one.)
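        
        The ordering can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool's code: minimal and randomize are hypothetical helpers, nulls are represented as None, and randomization is assumed to mean replacing every value with a random 0 or 1.
        
        ```python
        import random
        from typing import List, Optional
        
        Row = List[Optional[float]]
        
        def minimal(rows: List[Row]) -> List[Row]:
            """Drop columns that contain only zeros, only ones, or only nulls."""
            if not rows:
                return []
            constant = ({0.0}, {1.0}, {None})
            keep = [j for j in range(len(rows[0]))
                    if {row[j] for row in rows} not in constant]
            return [[row[j] for j in keep] for row in rows]
        
        def randomize(rows: List[Row], seed: int = 0) -> List[Row]:
            """Replace every value with a random 0.0 or 1.0, preserving the shape."""
            rng = random.Random(seed)
            return [[float(rng.randint(0, 1)) for _ in row] for row in rows]
        
        # Three languages, four features; columns 0 and 2 carry no information.
        rows = [
            [1.0, 0.0, None, 1.0],
            [1.0, 1.0, None, 0.0],
            [1.0, 0.0, None, 0.0],
        ]
        minimized = minimal(rows)
        control = randomize(minimized)  # randomize after minimizing
        ```
        
        Because randomization runs on the already minimized rows, control has the same number of columns as minimized.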
        
        REFERENCES:
        
        The different sets above are derived from many sources:
        
        *_wals -- Features derived from the World Atlas of Language Structures.
        *_sswl -- Features derived from Syntactic Structures of the World's Languages.
        *_ethnologue -- Features derived from (shallowly) parsing the prose typological descriptions in Ethnologue (Lewis et al. 2015).
        *_phoible_aa -- AA = Alphabets of Africa. Features derived from PHOIBLE's normalization of *Systèmes alphabétiques des langues africaines* (Hartell 1993, Chanard 2006).
        *_phoible_gm -- GM = Green and Moran.  Features derived from PHOIBLE's normalization of Christopher Green and Steven Moran's pan-African inventory database.
        *_phoible_ph -- PH = PHOIBLE.  Features derived from PHOIBLE proper, by Moran, McCloy, and Wright (2012).
        *_phoible_ra -- RA = Ramaswami.  Features derived from PHOIBLE's normalization of *Common Linguistic Features in Indian Languages: Phonetics* (Ramaswami 1999).
        *_phoible_saphon -- SAPHON = South American Phonological Inventory Database.  Features derived from PHOIBLE's normalization of SAPHON (Lev et al. 2012).
        *_phoible_spa -- SPA = Stanford Phonology Archive.  Features derived from PHOIBLE's normalization of SPA (Crothers et al. 1979).
        *_phoible_upsid -- UPSID = UCLA Phonological Segment Inventory Database.  Features derived from PHOIBLE's normalization of UPSID (Maddieson 1984, Maddieson and Precoda 1990).
        
        
Platform: UNKNOWN
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown
