Metadata-Version: 2.1
Name: bgnlp
Version: 0.0.10
Summary: Package for Bulgarian Natural Language Processing (NLP)
Home-page: UNKNOWN
Author: Adam Fauzi
Author-email: adamfzh98@gmail.com
License: UNKNOWN
Keywords: pytorch,nlp,bulgaria,machine learning,deep learning,AI
Platform: UNKNOWN
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Description-Content-Type: text/markdown
License-File: LICENSE

# **bgnlp**: Model-first approach to Bulgarian NLP

```sh
pip install bgnlp
```

## Package functionality

### Part-of-speech tagging

```python
from bgnlp import PosTagger, PosTaggerConfig

config = PosTaggerConfig()
pos = PosTagger(config=config)
# Input sentence: "This is a library for natural language processing."
print(pos("Това е библиотека за обработка на естествен език."))
```

```json
[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]
```
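
The tagger output is a plain list of dictionaries, so it can be post-processed with ordinary Python. The sketch below groups words by their `en_desc` value. It is only an illustration: the `group_by_pos` helper is hypothetical and not part of `bgnlp`, and the `tagged` list is a hand-copied subset of the output shown above rather than a live call to the model.

```python
from collections import defaultdict

# Hand-copied sample in the same shape as the tagger output above.
# In practice this list would come from calling the tagger on a sentence.
tagged = [
    {"word": "Това", "tag": "PDOsn", "bg_desc": "местоимение", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "bg_desc": "глагол", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "за", "tag": "R", "bg_desc": "предлог", "en_desc": "preposition"},
    {"word": "обработка", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": ".", "tag": "U", "bg_desc": "препинателен знак", "en_desc": "punctuation"},
]


def group_by_pos(tokens):
    """Hypothetical helper: map each English POS description to its words."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token["en_desc"]].append(token["word"])
    return dict(groups)


groups = group_by_pos(tagged)
print(groups["noun"])
```

Grouping on `en_desc` keeps the example independent of the fine-grained tag set; the same loop could key on `tag` instead if the full morphological tag is needed.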

### Lemmatization

```python
from bgnlp import LemmaTaggerConfig, LemmaTagger

lemma = LemmaTagger(config=LemmaTaggerConfig())
text = "Добре дошли!"  # "Welcome!"
print(lemma(text))
```

```python
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
```
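
A common next step is to turn the lemmatizer output into a normalized string, for example before indexing or counting terms. The following is a minimal sketch under stated assumptions: `lemma_string` is a hypothetical helper, not a `bgnlp` function, and `lemmatized` is copied by hand from the output above instead of being produced by the model.

```python
import string

# Hand-copied sample in the same shape as the lemmatizer output above.
lemmatized = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]

# ASCII punctuation only; extend this set if the text uses other marks.
PUNCTUATION = set(string.punctuation)


def lemma_string(tokens):
    """Hypothetical helper: join lowercased lemmas, skipping punctuation tokens."""
    lemmas = [
        token["lemma"].lower()
        for token in tokens
        if token["word"] not in PUNCTUATION
    ]
    return " ".join(lemmas)


normalized = lemma_string(lemmatized)
print(normalized)
```

Lowercasing is applied here because the sample output keeps the original capitalization of `Добре` in its lemma; drop the `.lower()` call if case should be preserved.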


