Metadata-Version: 2.1
Name: th-simple-preprocessor
Version: 0.9.0
Summary: Simple Thai Preprocess Functions
Home-page: https://github.com/wisesight/th-simple-preprocessor
Author: WISESIGHT Product Development
Author-email: tequila@wisesight.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: Thai
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# __th-preprocessor__

Simple Thai Preprocess Functions

## __Objectives__
This repository provides simple preprocessing techniques for Thai sentences and phrases.

## __Supports__
The module supports Python 3.7+.

## __Installation__
```
pip install th-simple-preprocessor
```

## __How to Use__
```python
from th_preprocessor.preprocess import preprocess

text = '"::::: อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ 21-09-2018 https://www.malaysiakini.com/news/444015"'
words = preprocess(text)

print(words) 
# อย่างไรก็ตามนูร์ ฮิชัม อับดุลเลาะห์ WSNUMBER WSNUMBER WSNUMBER WSLINK
```
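To make the placeholder tokens in the output above more concrete, here is a minimal, self-contained sketch of the general idea: links and digit runs are swapped for `WSLINK` and `WSNUMBER`. It is not the library's implementation. The function name `sketch_normalize` and the regular expressions are assumptions chosen for illustration, and the real `preprocess` applies more steps (for example, stripping punctuation) with rules that may differ.

```python
import re

# Hypothetical patterns for illustration only; the library's own rules may differ.
LINK_PATTERN = re.compile(r"https?://\S+")
NUMBER_PATTERN = re.compile(r"\d+")
SPACE_PATTERN = re.compile(r"\s+")


def sketch_normalize(text: str) -> str:
    """Replace URLs and digit runs with placeholder tokens, then tidy spaces."""
    # Handle links first so that digits inside a URL are not replaced separately.
    text = LINK_PATTERN.sub(" WSLINK ", text)
    text = NUMBER_PATTERN.sub(" WSNUMBER ", text)
    # Collapse the extra spaces introduced around the placeholders.
    return SPACE_PATTERN.sub(" ", text).strip()


print(sketch_normalize("ราคา 250 บาท https://example.com/item/42"))
# ราคา WSNUMBER บาท WSLINK
```

Replacing links before numbers matters in this sketch: doing it the other way round would break up the URL before the link pattern could match it as a whole.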

## __Package Reference__
- [`th_preprocessor.preprocess.normalize_link`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L149)
- [`th_preprocessor.preprocess.normalize_at_mention`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L155)
- [`th_preprocessor.preprocess.normalize_email`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L160)
- [`th_preprocessor.preprocess.normalize_haha`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L165)
- [`th_preprocessor.preprocess.normalize_num`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L170)
- [`th_preprocessor.preprocess.normalize_phone`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L175)
- [`th_preprocessor.preprocess.normalize_accented_chars`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L180)
- [`th_preprocessor.preprocess.normalize_special_chars`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L184)
- [`th_preprocessor.preprocess.remove_hashtags`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L192)
- [`th_preprocessor.preprocess.remove_tag`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L196)
- [`th_preprocessor.preprocess.remove_dup_spaces`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L207)
- [`th_preprocessor.preprocess.remove_emoji`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L246)
- [`th_preprocessor.preprocess.replace_dup_chars`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L215)
- [`th_preprocessor.preprocess.replace_dup_emojis`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L225)
- [`th_preprocessor.preprocess.insert_spaces`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L235)
- [`th_preprocessor.preprocess.normalize_emoji`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L250)
- [`th_preprocessor.preprocess.remove_others_char`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L254)
- [`th_preprocessor.preprocess.remove_stopwords`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L287)
- [`th_preprocessor.preprocess.preprocess`](https://github.com/wisesight/th-simple-preprocessor/blob/main/th_preprocessor/preprocess.py#L261)

## __Copyright__
All licenses in this repository are copyrighted by their respective authors. Everything else is released under CC0. See [LICENSE](https://github.com/wisesight/th-simple-preprocessor/blob/main/LICENSE) for details.


