Metadata-Version: 2.4
Name: khmereasytools
Version: 0.3.7
Summary: A simple, self-contained library for Khmer text processing, with optional OCR and POS tagging support.
Home-page: https://github.com/back-kh/khmereasytools
Author: Nimol Thuon
Author-email: nimol.thuon@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.8; extra == "ocr"
Requires-Dist: Pillow>=9.0.0; extra == "ocr"
Provides-Extra: khmernltk
Requires-Dist: khmernltk>=1.6; extra == "khmernltk"
Provides-Extra: all
Requires-Dist: pytesseract>=0.3.8; extra == "all"
Requires-Dist: Pillow>=9.0.0; extra == "all"
Requires-Dist: khmernltk>=1.6; extra == "all"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary


# Khmer Easy Tools

A simple, self-contained Python library for common Khmer Natural Language Processing (NLP) tasks. The core features, keyword extraction and word segmentation, need no external dependencies; POS tagging and OCR are available as optional extras.

## Installation

Install the base package:
```bash
pip install khmereasytools
```

### Installing Optional Features

Optional features are packaged as extras, so you can install only the ones you need.

```bash
# Quoting the package spec keeps shells such as zsh from expanding the brackets.

# To install support for POS tagging (khpos)
pip install "khmereasytools[khmernltk]"

# To install support for OCR (khocr)
pip install "khmereasytools[ocr]"

# To install all optional features
pip install "khmereasytools[all]"
```

**For OCR functionality**, you must also install the Tesseract OCR engine on your system:
-   [Tesseract Installation Guide](https://github.com/tesseract-ocr/tesseract/wiki)
-   Make sure to install the Khmer (`khm`) language data.
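
Before calling the optional functions, it can help to confirm that their dependencies are actually present. The following is a minimal sketch using only the standard library; `check_optional_features` and `OPTIONAL_MODULES` are hypothetical names for this example and are not part of `khmereasytools`. It assumes the Tesseract executable is named `tesseract` and is on your `PATH`, and it does not verify that the Khmer (`khm`) language data is installed.

```python
import importlib.util
import shutil

# Import names of the optional dependencies listed in the package metadata.
# Note that Pillow is imported as "PIL".
OPTIONAL_MODULES = {
    "khmernltk": "khmernltk",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
}


def check_optional_features():
    """Return a dict mapping each optional component to True or False."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }
    # The Tesseract engine is a system binary, not a Python package.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    return status


for name, available in check_optional_features().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

If `pytesseract` and `Pillow` are reported as found but the `tesseract binary` is missing, the Python side of the `ocr` extra is installed and only the system engine still needs to be set up.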

## How to Use

### Keyword Extraction (`khfilter`)
Uses a built-in segmentation algorithm to find words and remove stop words.
```python
import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{keywords}'")
```
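
To illustrate the general idea of stop-word removal, here is a minimal sketch that filters an already segmented list of words. It is not the library's implementation: `remove_stop_words` and the tiny `TOY_STOP_WORDS` set are hypothetical, and the actual stop-word list and return format of `khfilter` may differ.

```python
# Toy stop-word set for illustration only (not the library's list).
TOY_STOP_WORDS = {"នេះ", "គឺជា", "នៅ", "ក្នុង"}


def remove_stop_words(tokens, stop_words=TOY_STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


# Example input: words assumed to have been produced by a segmenter.
segmented = ["នេះ", "គឺជា", "ប្រាសាទ", "អង្គរវត្ត"]
print(remove_stop_words(segmented))
```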

### Text Segmentation (`khseg`)
Uses a built-in segmentation algorithm to split text into words.
```python
import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")
```
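
Khmer is written without spaces between words, so a segmenter has to decide where each word ends. The README does not say which algorithm `khseg` uses. As a point of reference only, the sketch below shows one common dictionary-based approach, greedy longest matching. The function `longest_match_segment` and the four-word `toy_lexicon` are hypothetical and are not taken from the library.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right segmentation using the longest lexicon entry at each position."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until a lexicon entry matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in lexicon:
                break
        else:
            # No entry matched: fall back to a single character.
            candidate = text[pos]
        tokens.append(candidate)
        pos += len(candidate)
    return tokens


# Toy lexicon for illustration only (not the library's word list).
toy_lexicon = {"ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"}
print(longest_match_segment("ខ្ញុំស្រឡាញ់ភាសាខ្មែរ", toy_lexicon))
```

Greedy matching is simple and fast, but it can pick the wrong boundary when a long lexicon entry overlaps the start of the next word, and unknown text falls back to single characters here.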

### Part-of-Speech Tagging (`khpos`)
*Requires the `khmernltk` extra to be installed.*
```python
import khmereasytools as ket
# pip install khmereasytools[khmernltk]
text = "ខ្ញុំស្រឡាញ់ភាសាខ្មែរ"
tags = ket.khpos(text)
print(f"POS Tags: {tags}")
```

### OCR from Image (`khocr`)
*Requires the `ocr` extra and a system installation of Tesseract with the Khmer (`khm`) language data.*
```python
import khmereasytools as ket
# pip install "khmereasytools[ocr]"
# Replace 'khmer_text.png' with the path to your own image file.
text_from_image = ket.khocr('khmer_text.png')
print(f"Text from OCR: {text_from_image}")
```
