Metadata-Version: 2.4
Name: PgsFile
Version: 0.6.2
Summary: This module streamlines Python package management, script execution, file handling, web scraping, and multimedia downloads. It supports LLM-based NLP tasks like OCR, tokenization, lemmatization, POS tagging, NER, ATE, dependency parsing, MDD, WSD, LIWC, MIP analysis, text classification, and Chinese-English sentence alignment. Additionally, it generates word lists and data visualizations, making it a practical tool for data scraping and analysis, ideal for literature students and researchers.
Home-page: https://github.com/Petercusin/PgsFile
Author: Dr. Guisheng Pan is an instructor at the School of Foreign Studies, Shanghai University of Finance and Economics (SUFE). The school’s website is available at: https://sfs.sufe.edu.cn/bf/ef/c4221a245743/page.htm
Author-email: panguisheng@sufe.edu.cn
License: Educational free
Classifier: Programming Language :: Python :: 3
Classifier: License :: Free For Educational Use
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chardet
Requires-Dist: pandas
Requires-Dist: python-docx
Requires-Dist: pip
Requires-Dist: requests
Requires-Dist: fake-useragent
Requires-Dist: lxml
Requires-Dist: pimht
Requires-Dist: pysbd
Requires-Dist: nlpir-python
Requires-Dist: pillow
Requires-Dist: liwc
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

Purpose: This module is designed to make complex tasks accessible and convenient, even for beginners. It provides a unified set of tools that simplifies the workflow for data collection, processing, and analysis. Whether you're scraping data from the web, cleaning text, or performing LLM-based NLP tasks, it lets you focus on your research rather than on technical obstacles.

Key Features:
1. **Web Scraping:** Easily scrape data from websites and download multimedia content.
2. **Package Management:** Install, uninstall, and manage Python packages with simple commands.
3. **Data Retrieval:** Extract data from various file formats like text, JSON, CSV, TSV, XLSX, XML, TMX, and HTML (both online and offline).
4. **Data Storage:** Write and append data to text, Excel, JSON, TMX, and JSON Lines files.
5. **File and Folder Processing:** Manage file paths, create directories, move or copy files, convert CSV to JSON, and search for files with specific keywords.
6. **Data Cleaning:** Clean text, handle punctuation, remove stopwords, and convert Markdown strings into Python objects to prepare data for analysis, drawing on corpora and dictionaries such as the CET-4/6 vocabulary and the BE21 and BNC-COCA word lists.
7. **NLP:** Perform OCR, word tokenization, lemmatization, POS tagging, NER, dependency parsing, ATE, MDD, WSD, LIWC, MIP analysis, text classification, and Chinese-English sentence alignment using prepared LLM prompts.
8. **Math Operations:** Format numbers, convert decimals to percentages, and validate data.
9. **Visualization:** Process images (e.g., make white pixels transparent, resize images) and manage fonts for rendering text.
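
The description above does not list the package's own function names, so the following is only a minimal sketch of the kinds of operations mentioned in items 4, 5, and 8 (JSON Lines output, CSV-to-JSON conversion, and decimal-to-percentage formatting). It uses the Python standard library alone; the helper names `csv_text_to_json`, `to_json_lines`, and `decimal_to_percent` are hypothetical and are not PgsFile's API.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps Chinese and other non-ASCII text readable.
    return json.dumps(rows, ensure_ascii=False)


def to_json_lines(records) -> str:
    """Serialize an iterable of dicts as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def decimal_to_percent(value: float, digits: int = 2) -> str:
    """Format a decimal fraction such as 0.256 as a percentage string."""
    return f"{value * 100:.{digits}f}%"


if __name__ == "__main__":
    sample = "word,freq\nthe,10\nof,7\n"
    print(csv_text_to_json(sample))
    print(to_json_lines([{"word": "the"}, {"word": "of"}]))
    print(decimal_to_percent(0.256))
```

Note that `csv.DictReader` returns every field as a string, so numeric columns would need an explicit conversion before analysis.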

Author: Dr. Guisheng Pan (潘贵生) is an instructor at the School of Foreign Studies, Shanghai University of Finance and Economics (SUFE).
Email: panguisheng@sufe.edu.cn
Homepage: https://sfs.sufe.edu.cn/bf/ef/c4221a245743/page.htm
