Metadata-Version: 2.4
Name: zimscraperlib
Version: 5.1.1
Summary: Collection of python tools to re-use common code across scrapers
Project-URL: Donate, https://www.kiwix.org/en/support-us/
Project-URL: Homepage, https://github.com/openzim/python-scraperlib
Author-email: openZIM <dev@openzim.org>
License: GPL-3.0-or-later
License-File: LICENSE
Keywords: offline,openzim,zim
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.13
Requires-Dist: babel<3.0,>=2.9
Requires-Dist: beartype==0.19.0
Requires-Dist: beautifulsoup4<5.0,>=4.9.3
Requires-Dist: cairosvg<3.0,>=2.2.0
Requires-Dist: colorthief==0.2.1
Requires-Dist: idna<4.0,>=2.5
Requires-Dist: iso639-lang<3.0,>=2.4.0
Requires-Dist: libzim<4.0,>=3.4.0
Requires-Dist: lxml<6.0,>=4.6.3
Requires-Dist: optimize-images<2.0,>=1.3.6
Requires-Dist: piexif==1.1.3
Requires-Dist: pillow<12.0,>=7.0.0
Requires-Dist: pymupdf<2.0,>=1.24.0
Requires-Dist: python-magic<0.5,>=0.4.3
Requires-Dist: python-resize-image<1.2,>=1.1.19
Requires-Dist: regex>=2020.7.14
Requires-Dist: requests<3.0,>=2.25.1
Requires-Dist: urllib3<2.4.0,>=1.26.5
Requires-Dist: yt-dlp
Provides-Extra: check
Requires-Dist: pyright==1.1.394; extra == 'check'
Requires-Dist: pytest==8.3.4; extra == 'check'
Provides-Extra: dev
Requires-Dist: black==25.1.0; extra == 'dev'
Requires-Dist: coverage==7.6.12; extra == 'dev'
Requires-Dist: invoke==2.2.0; extra == 'dev'
Requires-Dist: ipython==8.32.0; extra == 'dev'
Requires-Dist: jinja2==3.1.5; extra == 'dev'
Requires-Dist: mkdocs-gen-files==0.5.0; extra == 'dev'
Requires-Dist: mkdocs-include-markdown-plugin==7.1.4; extra == 'dev'
Requires-Dist: mkdocs-literate-nav==0.6.1; extra == 'dev'
Requires-Dist: mkdocs-material==9.6.4; extra == 'dev'
Requires-Dist: mkdocs==1.6.1; extra == 'dev'
Requires-Dist: mkdocstrings[python]==0.28.1; extra == 'dev'
Requires-Dist: pre-commit==4.1.0; extra == 'dev'
Requires-Dist: pymdown-extensions==10.14.3; extra == 'dev'
Requires-Dist: pyright==1.1.394; extra == 'dev'
Requires-Dist: pytest-mock==3.14.0; extra == 'dev'
Requires-Dist: pytest==8.3.4; extra == 'dev'
Requires-Dist: pyyaml==6.0.2; extra == 'dev'
Requires-Dist: ruff==0.9.6; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files==0.5.0; extra == 'docs'
Requires-Dist: mkdocs-include-markdown-plugin==7.1.4; extra == 'docs'
Requires-Dist: mkdocs-literate-nav==0.6.1; extra == 'docs'
Requires-Dist: mkdocs-material==9.6.4; extra == 'docs'
Requires-Dist: mkdocs==1.6.1; extra == 'docs'
Requires-Dist: mkdocstrings[python]==0.28.1; extra == 'docs'
Requires-Dist: pymdown-extensions==10.14.3; extra == 'docs'
Provides-Extra: lint
Requires-Dist: black==25.1.0; extra == 'lint'
Requires-Dist: ruff==0.9.6; extra == 'lint'
Provides-Extra: scripts
Requires-Dist: invoke==2.2.0; extra == 'scripts'
Requires-Dist: jinja2==3.1.5; extra == 'scripts'
Requires-Dist: pyyaml==6.0.2; extra == 'scripts'
Provides-Extra: test
Requires-Dist: coverage==7.6.12; extra == 'test'
Requires-Dist: pytest-mock==3.14.0; extra == 'test'
Requires-Dist: pytest==8.3.4; extra == 'test'
Description-Content-Type: text/markdown

# zimscraperlib

[![Build Status](https://github.com/openzim/python-scraperlib/workflows/CI/badge.svg?query=branch%3Amain)](https://github.com/openzim/python-scraperlib/actions?query=branch%3Amain)
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/python-scraperlib/badge)](https://www.codefactor.io/repository/github/openzim/python-scraperlib)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/zimscraperlib.svg)](https://pypi.org/project/zimscraperlib)
[![codecov](https://codecov.io/gh/openzim/python-scraperlib/branch/master/graph/badge.svg)](https://codecov.io/gh/openzim/python-scraperlib)
[![Read the Docs](https://img.shields.io/readthedocs/python-scraperlib)](https://python-scraperlib.readthedocs.io/)

Collection of Python code to reuse across Python-based scrapers.

# Usage

- This library is meant to be installed via PyPI ([`zimscraperlib`](https://pypi.org/project/zimscraperlib/)).
- Pin it to a version range in your requirements, as the API is subject to frequent changes.
- The API should remain the same only within the same _minor_ version.

Example usage:

```pip
zimscraperlib>=5.1,<5.2
```

See documentation at [Read the Docs](https://python-scraperlib.readthedocs.io/) for details.
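
Because the API is only expected to stay stable within a minor version, a scraper may want to verify at startup that the installed release falls in the series it was written against. The following is a minimal sketch using only the standard library; `in_minor_series` and `check_scraperlib` are hypothetical helper names and are not part of zimscraperlib, and the parsing assumes a plain `major.minor.patch` version string.

```python
from importlib.metadata import PackageNotFoundError, version


def in_minor_series(installed: str, major: int, minor: int) -> bool:
    """Return True if `installed` (e.g. "5.1.1") belongs to the major.minor series."""
    parts = installed.split(".")
    if len(parts) < 2:
        return False
    try:
        return (int(parts[0]), int(parts[1])) == (major, minor)
    except ValueError:
        # Non-numeric components (e.g. pre-release tags) are treated as a mismatch.
        return False


def check_scraperlib(major: int, minor: int) -> bool:
    """Check the installed zimscraperlib distribution against a minor series."""
    try:
        installed = version("zimscraperlib")
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return False
    return in_minor_series(installed, major, minor)
```

For example, a scraper written against the 5.1 series could call `check_scraperlib(5, 1)` early on and abort with a clear message if it returns `False`.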

# Dependencies

- libmagic
- wget
- libzim (auto-installed, not available on Windows)
- Pillow
- FFmpeg
- gifsicle (>=1.92)
- libcairo (only needed if you use the image manipulation features; used for SVG conversion)
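
Some of the dependencies above are command-line tools rather than Python packages, so a quick check for them before a long scraping run can save time. The sketch below is one way to do that with `shutil.which`; `missing_binaries` is a hypothetical helper, not part of zimscraperlib, and it assumes the tools are installed under their usual executable names (`wget`, `ffmpeg`, `gifsicle`). Shared libraries such as libmagic or libcairo cannot be detected this way.

```python
import shutil


def missing_binaries(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executable names assumed for the command-line dependencies listed above.
REQUIRED_BINARIES = ["wget", "ffmpeg", "gifsicle"]
```

Calling `missing_binaries(REQUIRED_BINARIES)` returns an empty list when all three tools are available, and the names of the absent ones otherwise.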

## macOS

```sh
brew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle
```

## Linux

```sh
sudo apt install libmagic1 wget ffmpeg \
    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle
```

## Alpine

```sh
apk add ffmpeg gifsicle libmagic wget libjpeg
```

# Contribution

This project adheres to openZIM's [Contribution Guidelines](https://github.com/openzim/overview/wiki/Contributing).

This project has implemented openZIM's [Python bootstrap, conventions and policies](https://github.com/openzim/_python-bootstrap/blob/main/docs/Policy.md) **v1.0.2**.

```shell
pip install hatch
pip install ".[dev]"
pre-commit install
# For tests
invoke coverage
```

# Users

Non-exhaustive list of scrapers using it (check status when updating API):

- [openzim/freecodecamp](https://github.com/openzim/freecodecamp)
- [openzim/gutenberg](https://github.com/openzim/gutenberg)
- [openzim/ifixit](https://github.com/openzim/ifixit)
- [openzim/kolibri](https://github.com/openzim/kolibri)
- [openzim/nautilus](https://github.com/openzim/nautilus)
- [openzim/openedx](https://github.com/openzim/openedx)
- [openzim/sotoki](https://github.com/openzim/sotoki)
- [openzim/ted](https://github.com/openzim/ted)
- [openzim/warc2zim](https://github.com/openzim/warc2zim)
- [openzim/wikihow](https://github.com/openzim/wikihow)
- [openzim/youtube](https://github.com/openzim/youtube)
