Metadata-Version: 2.1
Name: scraper-bot
Version: 1.3.1
Summary: A telegram bot to stay tuned on real estate ads
Home-page: https://github.com/RobertoBochet/bot-scraper.git
License: GPL-3.0-or-later
Author: Roberto Bochet
Author-email: r@robertobochet.me
Requires-Python: >=3.12,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: Framework :: Pydantic
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Dist: aiolimiter (>=1.1.0,<2.0.0)
Requires-Dist: apprise (>=1.8.0,<2.0.0)
Requires-Dist: black (>=24.4.2,<25.0.0)
Requires-Dist: diskcache (>=5.6.3,<6.0.0)
Requires-Dist: jinja2 (>=3.1.4,<4.0.0)
Requires-Dist: playwright (>=1.44.0,<2.0.0)
Requires-Dist: playwright-stealth (>=1.0.6,<2.0.0)
Requires-Dist: pydantic (>=2.7.4,<3.0.0)
Requires-Dist: pydantic-settings (>=2.3.4,<3.0.0)
Requires-Dist: redis (>=5.0.7,<6.0.0)
Requires-Dist: setuptools (>=70.1.1,<71.0.0)
Requires-Dist: termcolor (>=2.4.0,<3.0.0)
Project-URL: Repository, https://github.com/RobertoBochet/bot-scraper.git
Description-Content-Type: text/markdown

# Scraper Bot

[![GitHub](https://img.shields.io/github/license/RobertoBochet/scraper-bot?style=flat-square)](https://github.com/RobertoBochet/scraper-bot)
[![GitHub Version](https://img.shields.io/github/v/tag/RobertoBochet/scraper-bot?label=version&style=flat-square)](https://github.com/RobertoBochet/scraper-bot)
[![PyPI - Version](https://img.shields.io/pypi/v/scraper-bot?style=flat-square)](https://pypi.org/project/scraper-bot/)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/RobertoBochet/scraper-bot/test-code.yml?label=test%20code&style=flat-square)](https://github.com/RobertoBochet/scraper-bot)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/RobertoBochet/scraper-bot/release.yml?label=publish%20release&style=flat-square)](https://github.com/RobertoBochet/scraper-bot/pkgs/container/scraper-bot)
[![CodeFactor Grade](https://img.shields.io/codefactor/grade/github/RobertoBochet/scraper-bot?style=flat-square)](https://www.codefactor.io/repository/github/robertobochet/scraper-bot)

This bot periodically scrapes ads from commercial websites.

When it finds a new ad, the bot sends it to you through [Apprise](https://github.com/caronc/apprise) channels

## Deploy

### PyPI

The package is available on [PyPI](https://pypi.org/project/scraper-bot/)

```shell
pip install scraper-bot
```
The package relies heavily on the [`playwright`](https://playwright.dev/python/) package, so before you start using the bot you have to install a Playwright browser
```shell
playwright install --with-deps firefox
```
You can find further information in the [`playwright` documentation](https://playwright.dev/python/docs/browsers)
_(n.b. the bot is not limited to Firefox)_

The `scraper-bot` package provides the following command to run the bot
```shell
scraper-bot
```

### Container

The CI builds a container image for each version and publishes it to the public [GitHub registry](https://ghcr.io/robertobochet/scraper-bot)
```
ghcr.io/robertobochet/scraper-bot
```

#### docker compose

1. [Create a Telegram bot](https://core.telegram.org/bots#3-how-do-i-create-a-bot) and retrieve its token
2. Download `config.example.yaml` and rename it to `config.yaml`
3. Edit the configuration following the [guidelines](#configuration)
4. Download `docker-compose.yaml`
5. Start the scraper with `docker-compose`
    ```shell
    docker-compose up
    ```
6. Wait for the bot to do its work!

### Kubernetes (Helm chart)

A [Helm chart](https://helm.sh/) is also available to deploy **Scraper Bot**

You can find the source code in the repo [`scraper-bot-chart`](https://github.com/RobertoBochet/scraper-bot-chart)

The Helm chart package is available in the GitHub OCI registry
```
oci://ghcr.io/robertobochet/scraper-bot-chart
```
You can use it to deploy directly on your Kubernetes cluster
1. Retrieve the default values file
   ```shell
   helm show values oci://ghcr.io/robertobochet/scraper-bot-chart > values.yaml
   ```
2. Customize the `values.yaml`
3. Install the scraper bot
   ```shell
   helm install scraper-bot oci://ghcr.io/robertobochet/scraper-bot-chart -f values.yaml
   ```

## Configuration

By default the bot looks for a configuration file in the following paths: `./config.y(a)ml` and `/etc/scraper-bot/config.y(a)ml`. You can override this behavior by passing the `--config` argument on the command line, followed by the config file path
```shell
scraper-bot --config /path/to/scraper-bot-config.yaml
```

The configuration file must satisfy the Pydantic model defined in `scraper_bot.settings`.
You can also get the configuration JSON schema from the command line with the `--config-schema` argument
```shell
scraper-bot --config-schema
```

You can also find a configuration example in `config.example.yaml`.

