Metadata-Version: 2.4
Name: pyrxiv
Version: 0.4.2
Summary: A Python package for retrieving arXiv papers and filtering them by matching their content against a regex pattern.
Author-email: "Jose M. Pizarro" <jose.pizarro-blanco@bam.de>
Maintainer-email: "Jose M. Pizarro" <jose.pizarro-blanco@bam.de>
License: MIT License
        
        Copyright (c) 2025 Jose M. Pizarro
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/JosePizarro3/pyrxiv
Project-URL: Bug Tracker, https://github.com/JosePizarro3/pyrxiv/issues
Classifier: Natural Language :: English
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.2.1
Requires-Dist: pydantic~=2.10.5
Requires-Dist: structlog~=24.4.0
Requires-Dist: requests~=2.32.4
Requires-Dist: xmltodict~=0.14.2
Requires-Dist: pypdf~=5.7.0
Requires-Dist: pdfminer.six
Requires-Dist: langchain-community~=0.3.27
Requires-Dist: h5py
Provides-Extra: dev
Requires-Dist: mypy==1.0.1; extra == "dev"
Requires-Dist: ruff>=0.11.4; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-timeout; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

<h4 align="center">

![CI](https://github.com/JosePizarro3/pyrxiv/actions/workflows/actions.yml/badge.svg)
![Coverage](https://coveralls.io/repos/github/JosePizarro3/pyrxiv/badge.svg?branch=main)
![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)
![PyPI version](https://img.shields.io/pypi/v/pyrxiv.svg)
![Python versions](https://img.shields.io/pypi/pyversions/pyrxiv.svg)

</h4>

# pyrxiv

**pyrxiv** is a Python package for retrieving [arXiv](https://arxiv.org) papers, storing their metadata in [pydantic](https://docs.pydantic.dev/latest/)-like classes, and optionally filtering them by matching their content against a regex pattern.

While originally developed for the **Strongly Correlated Electron Systems** community in Condensed Matter Physics ([`cond-mat.str-el`](https://arxiv.org/list/cond-mat.str-el/recent)), it's designed to be flexible and applicable to **any arXiv category**.
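
As a rough illustration of the retrieval step, the sketch below builds a query URL for the public arXiv API and parses an Atom response into a small metadata record. It is not pyrxiv's actual implementation: the names `PaperMetadata`, `build_query_url`, and `parse_feed` are hypothetical, a standard-library dataclass stands in for the pydantic-like classes, and the sample feed (including its identifier) is made up for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds


@dataclass
class PaperMetadata:
    """Hypothetical stand-in for a pydantic-like metadata class."""

    url: str
    title: str
    summary: str
    authors: list[str] = field(default_factory=list)


def build_query_url(category: str, max_results: int = 5) -> str:
    """Build an arXiv API URL listing the newest papers in a category."""
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_feed(atom_xml: str) -> list[PaperMetadata]:
    """Turn each <entry> of an Atom feed into a PaperMetadata record."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        papers.append(
            PaperMetadata(
                url=entry.findtext(f"{ATOM}id", default="").strip(),
                # Collapse line breaks and repeated spaces inside the text.
                title=" ".join(entry.findtext(f"{ATOM}title", default="").split()),
                summary=" ".join(entry.findtext(f"{ATOM}summary", default="").split()),
                authors=[
                    (author.findtext(f"{ATOM}name") or "").strip()
                    for author in entry.findall(f"{ATOM}author")
                ],
            )
        )
    return papers


# Made-up feed with a single entry, used instead of a real network request.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An illustrative
      title</title>
    <summary>A made-up abstract mentioning the Hubbard model.</summary>
    <author><name>A. Author</name></author>
  </entry>
</feed>"""

print(build_query_url("cond-mat.str-el"))
print(parse_feed(SAMPLE_FEED))
```

In a real run, the URL would be fetched (for instance with `requests`) and the response text passed to the parser in place of `SAMPLE_FEED`.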

Install the core package:
```bash
pip install pyrxiv
```

## Objective
The main objective of **pyrxiv** is to provide a simple command line interface (CLI) for searching and downloading arXiv papers whose content matches a regex pattern. After installing the package, print the available options with:
```bash
pyrxiv --help
```

or, for the `search_and_download` command:
```bash
pyrxiv search_and_download --help
```

For example:
```bash
pyrxiv search_and_download --category cond-mat.str-el --regex-pattern "DMFT|Hubbard" --n-papers 5 --download-pdfs True
```
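
To illustrate what a pattern such as `"DMFT|Hubbard"` selects, here is a minimal sketch using Python's standard `re` module. The helper `matches_pattern` and the sample texts are hypothetical, and pyrxiv's own matching (for example, whether it is case-sensitive or which part of the paper text it searches) may differ.

```python
import re


def matches_pattern(text: str, regex_pattern: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(regex_pattern, text) is not None


# Made-up snippets standing in for the extracted text of three papers.
texts = {
    "paper_a": "We solve the Hubbard model on a square lattice.",
    "paper_b": "A study of classical spin ice.",
    "paper_c": "Results from DFT+DMFT calculations.",
}

kept = [name for name, text in texts.items() if matches_pattern(text, "DMFT|Hubbard")]
print(kept)  # ['paper_a', 'paper_c']
```

The alternation `|` keeps a paper if either term appears. In this sketch `re.search` is case-sensitive, so a lowercase "hubbard" would not match unless `re.IGNORECASE` were passed.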

---

# Development

To contribute to `pyrxiv` or run it locally, follow these steps:


## Clone the Repository

```bash
git clone https://github.com/JosePizarro3/pyrxiv.git
cd pyrxiv
```

## Set Up a Virtual Environment

pyrxiv requires Python ≥ 3.10. Create and activate a virtual environment:
```bash
python3 -m venv .venv
source .venv/bin/activate
```

## Install Dependencies

Use [`uv`](https://docs.astral.sh/uv/) (faster than pip) to install the package in editable mode with `dev` extras:
```bash
pip install --upgrade pip
pip install uv
uv pip install -e ".[dev]"
```

## Run Tests

Run all tests with `pytest` in verbose mode, without output capture:
```bash
python -m pytest -sv tests
```


To check code coverage:
```bash
python -m pytest --cov=pyrxiv tests
```

## Code Formatting and Linting

We use [`Ruff`](https://docs.astral.sh/ruff/) for formatting and linting (configured via `pyproject.toml`).

Check linting issues:
```bash
ruff check .
```

Auto-format code:
```bash
ruff format .
```

Manually fix anything Ruff cannot handle automatically.
