﻿
# PQuery: Fluent, SQL-like File Querying for Python

PQuery provides a powerful, chainable API for filtering, transforming, and processing files using expressive, SQL-inspired syntax. It depends on [Frist](https://github.com/hucker/frist) for date calculations and [TPath](https://github.com/hucker/tpath) for property-based path objects and calendar-based filtering.


## Quick Examples

PQuery's `map_parallel` method lets you process files in parallel with minimal effort. This is especially valuable for file operations, where IO and metadata access can be slow. With just one method call, you get free parallelism—no thread management required.

```python
from pathlib import Path

from pquery import PQuery, matches

def check_log_file(log_file: Path):
    # Placeholder: some_expensive_json_check stands in for your own validation logic
    result = some_expensive_json_check(log_file)
    return result


# Check log files in parallel (4 workers)
for res in PQuery(from_="/var/log").where(lambda p: matches(p, "*.log")).map_parallel(check_log_file, workers=4):
    if res.success:
        print("OK:", res.path, res.data, f"{res.execution_time:.3f}s")
    else:
        print("ERR:", res.path, res.exception)
```

Benefits:

- No manual thread or process management
- Scales with IO-bound workloads (reading, stat, metadata)
- Scales with CPU-bound workloads on free-threaded Python builds (for example, `python3.14t`)
- Handles errors and exceptions per file
- Works with all PQuery filters and chaining

```python
from tpath import TPath
from pquery import PQuery, matches

# Find all large log files modified in the last month
files = (
    PQuery()
    .from_("/var/log")
    .where(lambda p: matches(p, "*.log") and p.size.mb > 10 and p.mtime.cal.in_months(-1, 0))
    .select(lambda p: (p.name, p.size.mb, p.age.days))
)
for name, size, age in files:
    print(f"{name}: {size:.2f} MB, {age:.2f} days old")
```

Sample output:

```text
app.log: 12.30 MB, 5.00 days old
error.log: 15.80 MB, 29.12 days old
archive.log: 22.10 MB, 31.75 days old
```

## Performance Note: Parallel Mapping and Python 3.14

`map_parallel` has real overhead: it launches threads and coordinates their results. That overhead pays off for IO-bound workloads (such as reading files or calling stat). For **CPU-bound operations** (such as converting image files to thumbnails), a free-threaded build of Python 3.14 can significantly improve performance by letting threads run truly in parallel, without the GIL.

### Example: Parallel image thumbnail conversion (CPU-bound)

```python
from pquery import PQuery, matches
from PIL import Image

def make_thumbnail(path):
    img = Image.open(path)
    img.thumbnail((128, 128))
    thumb_path = path.with_name(path.stem + "_thumb.jpg")
    img.save(thumb_path)
    return thumb_path

# On a free-threaded Python 3.14+ build, map_parallel can run CPU-bound tasks in parallel
for res in PQuery().from_("./images").where(lambda p: matches(p, "*.jpg")).map_parallel(make_thumbnail, workers=8):
    if res.success:
        print("Thumbnail created:", res.data)
    else:
        print("Error:", res.path, res.exception)
```

**Summary:**

- For IO-bound tasks, `map_parallel` is efficient and easy to use.
- For CPU-bound tasks, Python 3.14+ free-threading can unlock much higher throughput.
- Always consider the cost of thread creation and coordination for large datasets.

## Core Features

- **Fluent chaining**: Build queries step-by-step with `.from_()`, `.where()`, `.distinct()`, `.recursive()`, `.select()`, `.files()`, `.take()`, `.order_by()`, `.paginate()`, `.map_parallel()`, `.exists()`, `.count()`, `.first()`
- **SQL-like filtering**: Use lambda expressions for flexible, readable file selection
- **Pattern matching**: Use `matches()` for shell-style wildcards and multi-pattern filtering
- **Calendar windows**: Filter by month, week, quarter, etc. using TPath and Frist
- **Streaming and materialization**: Process files one-by-one or collect results as lists
- **Parallel mapping**: Use `map_parallel()` for multi-threaded file processing
- **Logging support**: Attach a logger for progress, errors, and stats

## Chaining and SQL-like Functionality

PQuery lets you chain methods to build complex queries in a readable, declarative style:

```python
# Find Python files larger than 1MB, modified this week, in multiple directories
files = (
    PQuery()
    .from_("./src", "./lib")
    .recursive(True)
    .where(lambda p: matches(p, "*.py") and p.size.mb > 1 and p.mtime.cal.in_days(-7, 0))
    .distinct()
    .order_by(key=lambda p: p.mtime.timestamp, ascending=False)
    .take(10)
)
for file in files:
    print(file.name, file.size.mb, file.mtime)
```

### Method Overview

- `.from_(...)` — Set starting directories (accepts multiple paths)
- `.recursive(True)` — Enable deep traversal
- `.where(lambda p: ...)` — Filter files with custom logic
- `.distinct()` — Remove duplicates
- `.order_by(key=..., ascending=True)` — Sort results
- `.take(n, key=..., reverse=True)` — Get top-N files efficiently
- `.select(lambda p: ...)` — Transform results
- `.files()` — Stream matching files
- `.paginate(page_size)` — Process files in batches
- `.map_parallel(func, workers=4)` — Parallel mapping
- `.exists()` — Check if any files match
- `.count()` — Count matches
- `.first()` — Get first match

### Pattern Matching

Use `matches()` for shell-style pattern filtering:

```python
from pquery import matches
matches("app.log", "*.log")                # True
matches("data.csv", "*.csv", "*.tsv")      # True
matches("backup_2024.zip", "backup_202[3-4]*") # True
```

### Calendar Window Filtering

Filter by calendar windows using TPath and Frist:

```python
from tpath import TPath
file = TPath("report.pdf")
if file.mtime.cal.in_months(-1, 0):
    print("Modified in the last month!")
```

### Streaming vs. Materialization

```python
# Streaming (memory efficient)
for file in PQuery().from_("./data").files():
    process(file)

# Materialize as list
all_files = list(PQuery().from_("./data").files())
```

## Installation

This project is **not yet published on PyPI**.

### Using uv (Recommended)

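```bash
# Add PQuery as a dependency of your own project
uv add git+https://github.com/hucker/pquery.git

# Or clone the repository to work on PQuery itself
git clone https://github.com/hucker/pquery.git
cd pquery
uv sync --dev
```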

### Using pip

```bash
pip install git+https://github.com/hucker/pquery.git
```


## PQuery - Powerful File Querying

**PQuery provides a fluent, chainable API for filtering files based on age, size, and other properties.** It's designed for complex file filtering operations with readable, expressive syntax.

> **Note**: PQuery is NOT SQL and is not meant to replicate all SQL features. It provides a paradigm with SQL-like characteristics for file operations, enabling semantically similar operations like filtering, sorting, and result transformation in a familiar pattern.

Although the API borrows SQL vocabulary, it does so only to give you a familiar mental model for filtering files. There is no query optimizer; the only optimization is avoiding unnecessary `stat` calls.

### Basic Usage

```python
from pquery import PQuery

# Simple queries - starts from current directory by default
q = PQuery()

# Find files by extension (streaming)
for file in q.where(lambda p: p.suffix == '.py').files():
    process_python_file(file)

# Get list when needed
python_files = list(q.where(lambda p: p.suffix == '.py').files())

# Find files by size
large_files = list(q.where(lambda p: p.size.mb > 10).files())

# Find files by calendar window
recent_files = list(q.where(lambda p: p.mtime.cal.in_days(-7, 0)).files())

# Complex combined criteria
old_large_logs = list(q
    .where(lambda p: p.suffix == '.log' and p.size.mb > 50 and p.age.days > 30)
    .files())
```

### Deep File System Traversal

> **PQuery uses a stack-based directory walker, not recursion.** This means it can traverse extremely deep file systems without any risk of Python stack overflow. The traversal is fully iterative, so you can safely query directories with thousands of nested levels.
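
The idea behind a stack-based walker can be sketched with the standard library alone. This is a minimal illustration, not PQuery's actual implementation; the function name `iter_files` is hypothetical. Directories waiting to be visited live in an ordinary list, so nesting depth consumes heap memory rather than Python call frames.

```python
import os
from collections.abc import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Yield file paths under root using an explicit stack instead of recursion."""
    stack = [root]  # directories still to visit
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # defer the subdirectory
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError:
            continue  # unreadable directory: skip it and keep walking
```

Because the function is a generator, callers can stop iterating at any point and the remaining directories are never opened.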

### Type Safety Best Practices

For optimal type checking and IDE support, consider using typed functions instead of inline lambdas:

```python
from pquery import PQuery
from tpath import TPath

# Instead of inline lambdas (limited type inference)
large_files = PQuery().where(lambda p: p.size.mb > 10).files()

# Use typed functions for better type checking
def is_large_file(path: TPath) -> bool:
    """Check if file is larger than 10MB."""
    return path.size.mb > 10

def get_file_info(path: TPath) -> dict[str, str | float]:
    """Extract file metadata for reporting."""
    return {
        'name': path.name,
        'size_mb': path.size.mb,
        'age_days': path.age.days
    }

# Better type safety and IDE support
large_files = PQuery().where(is_large_file).files()
file_info = PQuery().where(is_large_file).select(get_file_info)  # Returns list[Any]
```

> **Important:** When using `where`, you must pass the function itself (e.g., `where(is_large_file)`), not the result of calling the function (e.g., `where(is_large_file())`). Passing the result of a function call will cause a type error and is a common mistake caught by the type checker.

### Method Chaining/Fluent Interface

Query properties can be set at object initialization by passing the appropriate keyword arguments. A fluent interface is also available and is often more readable.

```python
# Build complex queries step by step (list() materializes the streamed results)
cleanup_files = list(PQuery()
    .from_("/var/log")              # Set starting directory
    .recursive(True)                # Include subdirectories
    .where(lambda p: p.suffix in ['.log', '.tmp'] and p.age.days > 30)
    .files()
)

# Search multiple directories at once
all_logs = (PQuery()
    .from_("/var/log", "/opt/app/logs", "/home/user/logs")  # Multiple paths
    .where(lambda p: p.suffix == '.log')
    .files()
)

# Remove duplicate files from results (useful with overlapping search paths)
unique_logs = (PQuery()
    .from_(log_dirs, backup_dirs, "/var/log")  # Multiple sources may overlap
    .where(lambda p: p.suffix == '.log')
    .distinct()                                # Remove duplicate files from results
)

# Execute and process results
total_size = sum(f.size.bytes for f in cleanup_files)
print(f"Found {len(cleanup_files)} files totaling {total_size // 1024**2} MB")
```

### Result Transformation with select()

Transform results into more useful formats:

```python
# Get file names and sizes as tuples
file_info = (PQuery()
    .from_("./logs")
    .where(lambda p: p.suffix == '.log')
    .select(lambda p: (p.name, p.size.mb))
)
# Returns: [('app.log', 2.3), ('error.log', 0.8), ...]

# Create custom dictionaries
file_metadata = (PQuery()
    .from_("./documents")
    .where(lambda p: p.suffix in ['.pdf', '.docx'])
    .select(lambda p: {
        'name': p.name,
        'size_mb': p.size.mb,
        'age_days': p.mtime.age.days
    })
)
# Returns: [{'name': 'report.pdf', 'size_mb': 2.1, 'age_days': 5}, ...]
```

### Utility Methods

```python
# Check if any files match (without loading all results)
has_large_files = (PQuery()
    .from_("./data")
    .where(lambda p: p.size.gb > 1)
    .exists()
)

# Count matching files (must crawl all files)
num_python_files = (PQuery()
    .from_("./src")
    .where(lambda p: p.suffix == '.py')
    .count()
)

# Get first match (in traversal order, not necessarily the newest file)
first_log = (PQuery()
    .from_("./logs")
    .where(lambda p: p.suffix == '.log')
    .first()  # Returns TPath or None
)
```
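
Why `exists()` and `first()` can stop early while `count()` cannot is easiest to see on a plain iterable. The helpers below are hypothetical stand-ins written against the standard library; they are not PQuery's code, only a sketch of the same semantics.

```python
from collections.abc import Callable, Iterable
from typing import Optional, TypeVar

T = TypeVar("T")


def exists(items: Iterable[T], pred: Callable[[T], bool]) -> bool:
    """Stop at the first match; later items are never examined."""
    return any(pred(item) for item in items)


def first(items: Iterable[T], pred: Callable[[T], bool]) -> Optional[T]:
    """Return the first match, or None when nothing matches."""
    return next((item for item in items if pred(item)), None)


def count(items: Iterable[T], pred: Callable[[T], bool]) -> int:
    """Counting has no shortcut: every item must be examined."""
    return sum(1 for item in items if pred(item))


names = ["a.py", "b.log", "c.py"]
print(exists(names, lambda n: n.endswith(".log")))  # True
print(first(names, lambda n: n.endswith(".py")))    # a.py
print(count(names, lambda n: n.endswith(".py")))    # 2
```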

### Sorting and Top-K Selection

```python
# Get top 10 largest files (efficient for top-k)
largest_files = (PQuery()
    .from_("./data")
    .take(10, key=lambda p: p.size.bytes)
)

# Sort all files by modification time
all_by_time = (PQuery()
    .from_("./logs")
    .order_by(key=lambda p: p.mtime.timestamp, ascending=False)
)

# Performance tip: use take() for top-N, order_by() for complete ordering
```

### Pagination for Large Datasets

```python
# Process files in batches to manage memory
for page in PQuery().from_("./massive/dataset").paginate(100):
    process_batch(page)
    print(f"Processed {len(page)} files")

# Web API pagination
query = PQuery().from_("./documents").where(lambda p: p.suffix == '.pdf')
pages = list(query.paginate(20))  # Get all pages
first_page = pages[0] if pages else []
```


### Parallel mapping (map_parallel)

For higher-throughput file processing, use the terminal method `PQuery.map_parallel`. It runs a single producer (crawler) thread that walks the filesystem and one or more consumer worker threads that call your mapping function on each `TPath`.

Key points:

- The method yields a `MapResult` object for each processed file. A `MapResult` contains `path` (the `TPath`), `execution_time` (seconds), `success` (bool), `exception` (an `Exception` or `None`), and `data` (the function's return value when `success` is `True`).
- The default design is one crawler plus one worker. Increase `workers` to run more workers in parallel (useful for IO-bound mapping functions).
- `exception_policy` controls behavior on failure:
    - `'continue'` or `'collect'`: emit a `MapResult` with `success=False` and continue processing other files.
    - `'exit'`: emit the failure result, then attempt to stop the producer and other workers as soon as possible.
- Use `take()` or `paginate()` when you only need partial results; the crawler still respects those terminal operations.

Example (inspect results):

```python
from pquery import PQuery

# Map file -> file name using 4 workers
for res in PQuery().from_("./src").where(lambda p: p.suffix == ".py").map_parallel(lambda p: p.name, workers=4):
    if res.success:
        print("OK:", res.path, res.data, f"{res.execution_time:.3f}s")
    else:
        print("ERR:", res.path, res.exception)
```

Performance note:

- `map_parallel` is thread-based and is a good fit for IO-bound functions (reading files, network calls). For heavy CPU-bound mapping functions, consider a multiprocessing approach (outside the scope of this helper) to avoid the GIL, or run on a free-threaded build of Python.
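
To make the result-record pattern concrete, here is a rough standard-library sketch of thread-based mapping that captures per-item failures, roughly in the spirit of the `'continue'` policy. It is not PQuery's implementation: `Result`, `map_threaded`, and `name_length` are hypothetical names, and unlike the single-crawler design described in this section, the sketch submits every path up front.

```python
import time
from collections.abc import Callable, Iterable, Iterator
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical record mirroring the MapResult fields described in this section."""
    path: str
    execution_time: float
    success: bool
    exception: Optional[Exception] = None
    data: Any = None


def _run(func: Callable[[str], Any], path: str) -> Result:
    start = time.perf_counter()
    try:
        data = func(path)
    except Exception as exc:  # capture the failure instead of stopping the run
        return Result(path, time.perf_counter() - start, False, exception=exc)
    return Result(path, time.perf_counter() - start, True, data=data)


def map_threaded(func: Callable[[str], Any], paths: Iterable[str], workers: int = 4) -> Iterator[Result]:
    """Apply func to each path on a thread pool, yielding results as they finish."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_run, func, path) for path in paths]
        for future in as_completed(futures):
            yield future.result()


def name_length(path: str) -> int:
    if not path:
        raise ValueError("empty path")
    return len(path)


for res in map_threaded(name_length, ["a.py", "", "setup.py"], workers=2):
    print(res.path, res.success, res.data, res.exception)
```

Results arrive in completion order, not input order, which is why each record carries its own `path`.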


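### Manual Page Iteration

These examples assume `query` is a `PQuery` instance, such as the one built in the pagination example.
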
```python
paginator = query.paginate(50)
page1 = next(paginator, [])  # First 50 files
page2 = next(paginator, [])  # Next 50 files
page3 = next(paginator, [])  # Next 50 files

# Efficient batch processing with progress
total_processed = 0
for page_num, page in enumerate(query.paginate(200)):
    total_processed += len(page)
    print(f"Page {page_num + 1}: processed {total_processed} files")
    
    # Process each file in the page
    for file in page:
        backup_file(file)
```
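
The batching behavior that `paginate()` exposes can be approximated with `itertools.islice`. This is a sketch under the assumption that pages are plain lists drawn from a stream; the standalone `paginate` function here is hypothetical and is not PQuery's method.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page_size: int) -> Iterator[list[T]]:
    """Yield lists of up to page_size items; only one page is held at a time."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    iterator = iter(items)
    while page := list(islice(iterator, page_size)):
        yield page


pages = list(paginate(range(7), 3))
print(pages)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The final page may be shorter than `page_size`, and an empty source yields no pages at all, which is why the manual examples pass a default to `next()`.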

## Streaming vs. Materialization

**PQuery uses streaming by default for memory efficiency:**

```python
# Streaming (memory efficient) - processes one file at a time
for file in PQuery().from_("./large/dataset").files():
    process_file(file)  # Starts immediately, uses O(1) memory
    if should_stop():
        break  # Can exit early

# Materialization (when you need a list)
all_files = list(PQuery().from_("./data").files())  # O(n) memory
count = len(all_files)      # Can get length
first = all_files[0]        # Can index
for file in all_files:      # Can iterate multiple times
    process_file(file)

# Transform results with select (also streaming)
for name in PQuery().from_("./logs").select(lambda p: p.name):
    print(name)  # Streams file names

# Materialize selected results when needed
file_names = list(PQuery().from_("./logs").select(lambda p: p.name))
```

**When to use each approach:**

- **Streaming**: Large datasets, one-time processing, memory constrained
- **Lists**: Need length/indexing, multiple iterations, small datasets

## Performance and Efficiency

PQuery uses lazy evaluation: filters are applied only when you consume results from a terminal method such as `.files()`:

```python
# Build the query (no filesystem operations yet)
q = PQuery().from_("/large/directory").where(lambda p: p.size.gb > 5)

# The filesystem is scanned only when results are consumed
large_files = list(q.files())  # Execute the query

# Reuse queries efficiently
more_files = q.where(lambda p: p.suffix == '.mp4').files()
```
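
Lazy evaluation of this kind can be illustrated with a tiny stand-in class. `LazyQuery` and `fake_scan` are hypothetical, and the choice to return a new object from `where()` is an assumption made for the sketch; PQuery's internals may differ.

```python
from collections.abc import Callable, Iterable, Iterator


class LazyQuery:
    """Hypothetical stand-in for a lazy query: stores filters, scans on iteration."""

    def __init__(self, source: Callable[[], Iterable[str]], filters: tuple = ()):
        self._source = source      # called only when results are requested
        self._filters = filters    # predicates accumulated by where()

    def where(self, pred: Callable[[str], bool]) -> "LazyQuery":
        # Return a new query so the original can be reused unchanged.
        return LazyQuery(self._source, self._filters + (pred,))

    def files(self) -> Iterator[str]:
        for item in self._source():
            if all(pred(item) for pred in self._filters):
                yield item


scans = []


def fake_scan() -> list[str]:
    scans.append(1)  # record that a "filesystem scan" happened
    return ["a.mp4", "b.txt", "c.mp4"]


q = LazyQuery(fake_scan).where(lambda name: name.endswith(".mp4"))
print(len(scans))        # 0: building the query scanned nothing
print(list(q.files()))   # ['a.mp4', 'c.mp4']
print(len(scans))        # 1: the scan ran during iteration
```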

## Efficiency Guide for Large Datasets

Different operations have vastly different performance characteristics:

```python
# ⚡ MOST EFFICIENT - Early termination operations  
query.take(10)                    # O(10) - stops after 10 files
query.first()                     # O(1) - stops after first match  
query.exists()                    # O(1) - stops after first match
query.distinct().take(10)         # O(k≤n) - stops after 10 unique files

# ⚡ STREAMING - Memory efficient processing
for file in query.files():        # O(1) memory - processes one file at a time
    process_file(file)

# ⚡ PAGINATION - Batch processing with controlled memory
for page in query.paginate(100):  # O(100) memory - processes 100 files at a time
    process_batch(page)

# 📈 EFFICIENT - Heap-based top-N selection  
query.take(10, key=lambda p: p.size.bytes)           # O(n + 10 log n) - top 10 largest
query.take(10, key=lambda p: p.size.bytes, reverse=False)  # O(n + 10 log n) - top 10 smallest
query.take(5, key=lambda p: p.mtime.timestamp)       # O(n + 5 log n) - 5 newest files

# 🐌 EXPENSIVE - Must materialize full results
list(query.files())               # O(n) memory - loads all files into list
query.count()                     # O(n) - must count all matches
query.order_by()                  # O(n log n) - full sort required

# 💡 Performance Tips:
# - Use streaming: for file in query.files() for memory efficiency
# - Use pagination: for page in query.paginate(100) for batch processing
# - Use list() only when you need random access or length
# - Use distinct().take(n) for unique results with early stopping
# - Use take(n, key=...) instead of list(query.order_by().take(n)) when possible  
# - Chain filters early: .where().distinct().take() is optimal order
# - Use exists() instead of len(list(query.files())) > 0 to check for matches
```
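
The difference between top-N selection and a full sort can be seen with the standard library's `heapq` module. Whether PQuery uses `heapq` internally is an assumption; the sketch only shows the general technique on made-up file sizes.

```python
import heapq

# Made-up file sizes in bytes, keyed by file name.
sizes = {"a.bin": 40, "b.bin": 900, "c.bin": 15, "d.bin": 320}

# Top 2 largest without ordering every item.
largest = heapq.nlargest(2, sizes, key=sizes.get)
# Same idea for the smallest entries.
smallest = heapq.nsmallest(2, sizes, key=sizes.get)
# Full sort for comparison: orders all n items even though only 2 are kept.
by_sort = sorted(sizes, key=sizes.get, reverse=True)[:2]

print(largest)   # ['b.bin', 'd.bin']
print(smallest)  # ['c.bin', 'a.bin']
```

Both approaches give the same answer; the heap-based one does less work when N is small compared with the total number of files.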


## Pattern Matching with `matches()`

### Integration with PQuery

Use `matches()` with PQuery's `.where()` method for powerful file filtering:

```python
from pquery import PQuery, matches

# Find log files using pattern matching
log_files = (PQuery()
    .from_("./logs")
    .where(lambda p: matches(p, "*.log", "*.LOG", case_sensitive=False))
    .files()
)

# Find configuration files across project
config_files = (PQuery()
    .from_("./")
    .recursive(True)
    .where(lambda p: matches(p, "*.conf", "*.ini", "*config*", "*.yaml", "*.json"))
    .files()
)

# Complex filtering: large Python files with test patterns
test_files = (PQuery()
    .from_("./")
    .recursive(True)
    .where(lambda p: matches(p, "*test*", "*_test.py", "test_*.py") and p.size.kb > 10)
    .files()
)

# Clean up temporary files by pattern
temp_files = (PQuery()
    .from_("./")
    .recursive(True)
    .where(lambda p: matches(p, "*.tmp", "*.temp", ".*", "~*", full_path=True) and 
                     p.age.days > 7)
    .files()
)

# Backup candidates - important file types from recent activity
backup_files = (PQuery()
    .from_("/home/user/documents")
    .recursive(True)
    .where(lambda p: matches(p, "*.doc*", "*.pdf", "*.xls*", "*.ppt*") and
                     p.size.mb > 1 and
                     p.mtime.cal.in_months(-3, 0))  # Modified in last 3 months
    .files()
)
```

### Pattern Examples

```python
# File extensions
matches("document.pdf", "*.pdf")                      # Standard extension
matches("script.py", "*.py", "*.js", "*.ts")          # Multiple extensions

# Wildcards
matches("backup_2024_12_25.zip", "backup_*")          # Prefix matching
matches("temp_file_123.txt", "*_temp_*", "*temp*")    # Contains pattern
matches("file.backup.old", "*.*.old")                 # Multiple dots

# Character classes  
matches("data2024.csv", "data[0-9][0-9][0-9][0-9]*") # Year pattern
matches("fileA.txt", "file[A-Z].*")                   # Letter range
matches("config_prod.ini", "*[!dev]*")                # [!dev] is one character other than d, e, or v; it does not exclude "dev"

# Real-world patterns
matches("error.log.2024-01", "*.log.*")               # Rotated logs
matches("Thumbs.db", "[Tt]humbs.db")                  # Case variants
matches("~document.tmp", "~*", "*.tmp", ".*")         # Temporary patterns
```

### Supported Pattern Syntax

| Pattern  | Description                   | Example         | Matches                            |
| -------- | ----------------------------- | --------------- | ---------------------------------- |
| `*`      | Zero or more characters       | `*.log`         | `app.log`, `error.log`, `.log`     |
| `?`      | Any single character          | `file?.txt`     | `file1.txt`, `fileA.txt`           |
| `[seq]`  | Any character in sequence     | `data[0-9].csv` | `data1.csv`, `data9.csv`           |
| `[!seq]` | Any character NOT in sequence | `*[!0-9].txt`   | `fileA.txt`, `file_.txt`           |
| `[a-z]`  | Character range               | `[A-Z]*.py`     | `Main.py`, `Test.py`               |
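
The wildcard rules in the table are those of Python's `fnmatch` module, so they can be checked directly. The helper `matches_any` below is a hypothetical stand-in used only to exercise the syntax; it is not the `matches()` function shipped with PQuery and ignores options such as `case_sensitive` and `full_path`.

```python
from fnmatch import fnmatchcase


def matches_any(name: str, *patterns: str) -> bool:
    """Hypothetical helper: True when name matches at least one pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(matches_any("app.log", "*.log"))             # True
print(matches_any("error.log.old", "*.log"))       # False: the name ends in .old
print(matches_any("file1.txt", "file?.txt"))       # True
print(matches_any("data12.csv", "data[0-9].csv"))  # False: [0-9] is exactly one character
print(matches_any("file7.txt", "*[!0-9].txt"))     # False: the character before .txt is a digit
print(matches_any("script.ts", "*.py", "*.ts"))    # True: the second pattern matches
```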

### Performance Notes

- `matches()` is optimized for single file checking
- For bulk operations, combine with PQuery's lazy evaluation
- Pattern compilation is cached internally for repeated use
- Use `full_path=False` (default) when possible for better performance

## Advanced Features

### Calendar Window Filtering

**TPath provides intuitive calendar window filtering to check if files fall within specific time ranges.** This is perfect for finding files from "last week", "this month", "last quarter", etc.

### Key Features

- **Intuitive API**: Negative numbers = past, 0 = now, positive = future
- **Window checking**: `in_*` methods clearly indicate boundary checking (not duration measurement)
- **Mathematical conventions**: Follows standard mathematical notation for time offsets
- **Multiple time units**: Minutes, hours, days, weeks, months, quarters, years

### Basic Calendar Windows

```python
from tpath import TPath

path = TPath("document.txt")

# Single time point checks
path.mtime.cal.in_days(0)        # Modified today
path.mtime.cal.in_months(0)      # Modified this month  
path.mtime.cal.in_years(0)       # Modified this year

# Past time windows
path.mtime.cal.in_days(-1)       # Modified yesterday
path.mtime.cal.in_hours(-6)      # Modified 6 hours ago
path.mtime.cal.in_minutes(-30)   # Modified 30 minutes ago
```

### Range-Based Window Filtering

The real power comes from range-based filtering using `start` and `end` parameters:

```python
# Last 7 days through today
path.mtime.cal.in_days(-7, 0)

# Last 30 days through today  
path.mtime.cal.in_days(-30, 0)

# From 2 weeks ago through 1 week ago (excluding this week)
path.mtime.cal.in_days(-14, -7)

# Last 6 months through this month
path.mtime.cal.in_months(-6, 0)

# Last quarter only (excluding current quarter)
path.mtime.cal.in_quarters(-1, -1)

# Last 2 years through this year
path.mtime.cal.in_years(-2, 0)
```
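
One way to picture a month window is to number months consecutively and compare offsets. The sketch below assumes inclusive `start` and `end` offsets, as the examples in this section suggest; `month_index` and the standalone `in_months` are hypothetical and do not reproduce how Frist implements the check.

```python
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be compared as integers."""
    return d.year * 12 + (d.month - 1)


def in_months(target: date, start: int, end: int, today: date) -> bool:
    """True when target's month lies start..end months from today's month (inclusive)."""
    offset = month_index(target) - month_index(today)
    return start <= offset <= end


today = date(2024, 3, 15)
print(in_months(date(2024, 2, 1), -1, -1, today))    # True: February is "last month"
print(in_months(date(2024, 3, 31), -1, -1, today))   # False: still the current month
print(in_months(date(2023, 12, 31), -6, 0, today))   # True: 3 months back, across the year boundary
```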

### Real-World Examples

```python
from tpath import TPath
from pathlib import Path

# Find all Python files modified in the last week
project_dir = Path("my_project")
recent_python_files = [
    TPath(f) for f in project_dir.rglob("*.py") 
    if TPath(f).mtime.cal.in_days(-7, 0)
]

# Archive old log files (older than 30 days)
log_dir = Path("/var/log")
old_logs = [
    TPath(f) for f in log_dir.glob("*.log")
    if not TPath(f).mtime.cal.in_days(-30, 0)  # NOT in last 30 days
]

# Find large files created this quarter
large_recent_files = [
    TPath(f) for f in Path("/data").rglob("*")
    if TPath(f).size.mb > 100 and TPath(f).ctime.cal.in_quarters(0)
]

# Backup files from last month only
backup_candidates = [
    TPath(f) for f in Path("/important").rglob("*")
    if TPath(f).mtime.cal.in_months(-1, -1)  # Last month only
]
```

### Working with Different Time Types

TPath provides calendar filtering for all timestamp types:

```python
path = TPath("important_file.txt")

# Creation time windows
path.ctime.cal.in_days(-7, 0)     # Created in last 7 days
path.create.cal.in_months(0)      # Created this month (alias)

# Modification time windows  
path.mtime.cal.in_hours(-6, 0)    # Modified in last 6 hours
path.modify.cal.in_days(-1)       # Modified yesterday (alias)

# Access time windows
path.atime.cal.in_minutes(-30, 0) # Accessed in last 30 minutes
path.access.cal.in_weeks(-2, 0)   # Accessed in last 2 weeks (alias)
```

### Precision vs. Convenience

**Important distinction**: Calendar windows check **boundaries**, not precise durations.

```python
# This checks whether the file's modification date falls in the calendar-day
# window from 7 days ago through today. The window is aligned to day boundaries,
# so the elapsed time it covers depends on the time of day you run it.
path.mtime.cal.in_days(-7, 0)

# For precise duration checking, use age properties instead:
path.age.days < 7  # Exactly less than 7 * 24 hours
```

Calendar windows are perfect for **"last week", "this month", "last quarter"** type queries where you want natural calendar boundaries, not precise 168-hour periods.
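
The difference between a calendar check and a duration check shows up clearly around midnight. The following sketch uses only `datetime`; the functions `in_days` and `age_days` are hypothetical simplifications that assume inclusive day offsets, not the TPath or Frist implementations.

```python
from datetime import datetime, timedelta


def in_days(target: datetime, start: int, end: int, now: datetime) -> bool:
    """Calendar check: compare dates only, ignoring the time of day."""
    offset = (target.date() - now.date()).days
    return start <= offset <= end


def age_days(target: datetime, now: datetime) -> float:
    """Duration check: elapsed time expressed in 24-hour days."""
    return (now - target) / timedelta(days=1)


now = datetime(2024, 3, 15, 9, 0)           # 09:00 today
modified = datetime(2024, 3, 14, 23, 30)    # 23:30 yesterday

print(in_days(modified, -1, -1, now))   # True: the calendar date is yesterday
print(age_days(modified, now) < 1)      # True: only 9.5 hours have elapsed
print(in_days(modified, 0, 0, now))     # False: not today's date, despite being under 24 hours old
```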

### Config File Integration

**Perfect for reading configuration values!** All property types support parsing from strings:

```python
from tpath import TPath
from tpath._size import Size
from tpath._age import Age
from tpath._time import Time

# Parse size strings (great for config files)
max_size = Size.parse("100MB")        # → 100,000,000 bytes
cache_limit = Size.parse("1.5GiB")    # → 1,610,612,736 bytes
temp_limit = Size.parse("500MB")      # → 500,000,000 bytes

# Parse age/time duration strings
cache_expire = Age.parse("24h")        # → 86,400 seconds
cleanup_after = Age.parse("7d")        # → 604,800 seconds  
session_timeout = Age.parse("30m")     # → 1,800 seconds

# Parse datetime strings
backup_date = Time.parse("2023-12-25")           # → datetime object
log_time = Time.parse("2023-12-25 14:30:00")     # → datetime object
timestamp = Time.parse("1640995200")             # → Unix timestamp to datetime

# Real-world config file usage
config = {
    "cache": {"max_size": "1GB", "expire_after": "24h"},
    "backup": {"date": "2023-01-01", "max_size": "10GiB"}
}

# Parse and use config values
max_cache = Size.parse(config["cache"]["max_size"])
expire_time = Age.parse(config["cache"]["expire_after"])
backup_time = Time.parse(config["backup"]["date"])

# Use with actual files
path = TPath("cache.bin")  # example file name; substitute your own path
if path.size.bytes > max_cache:
    print("File too large for cache!")
if path.age.seconds > expire_time:
    print("File expired!")
```
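
For readers curious how a size string might be turned into bytes, here is a small self-contained sketch. It is not TPath's `Size.parse`; the function `parse_size`, its unit table, and its error messages are assumptions made for illustration, following the decimal (MB) and binary (MiB) conventions used in the comments of the example.

```python
import re

# Decimal (SI) and binary (IEC) multipliers, in bytes.
_UNITS = {
    "B": 1,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4,
}
_SIZE_RE = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*")


def parse_size(text: str) -> int:
    """Hypothetical parser: convert a string such as '1.5GiB' to a byte count."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return int(float(number) * factor)


print(parse_size("100MB"))    # 100000000
print(parse_size("1.5GiB"))   # 1610612736
```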

## Key Features & Benefits

- **Property-based design**: Direct access to common file properties without calculations
- **Full pathlib compatibility**: Drop-in replacement for pathlib.Path
- **Natural syntax**: `path.age.days` instead of complex timestamp math
- **Shell-style pattern matching**: Standalone `matches()` function with fnmatch wildcards
- **Calendar window filtering**: Intuitive `in_*` methods for time range checking
- **Comprehensive time units**: seconds, minutes, hours, days, weeks, months, quarters, years
- **Multiple size units**: bytes, KB/KiB, MB/MiB, GB/GiB, TB/TiB, PB/PiB
- **Config file integration**: Parse strings with Size.parse(), Age.parse(), Time.parse()
- **Different time types**: Handle ctime, mtime, atime separately with user-friendly aliases
- **Performance optimized**: Cached stat calls to avoid repeated filesystem operations
- **Mathematical conventions**: Negative = past, 0 = now, positive = future
- **Property-based dates**: Provided by [Frist](https://github.com/hucker/frist)

## Development

This project uses uv for dependency management and packaging. See UV_GUIDE.md for detailed instructions.

```bash
# Install development dependencies
uv sync --dev

# Run tests  
uv run python -m pytest

# Build package
uv build

# Format code
uv run ruff format

# Lint code
uv run ruff check
```

## Logging Support

PQuery supports flexible logging for query operations and statistics. You can attach a standard Python logger to track progress, errors, and matched files during queries.

**How to enable logging:**

- Pass a `logging.Logger` instance to the `PQuery` constructor or use the class method `PQuery.set_logger()` to set a logger for all queries.
- Log messages include query start, progress (every N files), errors, and completion.

**Example: Setting a class-level logger**

```python
import logging
from pquery import PQuery

logger = logging.getLogger("pquery_global")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
logger.addHandler(handler)

PQuery.set_logger(logger)  # Set class-level logger for all queries

query = PQuery(from_="/logs")
for file in query.files():
    process(file)
```

You can also pass a logger to an individual query if you want per-instance logging.

See tests in `test/pquery/test_logger.py` for more usage patterns.
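
The "progress every N files" behavior mentioned in this section can be pictured as a generator that wraps any stream of items. This is a standalone sketch using the standard `logging` module; `log_progress`, its `every` parameter, and the message wording are hypothetical and do not describe PQuery's internal logging.

```python
import logging
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def log_progress(items: Iterable[T], logger: logging.Logger, every: int = 1000) -> Iterator[T]:
    """Pass items through unchanged, logging a progress line every `every` items."""
    total = 0
    for total, item in enumerate(items, start=1):
        if total % every == 0:
            logger.info("processed %d items", total)
        yield item
    logger.info("done: %d items", total)


logging.basicConfig(level=logging.INFO)
for _ in log_progress(range(2500), logging.getLogger("pquery"), every=1000):
    pass  # real code would process each item here
```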

## License

MIT License - see LICENSE file for details.
