Metadata-Version: 2.4
Name: splurge-dsv
Version: 2025.5.2
Summary: A utility library for working with DSV (Delimited String Values) files
Author: Jim Schilling
License-Expression: MIT
Project-URL: Homepage, https://github.com/jim-schilling/splurge-dsv
Project-URL: Repository, https://github.com/jim-schilling/splurge-dsv
Project-URL: Documentation, https://github.com/jim-schilling/splurge-dsv#readme
Project-URL: Bug Tracker, https://github.com/jim-schilling/splurge-dsv/issues
Keywords: dsv,csv,tsv,delimited,parsing,file-processing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.0.241; extra == "dev"
Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
Requires-Dist: hypothesis>=6.0.0; extra == "dev"
Requires-Dist: pre-commit>=4.3.0; extra == "dev"
Dynamic: license-file

# splurge-dsv

[![PyPI version](https://badge.fury.io/py/splurge-dsv.svg)](https://pypi.org/project/splurge-dsv/)
[![Python versions](https://img.shields.io/pypi/pyversions/splurge-dsv.svg)](https://pypi.org/project/splurge-dsv/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[![CI](https://github.com/jim-schilling/splurge-dsv/actions/workflows/ci-quick-test.yml/badge.svg)](https://github.com/jim-schilling/splurge-dsv/actions/workflows/ci-quick-test.yml)
[![Coverage](https://img.shields.io/badge/coverage-90%25-brightgreen.svg)](https://github.com/jim-schilling/splurge-dsv)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![mypy](https://img.shields.io/badge/mypy-checked-black)](https://mypy-lang.org/)

A robust Python library for parsing and processing delimited-separated value (DSV) files with advanced features for data validation, streaming, and error handling.

## Features

- **Multi-format DSV Support**: Parse CSV, TSV, pipe-delimited, and custom delimiter separated value files/objects
- **Configurable Parsing**: Flexible options for delimiters, quote characters, escape characters, header/footer row(s) handling
- **Memory-Efficient Streaming**: Process large files without loading entire content into memory
- **Security & Validation**: Comprehensive path validation and file permission checks
- **Unicode Support**: Full Unicode character and encoding support
- **Type Safety**: Full type annotations with mypy validation
- **Deterministic Newline Handling**: Consistent handling of CRLF, CR, and LF newlines across platforms
- **CLI Tool**: Command-line interface for quick parsing and inspection of DSV files
- **Robust Error Handling**: Clear and specific exceptions for various error scenarios
- **Modern API**: Object-oriented API with `Dsv` and `DsvConfig` classes for easy configuration and reuse
- **Comprehensive Documentation**: In-depth API reference and usage examples
- **Exhaustive Testing**: 272 tests with 90% code coverage including property-based testing, edge case testing, and cross-platform compatibility validation

**⚠️ CHANGES in v2025.5.1**
> - **Vendored Dependencies Update**: `splurge-exceptions` and `splurge-safe-io` have been updated.
> - **Bumped Versions**:
>   - Updated version to 2025.5.1 in `__init__.py` and `pyproject.toml`.
> - **MyPy Configuration Update**:
>   - Updated MyPy configuration to relax strictness on examples.
>   - Updated `ci-lint-and-typecheck.yml` to run MyPy on the entire codebase.
>   - Updated pre-commit hook for MyPy to check the full codebase.
> - **Pytest Coverage Configuration Update**:
>   - Updated coverage configuration to omit vendor and test files from reports.
> - **See [CHANGELOG.md](CHANGELOG.md) for detailed migration notes.**

**⚠️ CHANGES in v2025.5.0**
> - **Vendored Dependencies**: `splurge-exceptions` and `splurge-safe-io` are now vendored.
>   - Removed pip dependencies on `splurge-exceptions` and `splurge-safe-io`.
>   - All functionality remains the same; imports continue to work as before.
>   - See [CHANGELOG.md](CHANGELOG.md) for detailed migration notes.

**⚠️ CHANGES in v2025.4.0**
> - **Exception Hierarchy Refactored**: All exceptions now leverage the `splurge-exceptions` library with a unified hierarchy.
>   - All exceptions inherit from `SplurgeDsvError(SplurgeFrameworkError)`.
>   - Encoding/Decoding errors now map to `SplurgeDsvLookupError`.
>   - File I/O errors map to `SplurgeDsvOSError`.
>   - General runtime errors map to `SplurgeDsvRuntimeError`.
>   - Parameter/type/value validation errors use `SplurgeDsvTypeError`, and `SplurgeDsvValueError`.
>   - Removed: Many specialized `SplurgeDsv*Error` classes (e.g., `SplurgeDsvFileNotFoundError`, `SplurgeDsvFilePermissionError`) in favor of the unified hierarchy.
>   - See [API-REFERENCE.md](docs/api/API-REFERENCE.md) for the complete exception hierarchy and migration guidance.

**⚠️ CHANGES in v2025.3.2**
> - **splurge-safe-io** dependency has been updated to v2025.0.6+.
>   - This change improves compatibility and stability with the latest features of the `splurge-safe-io` package.
>   - Code and tests have been updated to align with the new version of the dependency, ensuring continued robust and secure file I/O operations.

**⚠️ CHANGES in v2025.3.1**
> - **skip_empty_lines** option added to `DsvConfig`, `DsvHelper`, and CLI.
>   - This option allows users to skip logical empty lines when parsing DSV files.

**⚠️ CHANGES in v2025.3.0**
> - **Commit-Only Release**: v2025.3.0 is a commit-only release and will not be published to PyPI.
> - The legacy `parse_stream()` helpers were removed in release 2025.3.0.
>   - Use `parse_file_stream()` on `Dsv`/`DsvHelper` for stream-based parsing of files. This standardizes the API naming and clarifies that streaming helpers accept file paths rather than arbitrary iterables.
> - TextFileHelper, SafeTextFileReader, SafeTextFileWriter, and PathValidator, as well as all their associated tests have been removed in this release.
>   - Their functionality has been migrated in favor of the `splurge-safe-io` package, which provides robust and secure file I/O operations.
>   - This change reduces code duplication and improves maintainability by leveraging the functionality of `splurge-safe-io`.
>   - Users should refer to the `splurge-safe-io` documentation for details on its usage and features.
> - **See API-REFERENCE.md for migration guidance and complete reference documentation, with usage examples.**

**⚠️ CHANGES in v2025.2.2**
> - **Deprecated Warning**: The following modules and their associated classes and functions are deprecated and will be removed in a future release (2025.3.0). Users are encouraged to transition to the `splurge-safe-io` package for these functionalities:
>   - `splurge_dsv.safe_text_file_reader`
>   - `splurge_dsv.safe_text_file_writer`
>   - `splurge_dsv.path_validator`
>   - `splurge_dsv.text_file_helper`
> - **New Exception**: Added `SplurgeDsvFileExistsError` to handle file existence errors.
> - **Fixed Exception Mapping**: Many errors were incorrectly mapped to SplurgeDsvEncodingError; this has been corrected to use appropriate exception types. 
>   - Some exceptions were not mapped to any SplurgeDsv* exception; these have also been corrected.
> - **3rd-Party Dependency Additions**: Added `splurge-safe-io (v2025.0.4)`.
>   - `splurge-safe-io` is a new dependency that provides robust and secure file I/O operations, including safe text file reading and writing with deterministic newline handling and path validation.
>   - This change reduces code duplication and improves maintainability by leveraging the functionality of `splurge-safe-io`.
>   - Users should refer to the `splurge-safe-io` documentation for details on its usage and features.
> - **Code Refactoring**: Refactored `SafeTextFileReader`, `SafeTextFileWriter`, and `PathValidator` to utilize `splurge-safe-io` implementations internally, ensuring consistent behavior and reducing maintenance overhead.
> - **This release maintains backward compatibility** for existing users, but users are encouraged to transition to `splurge-safe-io` for future-proofing their codebases.
>   - **_This release is a commit-only release and will not be published to PyPI._**

**⚠️ BREAKING CHANGES in v2025.2.0**
>
> - **Exception Names Changed**: All exceptions now use `SplurgeDsv*` prefix (e.g., `SplurgeParameterError` → `SplurgeDsvParameterError`)
> - **Resource Manager Removed**: The `ResourceManager` module and all related classes have been completely removed
>
> See the [CHANGELOG](CHANGELOG.md) for migration guidance.

## Installation

```bash
pip install splurge-dsv
```

## Quick Start

### CLI Usage

```bash
# Parse a CSV file
python -m splurge_dsv data.csv --delimiter ,

# Stream a large file
python -m splurge_dsv large_file.csv --delimiter , --stream --chunk-size 1000
```

### YAML configuration file

You can place CLI-equivalent options in a YAML file and pass it to the CLI
using `--config` (or `-c`). CLI arguments override values found in the
YAML file. Example `config.yaml`:

```yaml
delimiter: ","
strip: true
bookend: '"'
encoding: utf-8
skip_header_rows: 1
skip_footer_rows: 0
skip_empty_lines: false
detect_columns: true
chunk_size: 500
max_detect_chunks: 5
raise_on_missing_columns: false
raise_on_extra_columns: false
```

Usage with CLI:

```bash
python -m splurge_dsv data.csv --config config.yaml --delimiter "|"
# The CLI delimiter '|' overrides the YAML delimiter
```

Example using the shipped example config in the repository:

```bash
# Use the example file provided at examples/config.yaml
python -m splurge_dsv data.csv --config examples/config.yaml
```

### API Usage

```python
from splurge_dsv import DsvHelper

# Parse a CSV string
data = DsvHelper.parse("a,b,c", delimiter=",")
print(data)  # ['a', 'b', 'c']

# Parse a CSV file
rows = DsvHelper.parse_file("data.csv", delimiter=",")
```

### Modern API

```python
from splurge_dsv import Dsv, DsvConfig

# Create configuration and parser
config = DsvConfig.csv(skip_header=1)
dsv = Dsv(config)

# Parse files
rows = dsv.parse_file("data.csv")
```

## Documentation

- **[Detailed Documentation](docs/README-details.md)**: Complete API reference, CLI options, and examples
- **[Testing Best Practices](docs/testing_best_practices.md)**: Comprehensive testing guidelines and patterns
- **[Hypothesis Usage Patterns](docs/hypothesis_usage_patterns.md)**: Property-based testing guide
- **[Changelog](CHANGELOG.md)**: Release notes and migration guides

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
----------------------------

This library enforces deterministic newline handling for text files. The reader
normalizes CRLF (`\r\n`), CR (`\r`) and LF (`\n`) to LF internally and
returns logical lines. The writer utilities normalize any input newlines to LF
before writing. This avoids platform-dependent differences when reading files
produced by diverse sources.

Recommended usage:

- When creating files inside the project, prefer the `open_text_writer` context
    manager or `SafeTextFileWriter` which will normalize to LF.
- When reading unknown files, the `open_text` / `SafeTextFileReader` will
    provide deterministic normalization regardless of the source.
- `SplurgeResourceAcquisitionError` - Resource acquisition failures
- `SplurgeResourceReleaseError` - Resource cleanup failures

## Development

### Testing Suite

splurge-dsv features a comprehensive testing suite designed for robustness and reliability:

#### Test Categories
- **Unit Tests**: Core functionality testing (300+ tests)
- **Integration Tests**: End-to-end workflow validation (50+ tests)
- **Property-Based Tests**: Hypothesis-driven testing for edge cases (50+ tests)
- **Edge Case Tests**: Malformed input, encoding issues, filesystem anomalies
- **Cross-Platform Tests**: Path handling, line endings, encoding consistency

#### Running Tests

```bash
# Run all tests
pytest tests/ -v

# Run with coverage report
pytest tests/ --cov=splurge_dsv --cov-report=html

# Run specific test categories
pytest tests/unit/ -v                    # Unit tests only
pytest tests/integration/ -v            # Integration tests only
pytest tests/property/ -v               # Property-based tests only
pytest tests/platform/ -v               # Cross-platform tests only

# Run with parallel execution
pytest tests/ -n 4 --cov=splurge_dsv

# Run performance benchmarks
pytest tests/ --durations=10
```

#### Test Quality Standards
- **94%+ Code Coverage**: All public APIs and critical paths covered
- **Property-Based Testing**: Hypothesis framework validates complex scenarios
- **Cross-Platform Compatibility**: Tests run on Windows, Linux, and macOS
- **Performance Regression Detection**: Automated benchmarks prevent slowdowns
- **Zero False Positives**: All property tests pass without spurious failures

#### Testing Best Practices
- Tests use `pytest-mock` for modern mocking patterns
- Property tests use Hypothesis strategies for comprehensive input generation
- Edge case tests validate error handling and boundary conditions
- Cross-platform tests ensure consistent behavior across operating systems

### Code Quality

The project follows strict coding standards:
- PEP 8 compliance
- Type annotations for all functions
- Google-style docstrings
- 85%+ coverage gate enforced via CI
- Comprehensive error handling

## Changelog

See the [CHANGELOG](CHANGELOG.md) for full release notes.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## More Documentation

- Detailed docs: [docs/README-details.md](docs/README-details.md)
- E2E testing coverage: [docs/e2e_testing_coverage.md](docs/e2e_testing_coverage.md)

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for detailed information on:

- Development setup and workflow
- Coding standards and best practices
- Testing requirements and guidelines
- Pull request process and review criteria

For major changes, please open an issue first to discuss what you would like to change.

## Support

For support, please open an issue on the GitHub repository or contact the maintainers.
