Metadata-Version: 2.4
Name: tds-flatpath
Version: 1.1.0
Summary: Collision-proof, reversible path flattening
Project-URL: Homepage, https://github.com/XRReady/tds-flatpath
Project-URL: Repository, https://github.com/XRReady/tds-flatpath
Project-URL: Issues, https://github.com/XRReady/tds-flatpath/issues
Author-email: Dale Spencer <texasdatasafel@gmail.com>
License: MIT License
License-File: LICENSE
Keywords: encoding,filesystem,flattening,path,reversible
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Provides-Extra: dev
Requires-Dist: black>=24.0; extra == 'dev'
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == 'test'
Description-Content-Type: text/markdown

# tds-flatpath

Reversible, collision-proof filename flattening for storing hierarchical paths in **flat namespaces** (object stores, zip members, cache keys, build artifacts, etc.) — while keeping names **human-readable**.

Unlike hashing or escape-heavy encodings, `tds-flatpath` preserves meaningful stems and grows only as needed to resolve actual ambiguity (underscore density + depth). No side tables. Fully deterministic. Exactly reversible.

**Use cases include:**
- Storing directory trees in S3 / MinIO / Azure blob stores
- Packaging project resources inside zip/tar layers
- Cache key derivation where readability matters
- Stable artifact naming across environments and OSes

**Guarantees:** (see [tds-flatpath_Specification](https://github.com/XRReady/tds-flatpath/blob/main/tds-flatpath_Specification.md))
- No collisions: mapping is **injective**
- Fully reversible: decode returns the original Path Tuple
- Human-visible stems stay readable
- Extension preserved exactly
- Scales only with actual underscore usage and directory depth


> Built by [Texas Data Safe (tds)](https://x.com/TexasDataSafe). Designed for packaging file trees into flat stores (object stores, zip members, temp dirs, cache keys) without losing reversibility or readability.

## Project structure

```

tds-flatpath/
├── src/
│   └── tds_flatpath/
│       ├── __init__.py         # Exports TdsFlatNameCodecV1, FlatPathMode
│       └── codec.py            # Core reversible flattening logic
├── tests/
│   ├── benchmark.py            # Compare performance against SHA256
│   └── test_codec.py           # Unit tests for codec behavior
├── tds-flatpath_Specification  # Language-agnostic specification
├── pyproject.toml              # Build & packaging configuration
├── FLATPATH_PATTERN_PREEXT.md  # Concise context for LLM to understand PREEXT flattening
├── FLATPATH_PATTERN_SUFFIX.md  # Concise context for LLM to understand SUFFIX flattening
├── README.md                   # Project documentation
├── LICENSE                     # MIT license
└── CITATION.cff                # Citation metadata

```

## Why this over hashing or length-prefixed joins?

- **Human-readable** stems (`src_module_mhello.py`) instead of opaque hashes.
- **Deterministic and reversible**: an underscore-run “postfix” encodes only the ambiguity you need to resolve (underscore counts + directory depth).
- **Shorter than hashes** for typical paths; the name grows with the underscore runs and directory depth actually present, not with a fixed worst-case overhead.
- **Provably collision-free** (injective mapping), unlike hashes, which provide only *statistical* collision resistance. Hashes are fine in practice, but it's the principle of the matter, right? ;)
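
To see the ambiguity that the postfix resolves, consider a plain `_`-join with no postfix. The sketch below uses only the standard library. `naive_join` is a hypothetical helper and not part of `tds-flatpath`, and the SHA-256 line assumes the hashed name is the hex digest of the `/`-joined path, which is one common scheme and not necessarily what the benchmark uses.

```python
import hashlib


def naive_join(segments):
    """Hypothetical helper: join path segments with '_' and nothing else."""
    return "_".join(segments)


a = ("src", "a_b", "c.txt")
b = ("src", "a", "b_c.txt")

# Two different path tuples collapse to the same flat name.
print(naive_join(a))  # -> src_a_b_c.txt
print(naive_join(b))  # -> src_a_b_c.txt
print(naive_join(a) == naive_join(b))  # -> True

# A hash keeps them apart, but the name is opaque and cannot be reversed.
print(hashlib.sha256("/".join(a).encode("utf-8")).hexdigest())
```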

## AI / LLM Context

This repository includes concise **Pattern Definition** files designed to be injected into AI context windows (RAG, Agents, MCP). These files act as a semantic legend, allowing an LLM to "read" the original directory structure from the flattened filenames without needing code execution.

* **`FLATPATH_PATTERN_SUFFIX.md`**: Include this context if using the default `SUFFIX` mode.
* **`FLATPATH_PATTERN_PREEXT.md`**: Include this context if using the `PREEXT` mode.

## Install

```bash
pip install tds-flatpath
```

Requires Python 3.8 or newer.

## Quick start

```python
from tds_flatpath import TdsFlatNameCodecV1, FlatPathMode

tdsFNC = TdsFlatNameCodecV1()

# --- Standard Mode (Suffix) ---
# Preserves reversibility unconditionally.

print(tdsFNC.flat_name(("README.md",)))
# -> README.md

print(tdsFNC.flat_name(("src", "README.md")))
# -> src_README.md_-

print(tdsFNC.flat_name(("src", "module", "mhello.py")))
# -> src_module_mhello.py_--

print(tdsFNC.flat_name(("src", "a_b", "c_d.txt")))
# -> src_a_b_c_d.txt_-n1-n1

# --- Pre-Extension Mode (New in V1.1) ---
# Preserves extension at the end (if extension has no underscores).

print(tdsFNC.flat_name(("src", "utils", "data.json"), mode="preext"))
# -> src_utils_data_--.json

# --- Reversing ---

print(tdsFNC.unflatten_to_path("src_a_b_c_d.txt_-n1-n1"))
# -> ('src', 'a_b', 'c_d.txt')
```

## Specification (V1.1)

The encoding format used by `tds-flatpath` is precisely defined in a stable,
versioned specification document:

[**tds-flatpath Specification (V1.1)**](https://github.com/XRReady/tds-flatpath/blob/main/tds-flatpath_Specification.md)

This specification guarantees:

  * Deterministic, collision-proof mapping
  * Full reversibility (no metadata side-tables required)
  * Human-readable flattened names
  * Growth proportional only to actual underscore ambiguity and depth

The current implementation `TdsFlatNameCodecV1` conforms to **Format Version V1.1** (which includes the optional Pre-Extension mode).

## How it works

  * We join segments with `_` **only in the base** (e.g., `src_a_b_c.txt`).
  * A compact **postfix** after a final underscore captures:
      * counts of consecutive `_` **inside** each original segment (`n<HEX>` tokens),
      * and `-` markers for **directory boundaries**.
  * With that, decoding is deterministic and collision-proof.
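
The encoding side of these rules can be sketched in a few lines. This is an illustration of the description above, not the library's implementation: `sketch_flat_name` is a hypothetical name, it covers SUFFIX mode only, it does no validation, and it assumes uppercase hex digits in `n<HEX>` and that a name needing no postfix is emitted unchanged.

```python
import re
from typing import Tuple


def sketch_flat_name(segments: Tuple[str, ...]) -> str:
    """Illustrative SUFFIX-mode encoder; not the library's implementation."""
    # Base: segments joined with a single '_'.
    base = "_".join(segments)
    # One group of n<HEX> tokens per segment, one token per underscore run.
    groups = [
        "".join(f"n{len(run):X}" for run in re.findall(r"_+", segment))
        for segment in segments
    ]
    # '-' marks each directory boundary between the groups.
    postfix = "-".join(groups)
    # Assumption: a name that needs no postfix is emitted unchanged.
    return f"{base}_{postfix}" if postfix else base


print(sketch_flat_name(("README.md",)))
# -> README.md
print(sketch_flat_name(("src", "module", "mhello.py")))
# -> src_module_mhello.py_--
print(sketch_flat_name(("src", "a_b", "c_d.txt")))
# -> src_a_b_c_d.txt_-n1-n1
```

For the inputs shown, the sketch reproduces the outputs listed in the Quick start; behavior on other inputs should be checked against the specification.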

Example:

```
('src', 'a_b', 'c___d.txt')
flatten -> "src_a_b_c___d.txt_-n1-n3"
```
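
Going the other way, the postfix tells a decoder which underscores in the base belong to a segment and which are joins. The sketch below is a minimal illustration under stated assumptions: SUFFIX mode, well-formed input, and a name without any underscore carrying no postfix. `sketch_unflatten` is a hypothetical name, not the library's `unflatten_to_path`.

```python
from typing import List, Tuple


def sketch_unflatten(flat: str) -> Tuple[str, ...]:
    """Illustrative SUFFIX-mode decoder; no validation of malformed input."""
    if "_" not in flat:
        return (flat,)  # assumption: no underscore means no postfix
    base, postfix = flat.rsplit("_", 1)
    # One list of run lengths per segment; '-' separates the segments.
    runs: List[List[int]] = [
        [int(token, 16) for token in group.split("n")[1:]]
        for group in postfix.split("-")
    ]
    segments: List[str] = []
    current = ""
    seg_index = 0
    pending = runs[seg_index]
    pos = 0
    while pos < len(base):
        if base[pos] != "_":
            current += base[pos]
            pos += 1
        elif pending:
            # Underscore run that belongs to the current segment.
            length = pending.pop(0)
            current += "_" * length
            pos += length
        else:
            # Single joining underscore: close the current segment.
            segments.append(current)
            current = ""
            seg_index += 1
            pending = runs[seg_index]
            pos += 1
    segments.append(current)
    return tuple(segments)


print(sketch_unflatten("src_a_b_c___d.txt_-n1-n3"))
# -> ('src', 'a_b', 'c___d.txt')
```

The walk is unambiguous because every in-segment run is consumed by its recorded length before a joining underscore is accepted, which is the intuition behind the injectivity claim.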

## Benchmark Results

You can run the benchmark locally:

```bash
pip install psutil
pip install tds-flatpath
python tests/benchmark.py
```

Installing `tds-flatpath` is optional when the benchmark is run from a clone of the repository.
It also works against a copy of the package installed in the environment.

### System Information

```
Platform: Windows 11 (10.0.26200)
Machine: AMD64
Processor: AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
CPU Cores: 12
Python: 3.13.2
CPU Frequency: 3.90 GHz (max 3.90 GHz)
RAM: 133.05 GB
```

### Length Benchmark

| Sample | Path Example (truncated)          | tds-flatpath Len | SHA256 Len | Length-Prefixed Len |
| ------ | --------------------------------- | ---------------- | ---------- | ------------------- |
| 1      | README.md                         | 9                | 67         | 11                  |
| 2      | src/README.md                     | 15               | 67         | 17                  |
| 3      | src/module/mhello.py              | 23               | 67         | 26                  |
| 4      | src/a\_b/c\_d.txt                   | 22               | 68         | 21                  |
| 5      | src/a/b/c\_d.txt                   | 21               | 68         | 23                  |
| 6      | src/a\_\_b\_a\_b/c\_\_\_d.txt            | 33               | 68         | 28                  |
| 7      | very/deep/path/with/many/level... | 47               | 68         | 54                  |
| 8      | file\_with\_many\_\_\_underscores\_\_... | 45               | 68         | 39                  |
| 9      | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa... | 104              | 68         | 108                 |
| 10     | dir1/dir2/dir3/dir4/dir5/file.... | 38               | 67         | 44                  |
| 11     | \_\_a\_\_\_\_\_\_\_\_\_\_\_... | 41               | 68         | 38                  |

**Averages**

  * tds-flatpath: 36.2
  * SHA256: 67.6
  * Length-Prefixed: 37.2
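
The averages can be re-derived from the table's columns. A quick check in plain Python, with the three lists copied from the rows above (the variable names are only for illustration):

```python
# Lengths copied from the Length Benchmark table, samples 1 through 11.
columns = {
    "tds-flatpath": [9, 15, 23, 22, 21, 33, 47, 45, 104, 38, 41],
    "SHA256": [67, 67, 67, 68, 68, 68, 68, 68, 68, 67, 68],
    "Length-Prefixed": [11, 17, 26, 21, 23, 28, 54, 39, 108, 44, 38],
}

averages = {name: sum(values) / len(values) for name, values in columns.items()}

for name, value in averages.items():
    print(f"{name}: {value:.1f}")
# -> tds-flatpath: 36.2
# -> SHA256: 67.6
# -> Length-Prefixed: 37.2
```

Note that sample 9 is the one row where the flattened name (104) is longer than the SHA256 name (68), which is consistent with the length scaling with the path itself.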

### Time Benchmark (10,000 Random Paths)

| Method                         | Time (seconds) |
| ------------------------------ | -------------- |
| tds-flatpath (with validation) | 0.4093         |
| tds-flatpath (no validation)   | 0.3973         |
| SHA256                         | 0.3601         |
| Length-Prefixed                | 0.3499         |

## API

```python
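from __future__ import annotations  # allows tuple[str, ...] hints on Python 3.8

from enum import Enum
from typing import Union


class FlatPathMode(str, Enum):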
    SUFFIX = "suffix"   # filename.ext_postfix (Default)
    PREEXT = "preext"   # filename_postfix.ext (RFC V1.1)

class TdsFlatNameCodecV1:
    @classmethod
    def flat_name(cls, 
                  path_array: tuple[str, ...], 
                  mode: Union[FlatPathMode, str] = FlatPathMode.SUFFIX, 
                  validate: bool = True) -> str: ...

    @classmethod
    def unflatten_to_path(cls, 
                          flattened_filename: str, 
                          mode: Union[FlatPathMode, str] = FlatPathMode.SUFFIX) -> tuple[str, ...]: ...

    @classmethod
    def postfix_to_counts(cls, postfix: str) -> list[int]: ...
```
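
Because `FlatPathMode` mixes in `str`, a plain string such as `"preext"` and the enum member are interchangeable as values, which is why `mode` is typed as `Union[FlatPathMode, str]`. The sketch below re-declares the enum locally, copied from the signature above, so that it runs without the package installed. How the library normalizes the argument internally is not shown here and is an assumption.

```python
from enum import Enum


class FlatPathMode(str, Enum):
    # Local copy of the enum from the API listing, for illustration only.
    SUFFIX = "suffix"
    PREEXT = "preext"


# Looking up by value returns the existing member.
print(FlatPathMode("preext") is FlatPathMode.PREEXT)  # -> True

# The str mixin makes members compare equal to their plain-string values.
print(FlatPathMode.PREEXT == "preext")  # -> True
print(FlatPathMode.SUFFIX.value)  # -> suffix
```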

**Constraints**

  * `path_array` must be a **tuple** of non-empty strings (for immutability/hashing safety).
  * OS separators are not allowed in segments.
  * **Pre-Extension Mode** requires that the file extension does NOT contain underscores.
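
A caller-side pre-flight check for these constraints might look like the sketch below. `check_path_tuple` is a hypothetical helper, not part of the package; rejecting an empty tuple, rejecting both `/` and `\` on every OS, and using `os.path.splitext` to find the extension are all assumptions. The exception types the library itself raises may differ.

```python
import os
from typing import Tuple


def check_path_tuple(path_array: Tuple[str, ...], preext: bool = False) -> None:
    """Hypothetical pre-flight check mirroring the constraints listed above."""
    if not isinstance(path_array, tuple):
        raise TypeError("path_array must be a tuple, not a list")
    if not path_array:
        # Assumption: an empty tuple is treated as invalid.
        raise ValueError("path_array must contain at least one segment")
    for segment in path_array:
        if not isinstance(segment, str) or not segment:
            raise ValueError("every segment must be a non-empty string")
        # Assumption: reject both '/' and '\\' regardless of the host OS.
        if "/" in segment or "\\" in segment:
            raise ValueError(f"separator not allowed in segment: {segment!r}")
    if preext:
        # Assumption: the extension is whatever os.path.splitext reports.
        extension = os.path.splitext(path_array[-1])[1]
        if "_" in extension:
            raise ValueError("PREEXT mode needs an underscore-free extension")


check_path_tuple(("src", "utils", "data.json"), preext=True)  # passes silently
```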

## Versioning & compatibility

  * Current version: **1.1.0**
  * **Breaking Change in 1.1.0:** API now requires `tuple` inputs instead of `list`.
  * Specification: **V1.1** (Stable).

## License

MIT — feel free to use in open source or commercial projects.
**Please retain credit to Texas Data Safe (tds) / Dale Spencer.**

## Contributing

Issues and PRs welcome. Please include:

  * a failing test case for bugs,
  * before/after examples for behavior changes.

## Cite this project

If this helps your work, please cite (see `CITATION.cff`):

```
Spencer, D. (2025). tds-flatpath (Version 1.1.0). Texas Data Safe.
```
