filoma: Quick interactive examples

This notebook demonstrates key filoma capabilities and includes lightweight checks to see if it works in your environment.

It covers: imports and version checks, probing a file and a directory, working with the filoma.DataFrame wrapper, using probe_to_df, a small image probe example, and saving a CSV export.

Note: cells wrap operations in try/except so the notebook still runs if optional dependencies (e.g. polars, numpy, or image backends) are missing.

In [1]:
# Basic environment and import checks
from pathlib import Path

import filoma
from filoma import DataFrame


def check_imports():
    results = {}
    try:
        import filoma

        results["filoma"] = getattr(filoma, "__version__", "unknown")
    except Exception as e:
        results["filoma"] = f"IMPORT ERROR: {e}"

    for pkg in ("polars", "numpy", "PIL"):
        try:
            __import__(pkg if pkg != "PIL" else "PIL.Image")
            results[pkg] = "available"
        except Exception as e:
            results[pkg] = f"missing ({e})"

    # show where we are running the notebook from
    results["cwd"] = str(Path(".").resolve())
    return results


check_imports()
Out[1]:
{'filoma': '1.7.6',
 'polars': 'available',
 'numpy': 'available',
 'PIL': 'available',
 'cwd': '/home/kalfasy/repos/filoma/notebooks'}
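
The check above imports each optional package, which also pays its import cost. A lighter variant, sketched here with the standard library only, asks `importlib` whether a package can be located without importing it. The helper name `find_optional` is hypothetical and is not part of filoma.

```python
import importlib.util


def find_optional(packages):
    """Map each top-level package name to True/False without importing it."""
    status = {}
    for name in packages:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules
            status[name] = False
    return status


availability = find_optional(["polars", "numpy", "PIL"])
print(availability)
```

A `True` here only means the package can be located; the import itself can still fail, so the try/except guards in the later cells remain useful.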

1) Quick probe: a single file and a directory

Try probing a README or small file, then probe a lightweight sample directory from the repo's tests/ tree.

In [2]:
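# Fall back to None when a path is missing so the `is not None` guards below are meaningful
file_candidate = "../README.md" if Path("../README.md").exists() else None
dir_candidate = "../tests/" if Path("../tests/").is_dir() else None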

print("probing file ->", file_candidate)
if file_candidate is not None:
    try:
        file_report = filoma.probe(file_candidate)
        print("file probe result type:", type(file_report))
        try:
            # many filoma dataclasses implement a nice repr or to-dict
            print(file_report)
        except Exception:
            pass
    except Exception as e:
        print("file probe failed:", e)
else:
    print("No small file found to probe in the repository root.")

print("probing directory ->", dir_candidate)
if dir_candidate is not None:
    try:
        dir_report = filoma.probe(dir_candidate, max_depth=2, threads=2)
        print("directory probe returned an object of type:", type(dir_report))
        # If it exposes a to_df() method we can inspect a little
        if hasattr(dir_report, "to_df"):
            try:
                dfw = dir_report.to_df()
                print("to_df() -> wrapper type:", type(dfw))
            except Exception as e:
                print("to_df() raised:", e)
    except Exception as e:
        print("directory probe failed:", e)
else:
    print("No small directory found to probe in tests/; adjust the path and re-run.")
2025-09-20 21:43:20.513 | DEBUG    | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.513 | INFO     | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests/' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.515 | SUCCESS  | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)
2025-09-20 21:43:20.515 | WARNING  | filoma.directories.directory_profiler:to_df:193 - No DataFrame available for analysis at path /home/kalfasy/repos/filoma/tests. DataFrame building is disabled by default or 'polars' is not installed. Call DirectoryProfiler(build_dataframe=True) or use filoma.probe_to_df(...) to obtain a DataFrame.
probing file -> ../README.md
file probe result type: <class 'filoma.files.file_profiler.Filo'>
Filo(path=PosixPath('/home/kalfasy/repos/filoma/README.md'), size=6413, mode='0o100664', mode_str='-rw-rw-r--', owner='kalfasy', group='kalfasy', created=datetime.datetime(2025, 9, 20, 0, 24, 52), modified=datetime.datetime(2025, 9, 20, 0, 24, 52), accessed=datetime.datetime(2025, 9, 20, 0, 24, 52), is_symlink=False, is_file=True, is_dir=False, target_is_file=None, target_is_dir=None, rights={'read': True, 'write': True, 'execute': False}, inode=7601600, nlink=1, sha256=None, xattrs={})
probing directory -> ../tests/
directory probe returned an object of type: <class 'filoma.directories.directory_profiler.DirectoryAnalysis'>
to_df() -> wrapper type: <class 'NoneType'>
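
In the output above, `to_df()` exists but returns `None`, because no DataFrame was built during the probe. A small guard can tell "no method", "method raised", and "method returned nothing" apart from a usable result. The sketch below uses a stand-in class rather than a real filoma report, and both `frame_or_none` and `_StubReport` are hypothetical names introduced for illustration.

```python
def frame_or_none(report):
    """Return report.to_df() when it yields an object, otherwise None."""
    to_df = getattr(report, "to_df", None)
    if not callable(to_df):
        return None  # the report has no to_df() method at all
    try:
        return to_df()  # may legitimately be None if no frame was built
    except Exception:
        return None


class _StubReport:
    """Stand-in for a probe result whose DataFrame was not built."""

    def to_df(self):
        return None


frame = frame_or_none(_StubReport())
if frame is None:
    print("no DataFrame attached; probe_to_df(...) is the alternative named in the warning")
```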

2) Working with filoma.DataFrame wrapper

Construct a filoma.DataFrame from a list of paths and run the convenience enrichers: .add_path_components(), .add_file_stats_cols(), and .add_depth_col().

In [3]:
sample_paths = [p for p in (Path("../README.md"), Path("../pyproject.toml"), Path("../Cargo.toml")) if p.exists()]
if not sample_paths:
    # fallback to a couple of files from tests if present
    sample_paths = [p for p in (Path("../tests/test_basic_dataframe.py"), Path("../tests/test_rust_comprehensive.py")) if p.exists()]

print("sample paths used:", sample_paths)
dfw = DataFrame(sample_paths)
print("Initial wrapper and head:")
print(dfw.head(10))

print("With path components:")
try:
    df_components = dfw.add_path_components()
    print(df_components.head(10))
except Exception as e:
    print("add_path_components failed:", e)

print("With file stats:")
try:
    df_stats = dfw.add_file_stats_cols()
    print(df_stats.head(10))
except Exception as e:
    print("add_file_stats_cols failed:", e)

print("Add depth column relative to repo root:")
try:
    df_depth = dfw.add_depth_col(Path("."))
    print(df_depth.head(10))
except Exception as e:
    print("add_depth_col failed:", e)
sample paths used: [PosixPath('../README.md'), PosixPath('../pyproject.toml'), PosixPath('../Cargo.toml')]
Initial wrapper and head:
filoma.DataFrame with 3 rows
shape: (3, 1)
┌───────────────────┐
│ path              │
│ ---               │
│ str               │
╞═══════════════════╡
│ ../README.md      │
│ ../pyproject.toml │
│ ../Cargo.toml     │
└───────────────────┘
With path components:
filoma.DataFrame with 3 rows
shape: (3, 5)
┌───────────────────┬────────┬────────────────┬───────────┬────────┐
│ path              ┆ parent ┆ name           ┆ stem      ┆ suffix │
│ ---               ┆ ---    ┆ ---            ┆ ---       ┆ ---    │
│ str               ┆ str    ┆ str            ┆ str       ┆ str    │
╞═══════════════════╪════════╪════════════════╪═══════════╪════════╡
│ ../README.md      ┆ ..     ┆ README.md      ┆ README    ┆ .md    │
│ ../pyproject.toml ┆ ..     ┆ pyproject.toml ┆ pyproject ┆ .toml  │
│ ../Cargo.toml     ┆ ..     ┆ Cargo.toml     ┆ Cargo     ┆ .toml  │
└───────────────────┴────────┴────────────────┴───────────┴────────┘
With file stats:
filoma.DataFrame with 3 rows
shape: (3, 13)
┌───────────────┬────────────┬──────────────┬──────────────┬───┬─────────┬───────┬────────┬────────┐
│ path          ┆ size_bytes ┆ modified_tim ┆ created_time ┆ … ┆ inode   ┆ nlink ┆ sha256 ┆ xattrs │
│ ---           ┆ ---        ┆ e            ┆ ---          ┆   ┆ ---     ┆ ---   ┆ ---    ┆ ---    │
│ str           ┆ i64        ┆ ---          ┆ str          ┆   ┆ i64     ┆ i64   ┆ str    ┆ str    │
│               ┆            ┆ str          ┆              ┆   ┆         ┆       ┆        ┆        │
╞═══════════════╪════════════╪══════════════╪══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ ../README.md  ┆ 6413       ┆ 2025-09-20   ┆ 2025-09-20   ┆ … ┆ 7601600 ┆ 1     ┆ null   ┆ {}     │
│               ┆            ┆ 00:24:52     ┆ 00:24:52     ┆   ┆         ┆       ┆        ┆        │
│ ../pyproject. ┆ 2113       ┆ 2025-09-20   ┆ 2025-09-20   ┆ … ┆ 7579961 ┆ 1     ┆ null   ┆ {}     │
│ toml          ┆            ┆ 21:39:44     ┆ 21:39:44     ┆   ┆         ┆       ┆        ┆        │
│ ../Cargo.toml ┆ 481        ┆ 2025-08-30   ┆ 2025-08-30   ┆ … ┆ 7579934 ┆ 1     ┆ null   ┆ {}     │
│               ┆            ┆ 20:14:29     ┆ 20:14:29     ┆   ┆         ┆       ┆        ┆        │
└───────────────┴────────────┴──────────────┴──────────────┴───┴─────────┴───────┴────────┴────────┘
Add depth column relative to repo root:
filoma.DataFrame with 3 rows
shape: (3, 2)
┌───────────────────┬───────┐
│ path              ┆ depth │
│ ---               ┆ ---   │
│ str               ┆ i64   │
╞═══════════════════╪═══════╡
│ ../README.md      ┆ 2     │
│ ../pyproject.toml ┆ 2     │
│ ../Cargo.toml     ┆ 2     │
└───────────────────┴───────┘
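
The parent, name, stem, and suffix columns shown above match what `pathlib` reports for the same strings. The following is a standard-library sketch of that mapping, not filoma's implementation; `path_components` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def path_components(path_str):
    """Split a POSIX-style path string into the four component fields."""
    p = PurePosixPath(path_str)
    return {
        "path": path_str,
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
    }


rows = [path_components(s) for s in ("../README.md", "../pyproject.toml", "../Cargo.toml")]
for row in rows:
    print(row)
```

`PurePosixPath` is used so the result does not depend on the operating system running the notebook.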

3) Build a DataFrame from a directory using probe_to_df

This uses filoma's convenience function probe_to_df, which returns a filoma.DataFrame wrapper (Polars is used internally if available). We probe the repo's tests/ directory with max_depth=2 to keep the runtime small.

In [4]:
from filoma import probe_to_df

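# Fall back to None when the directory is missing so the guard below is meaningful
dir_path = "../tests" if Path("../tests").is_dir() else None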
if dir_path is None:
    print("No test directory available for probe_to_df; skip this cell.")
else:
    try:
        dfw = probe_to_df(dir_path, to_pandas=False, enrich=True, max_depth=2, threads=2)
        print("probe_to_df returned a filoma.DataFrame with shape:", dfw.shape)
        # Show a small sample and a group_by_extension summary when available
        try:
            print("Sample rows:")
            print(dfw.head(5))
        except Exception:
            pass
        try:
            print("Extension counts:")
            print(dfw.group_by_extension().head(10))
        except Exception as e:
            print("group_by_extension failed:", e)
    except Exception as e:
        print("probe_to_df failed:", e)
2025-09-20 21:43:20.530 | DEBUG    | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.531 | INFO     | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.534 | SUCCESS  | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)
probe_to_df returned a filoma.DataFrame with shape: (146, 18)
Sample rows:
filoma.DataFrame with 5 rows
shape: (5, 18)
┌───────────────────┬───────┬──────────┬──────────────────┬───┬──────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent   ┆ name             ┆ … ┆ inode    ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---      ┆ ---              ┆   ┆ ---      ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str      ┆ str              ┆   ┆ i64      ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪══════════╪══════════════════╪═══╪══════════╪═══════╪════════╪════════╡
│ ../tests/test_asy ┆ 1     ┆ ../tests ┆ test_async_rust_ ┆ … ┆ 7601345  ┆ 1     ┆ null   ┆ {}     │
│ nc_rust_extra…    ┆       ┆          ┆ extra.py         ┆   ┆          ┆       ┆        ┆        │
│ ../tests/test_bas ┆ 1     ┆ ../tests ┆ test_basic_dataf ┆ … ┆ 7602664  ┆ 1     ┆ null   ┆ {}     │
│ ic_dataframe.…    ┆       ┆          ┆ rame.py          ┆   ┆          ┆       ┆        ┆        │
│ ../tests/scripts  ┆ 1     ┆ ../tests ┆ scripts          ┆ … ┆ 13593175 ┆ 3     ┆ null   ┆ {}     │
│ ../tests/test_ml_ ┆ 1     ┆ ../tests ┆ test_ml_core.py  ┆ … ┆ 7602966  ┆ 1     ┆ null   ┆ {}     │
│ core.py           ┆       ┆          ┆                  ┆   ┆          ┆       ┆        ┆        │
│ ../tests/test_rus ┆ 1     ┆ ../tests ┆ test_rust_absolu ┆ … ┆ 7601819  ┆ 1     ┆ null   ┆ {}     │
│ t_absolute_pa…    ┆       ┆          ┆ te_paths.py      ┆   ┆          ┆       ┆        ┆        │
└───────────────────┴───────┴──────────┴──────────────────┴───┴──────────┴───────┴────────┴────────┘
Extension counts:
filoma.DataFrame with 3 rows
shape: (3, 2)
┌────────────────┬─────┐
│ extension      ┆ len │
│ ---            ┆ --- │
│ str            ┆ u32 │
╞════════════════╪═════╡
│ .pyc           ┆ 85  │
│ .py            ┆ 37  │
│ <no extension> ┆ 24  │
└────────────────┴─────┘
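
The extension summary above is a count of rows per file suffix, with suffix-less entries grouped under a placeholder label. A standard-library sketch of the same idea follows. It is not how filoma computes the table; `count_extensions` is a hypothetical helper, and the sample paths are taken from the rows printed earlier.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(paths, no_ext_label="<no extension>"):
    """Count paths per suffix, most frequent first."""
    counts = Counter(PurePosixPath(p).suffix or no_ext_label for p in paths)
    return counts.most_common()


sample = [
    "../tests/test_async_rust_extra.py",
    "../tests/test_basic_dataframe.py",
    "../tests/scripts",
    "../tests/test_ml_core.py",
]
for extension, n in count_extensions(sample):
    print(f"{extension:<16}{n}")
```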

4) Image probing (in-memory)

Create a small numpy array and pass it to filoma.probe_image to exercise the image path that accepts arrays. This avoids needing image files or heavy dependencies.

In [5]:
try:
    import numpy as np

    arr = np.random.randn(16, 16)
    img_report = filoma.probe_image(arr)
    print("probe_image on numpy array returned type:", type(img_report))
    try:
        print(img_report)
    except Exception:
        pass
except Exception as e:
    print("Skipping image probe; numpy unavailable or probe failed:", e)
probe_image on numpy array returned type: <class 'filoma.images.image_profiler.ImageReport'>
ImageReport(path=None, file_type=None, shape=(16, 16), dtype='float64', min=-2.8289461122786097, max=2.6011147464964393, mean=0.034071740265040014, nans=0, infs=0, unique=256, status=None)
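
The report fields above (shape, min, max, mean, nans, infs, unique) are simple array statistics. To make their meaning concrete, here is a pure-Python sketch over a nested list. It assumes min, max, mean, and unique are computed over finite values only; whether filoma handles NaN and infinity this way is not confirmed by the output above, and `summarize` is a hypothetical helper.

```python
import math


def summarize(values_2d):
    """Compute ImageReport-like statistics for a 2-D nested list of numbers."""
    flat = [float(v) for row in values_2d for v in row]
    # Assumption: statistics ignore NaN and +/-inf entries
    finite = [v for v in flat if math.isfinite(v)]
    return {
        "shape": (len(values_2d), len(values_2d[0]) if values_2d else 0),
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "mean": sum(finite) / len(finite) if finite else None,
        "nans": sum(math.isnan(v) for v in flat),
        "infs": sum(math.isinf(v) for v in flat),
        "unique": len(set(finite)),
    }


grid = [[0.0, 1.0, 2.0], [2.0, float("nan"), float("inf")]]
report = summarize(grid)
print(report)
```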

5) Save a small CSV export (if polars is available)

This cell attempts to save the probe_to_df result or our small DataFrame example to /tmp/filoma_example.csv. It prints a short verification sample.

In [6]:
out_path = Path("/tmp/filoma_example.csv")
saved = False
try:
    if "dfw" in globals():
        try:
            dfw.df.write_csv(str(out_path))
            saved = True
        except Exception:
            pass
    if saved:
        print("Saved CSV to", out_path)
        try:
            print("CSV sample:", out_path.read_text().splitlines()[:10])
        except Exception:
            pass
    else:
        print("Could not save CSV; polars or file-writer not available.")
except Exception as e:
    print("Saving CSV failed:", e)
Saved CSV to /tmp/filoma_example.csv
CSV sample: ['path,depth,parent,name,stem,suffix,size_bytes,modified_time,created_time,is_file,is_dir,owner,group,mode_str,inode,nlink,sha256,xattrs', '../tests/test_async_rust_extra.py,1,../tests,test_async_rust_extra.py,test_async_rust_extra,.py,1938,2025-09-10 23:02:11,2025-09-10 23:02:11,true,false,kalfasy,kalfasy,-rw-rw-r--,7601345,1,,{}', '../tests/test_basic_dataframe.py,1,../tests,test_basic_dataframe.py,test_basic_dataframe,.py,1413,2025-09-10 23:02:12,2025-09-10 23:02:12,true,false,kalfasy,kalfasy,-rw-rw-r--,7602664,1,,{}', '../tests/scripts,1,../tests,scripts,scripts,"",4096,2025-09-04 20:14:19,2025-09-04 20:14:19,false,true,kalfasy,kalfasy,drwxrwxr-x,13593175,3,,{}', '../tests/test_ml_core.py,1,../tests,test_ml_core.py,test_ml_core,.py,5741,2025-09-20 21:41:56,2025-09-20 21:41:56,true,false,kalfasy,kalfasy,-rw-rw-r--,7602966,1,,{}', '../tests/test_rust_absolute_paths.py,1,../tests,test_rust_absolute_paths.py,test_rust_absolute_paths,.py,2589,2025-09-06 16:48:42,2025-09-06 16:48:42,true,false,kalfasy,kalfasy,-rw-rw-r--,7601819,1,,{}', '../tests/test_dataframe.py,1,../tests,test_dataframe.py,test_dataframe,.py,12731,2025-09-10 22:42:30,2025-09-10 22:42:30,true,false,kalfasy,kalfasy,-rw-rw-r--,7601601,1,,{}', '../tests/test_dataframe_chaining.py,1,../tests,test_dataframe_chaining.py,test_dataframe_chaining,.py,1142,2025-09-07 21:13:30,2025-09-07 21:13:30,true,false,kalfasy,kalfasy,-rw-rw-r--,7603121,1,,{}', '../tests/test_ml_path_col.py,1,../tests,test_ml_path_col.py,test_ml_path_col,.py,1010,2025-09-20 16:04:09,2025-09-20 16:04:09,true,false,kalfasy,kalfasy,-rw-rw-r--,7602942,1,,{}', '../tests/test_rust_comprehensive.py,1,../tests,test_rust_comprehensive.py,test_rust_comprehensive,.py,12431,2025-09-10 23:02:11,2025-09-10 23:02:11,true,false,kalfasy,kalfasy,-rw-rw-r--,7601605,1,,{}']
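
Splitting the file on newlines is enough for a quick look, but a CSV parser is more robust for verification because it handles quoted fields such as the empty `""` suffix in the sample above. The sketch below uses only the standard library, writes to a temporary directory instead of /tmp/filoma_example.csv, and uses a reduced set of columns with values copied from the output above.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"path": "../tests/test_ml_core.py", "suffix": ".py", "size_bytes": 5741},
    {"path": "../tests/scripts", "suffix": "", "size_bytes": 4096},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "filoma_example.csv"
    # Write a header plus one line per row
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["path", "suffix", "size_bytes"])
        writer.writeheader()
        writer.writerows(rows)
    # Read it back; every value comes back as a string
    with csv_path.open(newline="", encoding="utf-8") as fh:
        loaded = list(csv.DictReader(fh))

print(len(loaded), "rows; columns:", list(loaded[0]))
```

Because the `csv` module returns strings, numeric columns such as `size_bytes` need an explicit `int(...)` conversion after loading.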

Notes and next steps

  • If a cell reported a failure because a dependency is missing, install polars and numpy, and optionally pillow for image support.
  • To run longer scans, increase max_depth and threads in the probe() and probe_to_df() calls.
  • Use probe_to_df(..., to_pandas=True) to get a pandas.DataFrame if you prefer pandas.