filoma — Quick interactive examples
This notebook demonstrates key filoma capabilities and includes lightweight checks to see if it works in your environment.
It covers: imports and version checks, probing a file and a directory, working with the filoma.DataFrame wrapper, using probe_to_df, a small image probe example, and saving a CSV export.
Note: cells wrap operations in try/except so the notebook still runs if optional dependencies (e.g. polars, numpy, or image backends) are missing.
# Basic environment and import checks
from pathlib import Path
import filoma
from filoma import DataFrame
def check_imports():
    results = {}
    try:
        import filoma
        results["filoma"] = getattr(filoma, "__version__", "unknown")
    except Exception as e:
        results["filoma"] = f"IMPORT ERROR: {e}"
    for pkg in ("polars", "numpy", "PIL"):
        try:
            __import__(pkg if pkg != "PIL" else "PIL.Image")
            results[pkg] = "available"
        except Exception as e:
            results[pkg] = f"missing ({e})"
    # show where we are running the notebook from
    results["cwd"] = str(Path(".").resolve())
    return results
check_imports()
{'filoma': '1.7.6',
'polars': 'available',
'numpy': 'available',
'PIL': 'available',
'cwd': '/home/kalfasy/repos/filoma/notebooks'}
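The check above imports each package to learn whether it is present. A lighter variant, sketched here with only the standard library, asks the import system whether a module can be located without executing it. The helper name `find_optional` is hypothetical and not part of filoma.

```python
from importlib.util import find_spec


def find_optional(names):
    """Map each top-level module name to True if the import system can locate it."""
    # find_spec returns None for a top-level module that is not installed;
    # it does not run the module's code, so the check stays cheap.
    return {name: find_spec(name) is not None for name in names}


print(find_optional(["polars", "numpy", "PIL"]))
```

This only reports that a package is installed; a package that is present but broken would still fail at import time, which is what the try/except version catches.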
1) Quick probe: a single file and a directory
Try probing a README or small file, then probe a lightweight sample directory from the repo's tests/ tree.
file_candidate = "../README.md"
dir_candidate = "../tests/"
print("probing file ->", file_candidate)
if file_candidate is not None:
    try:
        file_report = filoma.probe(file_candidate)
        print("file probe result type:", type(file_report))
        try:
            # many filoma dataclasses implement a nice repr or to-dict
            print(file_report)
        except Exception:
            pass
    except Exception as e:
        print("file probe failed:", e)
else:
    print("No small file found to probe in the repository root.")
print("probing directory ->", dir_candidate)
if dir_candidate is not None:
    try:
        dir_report = filoma.probe(dir_candidate, max_depth=2, threads=2)
        print("directory probe returned an object of type:", type(dir_report))
        # If it exposes a to_df() method we can inspect a little
        if hasattr(dir_report, "to_df"):
            try:
                # to_df() returns None when no DataFrame was built during the probe
                # (see the WARNING in the cell output)
                dfw = dir_report.to_df()
                print("to_df() -> wrapper type:", type(dfw))
            except Exception as e:
                print("to_df() raised:", e)
    except Exception as e:
        print("directory probe failed:", e)
else:
    print("No small directory found to probe in tests/; adjust the path and re-run.")
2025-09-20 21:43:20.513 | DEBUG | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.513 | INFO | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests/' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.515 | SUCCESS | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)
2025-09-20 21:43:20.515 | WARNING | filoma.directories.directory_profiler:to_df:193 - No DataFrame available for analysis at path /home/kalfasy/repos/filoma/tests. DataFrame building is disabled by default or 'polars' is not installed. Call DirectoryProfiler(build_dataframe=True) or use filoma.probe_to_df(...) to obtain a DataFrame.
probing file -> ../README.md
file probe result type: <class 'filoma.files.file_profiler.Filo'>
Filo(path=PosixPath('/home/kalfasy/repos/filoma/README.md'), size=6413, mode='0o100664', mode_str='-rw-rw-r--', owner='kalfasy', group='kalfasy', created=datetime.datetime(2025, 9, 20, 0, 24, 52), modified=datetime.datetime(2025, 9, 20, 0, 24, 52), accessed=datetime.datetime(2025, 9, 20, 0, 24, 52), is_symlink=False, is_file=True, is_dir=False, target_is_file=None, target_is_dir=None, rights={'read': True, 'write': True, 'execute': False}, inode=7601600, nlink=1, sha256=None, xattrs={})
probing directory -> ../tests/
directory probe returned an object of type: <class 'filoma.directories.directory_profiler.DirectoryAnalysis'>
to_df() -> wrapper type: <class 'NoneType'>
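In the run above, to_df() returned None, and the WARNING in the log gives the reason: no DataFrame was built during the probe. A small guard, sketched below, makes that case explicit instead of printing NoneType. The helper `frame_or_fallback` is a hypothetical name, not a filoma function, and the stub class only stands in for a probe result so the snippet runs without filoma installed.

```python
def frame_or_fallback(report, fallback):
    """Return report.to_df() when it yields a frame, else the result of fallback()."""
    to_df = getattr(report, "to_df", None)
    frame = to_df() if callable(to_df) else None
    # to_df() may return None when no DataFrame was built during the probe.
    return frame if frame is not None else fallback()


class _StubReport:
    """Stand-in for a directory probe result whose to_df() yields nothing."""

    def to_df(self):
        return None


result = frame_or_fallback(_StubReport(), lambda: "built by fallback")
print(result)
```

With filoma installed, the fallback could be `lambda: filoma.probe_to_df(dir_candidate, max_depth=2, threads=2)`, which is the route the warning message itself suggests.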
2) Working with filoma.DataFrame wrapper
Construct a filoma.DataFrame from a list of paths and run the convenience enrichers: .add_path_components(), .add_file_stats_cols(), and .add_depth_col().
sample_paths = [p for p in (Path("../README.md"), Path("../pyproject.toml"), Path("../Cargo.toml")) if p.exists()]
if not sample_paths:
    # fallback to a couple of files from tests if present
    sample_paths = [p for p in (Path("../tests/test_basic_dataframe.py"), Path("../tests/test_rust_comprehensive.py")) if p.exists()]
print("sample paths used:", sample_paths)
dfw = DataFrame(sample_paths)
print("Initial wrapper and head:")
print(dfw.head(10))
print("With path components:")
try:
    df_components = dfw.add_path_components()
    print(df_components.head(10))
except Exception as e:
    print("add_path_components failed:", e)
print("With file stats:")
try:
    df_stats = dfw.add_file_stats_cols()
    print(df_stats.head(10))
except Exception as e:
    print("add_file_stats_cols failed:", e)
print("Add depth column relative to repo root:")
try:
    # Note: Path(".") is the notebook's working directory (notebooks/ in the run
    # shown here), not the repository root itself
    df_depth = dfw.add_depth_col(Path("."))
    print(df_depth.head(10))
except Exception as e:
    print("add_depth_col failed:", e)
sample paths used: [PosixPath('../README.md'), PosixPath('../pyproject.toml'), PosixPath('../Cargo.toml')]
Initial wrapper and head:
filoma.DataFrame with 3 rows
shape: (3, 1)
┌───────────────────┐
│ path │
│ --- │
│ str │
╞═══════════════════╡
│ ../README.md │
│ ../pyproject.toml │
│ ../Cargo.toml │
└───────────────────┘
With path components:
filoma.DataFrame with 3 rows
shape: (3, 5)
┌───────────────────┬────────┬────────────────┬───────────┬────────┐
│ path ┆ parent ┆ name ┆ stem ┆ suffix │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str │
╞═══════════════════╪════════╪════════════════╪═══════════╪════════╡
│ ../README.md ┆ .. ┆ README.md ┆ README ┆ .md │
│ ../pyproject.toml ┆ .. ┆ pyproject.toml ┆ pyproject ┆ .toml │
│ ../Cargo.toml ┆ .. ┆ Cargo.toml ┆ Cargo ┆ .toml │
└───────────────────┴────────┴────────────────┴───────────┴────────┘
With file stats:
filoma.DataFrame with 3 rows
shape: (3, 13)
┌───────────────┬────────────┬──────────────┬──────────────┬───┬─────────┬───────┬────────┬────────┐
│ path ┆ size_bytes ┆ modified_tim ┆ created_time ┆ … ┆ inode ┆ nlink ┆ sha256 ┆ xattrs │
│ --- ┆ --- ┆ e ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ --- ┆ str ┆ ┆ i64 ┆ i64 ┆ str ┆ str │
│ ┆ ┆ str ┆ ┆ ┆ ┆ ┆ ┆ │
╞═══════════════╪════════════╪══════════════╪══════════════╪═══╪═════════╪═══════╪════════╪════════╡
│ ../README.md ┆ 6413 ┆ 2025-09-20 ┆ 2025-09-20 ┆ … ┆ 7601600 ┆ 1 ┆ null ┆ {} │
│ ┆ ┆ 00:24:52 ┆ 00:24:52 ┆ ┆ ┆ ┆ ┆ │
│ ../pyproject. ┆ 2113 ┆ 2025-09-20 ┆ 2025-09-20 ┆ … ┆ 7579961 ┆ 1 ┆ null ┆ {} │
│ toml ┆ ┆ 21:39:44 ┆ 21:39:44 ┆ ┆ ┆ ┆ ┆ │
│ ../Cargo.toml ┆ 481 ┆ 2025-08-30 ┆ 2025-08-30 ┆ … ┆ 7579934 ┆ 1 ┆ null ┆ {} │
│ ┆ ┆ 20:14:29 ┆ 20:14:29 ┆ ┆ ┆ ┆ ┆ │
└───────────────┴────────────┴──────────────┴──────────────┴───┴─────────┴───────┴────────┴────────┘
Add depth column relative to repo root:
filoma.DataFrame with 3 rows
shape: (3, 2)
┌───────────────────┬───────┐
│ path ┆ depth │
│ --- ┆ --- │
│ str ┆ i64 │
╞═══════════════════╪═══════╡
│ ../README.md ┆ 2 │
│ ../pyproject.toml ┆ 2 │
│ ../Cargo.toml ┆ 2 │
└───────────────────┴───────┘
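The parent, name, stem, and suffix columns in the path-components table above line up with the attributes of the same names in the standard library's pathlib. The sketch below reproduces those four values for the three sample paths. It illustrates what the columns mean and is not filoma's implementation; `path_components` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def path_components(path):
    """Split a POSIX-style path string into the four component columns."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
    }


for raw in ("../README.md", "../pyproject.toml", "../Cargo.toml"):
    print(path_components(raw))
```

PurePosixPath is used so the result does not depend on the operating system running the snippet.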
3) Build a DataFrame from a directory using probe_to_df
This uses filoma's convenience function probe_to_df, which returns a filoma.DataFrame wrapper (Polars is used internally if available). We probe the repo's tests/ folder with max_depth=2 to keep the runtime small.
from filoma import probe_to_df
dir_path = "../tests"
if dir_path is None:
    print("No test directory available for probe_to_df; skip this cell.")
else:
    try:
        dfw = probe_to_df(dir_path, to_pandas=False, enrich=True, max_depth=2, threads=2)
        print("probe_to_df returned a filoma.DataFrame with shape:", dfw.shape)
        # Show a small sample and a group_by_extension summary when available
        try:
            print("Sample rows:")
            print(dfw.head(5))
        except Exception:
            pass
        try:
            print("Extension counts:")
            print(dfw.group_by_extension().head(10))
        except Exception as e:
            print("group_by_extension failed:", e)
    except Exception as e:
        print("probe_to_df failed:", e)
2025-09-20 21:43:20.530 | DEBUG | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.531 | INFO | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.534 | SUCCESS | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)
probe_to_df returned a filoma.DataFrame with shape: (146, 18)
Sample rows:
filoma.DataFrame with 5 rows
shape: (5, 18)
┌───────────────────┬───────┬──────────┬──────────────────┬───┬──────────┬───────┬────────┬────────┐
│ path ┆ depth ┆ parent ┆ name ┆ … ┆ inode ┆ nlink ┆ sha256 ┆ xattrs │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ str ┆ ┆ i64 ┆ i64 ┆ str ┆ str │
╞═══════════════════╪═══════╪══════════╪══════════════════╪═══╪══════════╪═══════╪════════╪════════╡
│ ../tests/test_asy ┆ 1 ┆ ../tests ┆ test_async_rust_ ┆ … ┆ 7601345 ┆ 1 ┆ null ┆ {} │
│ nc_rust_extra… ┆ ┆ ┆ extra.py ┆ ┆ ┆ ┆ ┆ │
│ ../tests/test_bas ┆ 1 ┆ ../tests ┆ test_basic_dataf ┆ … ┆ 7602664 ┆ 1 ┆ null ┆ {} │
│ ic_dataframe.… ┆ ┆ ┆ rame.py ┆ ┆ ┆ ┆ ┆ │
│ ../tests/scripts ┆ 1 ┆ ../tests ┆ scripts ┆ … ┆ 13593175 ┆ 3 ┆ null ┆ {} │
│ ../tests/test_ml_ ┆ 1 ┆ ../tests ┆ test_ml_core.py ┆ … ┆ 7602966 ┆ 1 ┆ null ┆ {} │
│ core.py ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
│ ../tests/test_rus ┆ 1 ┆ ../tests ┆ test_rust_absolu ┆ … ┆ 7601819 ┆ 1 ┆ null ┆ {} │
│ t_absolute_pa… ┆ ┆ ┆ te_paths.py ┆ ┆ ┆ ┆ ┆ │
└───────────────────┴───────┴──────────┴──────────────────┴───┴──────────┴───────┴────────┴────────┘
Extension counts:
filoma.DataFrame with 3 rows
shape: (3, 2)
┌────────────────┬─────┐
│ extension ┆ len │
│ --- ┆ --- │
│ str ┆ u32 │
╞════════════════╪═════╡
│ .pyc ┆ 85 │
│ .py ┆ 37 │
│ <no extension> ┆ 24 │
└────────────────┴─────┘
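The extension summary above is a count of rows per file suffix, with a placeholder label for paths that have none. A standard-library sketch of the same idea follows. It is not how filoma computes the table; `count_extensions` is a hypothetical helper and the sample paths are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(paths, missing_label="<no extension>"):
    """Count paths per suffix, most frequent first."""
    counts = Counter(PurePosixPath(p).suffix or missing_label for p in paths)
    return counts.most_common()


# Hypothetical sample paths, chosen to cover the three labels seen above.
sample = [
    "tests/test_a.py",
    "tests/test_b.py",
    "tests/scripts",
    "tests/__pycache__/test_a.cpython-311.pyc",
]
print(count_extensions(sample))
```

The large .pyc count in the real output comes from __pycache__ directories under tests/, so removing them before a scan would change the totals considerably.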
4) Image probing (in-memory)
Create a small numpy array and pass it to filoma.probe_image to exercise the image path that accepts arrays. This avoids needing image files or heavy dependencies.
try:
    import numpy as np
    arr = np.random.randn(16, 16)
    img_report = filoma.probe_image(arr)
    print("probe_image on numpy array returned type:", type(img_report))
    try:
        print(img_report)
    except Exception:
        pass
except Exception as e:
    print("Skipping image probe; numpy unavailable or probe failed:", e)
probe_image on numpy array returned type: <class 'filoma.images.image_profiler.ImageReport'>
ImageReport(path=None, file_type=None, shape=(16, 16), dtype='float64', min=-2.8289461122786097, max=2.6011147464964393, mean=0.034071740265040014, nans=0, infs=0, unique=256, status=None)
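The numeric fields in the ImageReport output (shape, dtype, min, max, mean, nans, infs, unique) can be cross-checked by hand with plain numpy. The sketch below computes values with the same names. It is an assumption about what each field means, not filoma's implementation, and `summarize_array` is a hypothetical helper.

```python
import numpy as np


def summarize_array(arr):
    """Compute simple summary statistics for a numeric array."""
    arr = np.asarray(arr)
    return {
        "shape": arr.shape,
        "dtype": str(arr.dtype),
        "min": float(arr.min()),
        "max": float(arr.max()),
        "mean": float(arr.mean()),
        "nans": int(np.isnan(arr).sum()),   # number of NaN entries
        "infs": int(np.isinf(arr).sum()),   # number of +/- infinity entries
        "unique": int(np.unique(arr).size),  # number of distinct values
    }


# A fixed array keeps the result reproducible, unlike np.random.randn.
demo = np.arange(6, dtype="float64").reshape(2, 3)
print(summarize_array(demo))
```

Because the notebook cell uses np.random.randn without a seed, its min, max, and mean differ on every run; only shape, dtype, and the zero NaN and Inf counts are stable.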
5) Save a small CSV export (if polars is available)
This cell attempts to save the probe_to_df result or our small DataFrame example to /tmp/filoma_example.csv. It prints a short verification sample.
out_path = Path("/tmp/filoma_example.csv")
saved = False
try:
    if "dfw" in globals():
        try:
            dfw.df.write_csv(str(out_path))
            saved = True
        except Exception:
            pass
    if saved:
        print("Saved CSV to", out_path)
        try:
            print("CSV sample:", out_path.read_text().splitlines()[:10])
        except Exception:
            pass
    else:
        print("Could not save CSV; polars or file-writer not available.")
except Exception as e:
    print("Saving CSV failed:", e)
Saved CSV to /tmp/filoma_example.csv
CSV sample: ['path,depth,parent,name,stem,suffix,size_bytes,modified_time,created_time,is_file,is_dir,owner,group,mode_str,inode,nlink,sha256,xattrs', '../tests/test_async_rust_extra.py,1,../tests,test_async_rust_extra.py,test_async_rust_extra,.py,1938,2025-09-10 23:02:11,2025-09-10 23:02:11,true,false,kalfasy,kalfasy,-rw-rw-r--,7601345,1,,{}', '../tests/test_basic_dataframe.py,1,../tests,test_basic_dataframe.py,test_basic_dataframe,.py,1413,2025-09-10 23:02:12,2025-09-10 23:02:12,true,false,kalfasy,kalfasy,-rw-rw-r--,7602664,1,,{}', '../tests/scripts,1,../tests,scripts,scripts,"",4096,2025-09-04 20:14:19,2025-09-04 20:14:19,false,true,kalfasy,kalfasy,drwxrwxr-x,13593175,3,,{}', '../tests/test_ml_core.py,1,../tests,test_ml_core.py,test_ml_core,.py,5741,2025-09-20 21:41:56,2025-09-20 21:41:56,true,false,kalfasy,kalfasy,-rw-rw-r--,7602966,1,,{}', '../tests/test_rust_absolute_paths.py,1,../tests,test_rust_absolute_paths.py,test_rust_absolute_paths,.py,2589,2025-09-06 16:48:42,2025-09-06 16:48:42,true,false,kalfasy,kalfasy,-rw-rw-r--,7601819,1,,{}', '../tests/test_dataframe.py,1,../tests,test_dataframe.py,test_dataframe,.py,12731,2025-09-10 22:42:30,2025-09-10 22:42:30,true,false,kalfasy,kalfasy,-rw-rw-r--,7601601,1,,{}', '../tests/test_dataframe_chaining.py,1,../tests,test_dataframe_chaining.py,test_dataframe_chaining,.py,1142,2025-09-07 21:13:30,2025-09-07 21:13:30,true,false,kalfasy,kalfasy,-rw-rw-r--,7603121,1,,{}', '../tests/test_ml_path_col.py,1,../tests,test_ml_path_col.py,test_ml_path_col,.py,1010,2025-09-20 16:04:09,2025-09-20 16:04:09,true,false,kalfasy,kalfasy,-rw-rw-r--,7602942,1,,{}', '../tests/test_rust_comprehensive.py,1,../tests,test_rust_comprehensive.py,test_rust_comprehensive,.py,12431,2025-09-10 23:02:11,2025-09-10 23:02:11,true,false,kalfasy,kalfasy,-rw-rw-r--,7601605,1,,{}']
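The raw splitlines() sample above is hard to read and leaves quoted fields such as "" unparsed. A possible alternative for verification is to read the first few rows back with the standard library's csv module, which returns one dict per row. In this sketch the helper `read_csv_head` is a hypothetical name, and the demo file is a three-column stand-in written to a temporary directory rather than the real export.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def read_csv_head(csv_path, n=3):
    """Return the first n data rows of a CSV file as dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.DictReader(handle), n))


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "filoma_example.csv"
    # Stand-in content using a few columns from the header shown above.
    demo_path.write_text(
        'path,depth,suffix\n../tests/scripts,1,""\n../tests/test_ml_core.py,1,.py\n',
        encoding="utf-8",
    )
    rows = read_csv_head(demo_path)

for row in rows:
    print(row)
```

All values come back as strings, so numeric columns such as depth or size_bytes need an explicit conversion before any arithmetic.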
Notes and next steps
- If a cell raised an exception because a dependency is missing, install polars, numpy, and optionally pillow.
- To run longer scans, increase max_depth and threads in the probe() calls.
- Use probe_to_df(..., to_pandas=True) to get a pandas.DataFrame if you prefer pandas.