Metadata-Version: 2.4
Name: sbcluster
Version: 0.2.1
Summary: Spectral Bridges clustering algorithm
Author: Félix Laplante
Project-URL: Source, https://gitlab.com/felixlaplante0/sbcluster
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: faiss-cpu
Dynamic: license-file

# 📊 Spectral Bridges

**sbcluster** is a Python package implementing **Spectral Bridges**, a clustering algorithm that combines k-means with spectral clustering. It computes an affinity matrix efficiently and merges clusters according to a connectivity measure inspired by the SVM margin concept. The package aims to provide robust clustering, particularly on large datasets.
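
To make the two-stage idea concrete, here is a minimal sketch built only on scikit-learn. It is not the package's implementation: the function name `two_stage_cluster` is hypothetical, and a plain Gaussian kernel on centroid distances stands in for the margin-inspired connectivity measure described above.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs


def two_stage_cluster(X, n_clusters, n_nodes, random_state=0):
    """Quantize X into n_nodes k-means regions, then merge the regions spectrally."""
    # Stage 1: vector quantization. Each point is assigned to one of n_nodes centroids.
    km = KMeans(n_clusters=n_nodes, n_init=10, random_state=random_state).fit(X)
    centroids, node_of_point = km.cluster_centers_, km.labels_

    # Stage 2: affinity between nodes.
    # ASSUMPTION: a Gaussian kernel on squared centroid distances, used here only
    # as a simple stand-in for the Spectral Bridges connectivity measure.
    sq_dists = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    scale = np.median(sq_dists[sq_dists > 0])
    affinity = np.exp(-sq_dists / scale)

    # Stage 3: spectral clustering on the small (n_nodes x n_nodes) affinity matrix.
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=random_state
    )
    node_labels = sc.fit_predict(affinity)

    # Each point inherits the label of the node it was assigned to.
    return node_labels[node_of_point]


# Three well-separated synthetic blobs
X, _ = make_blobs(
    n_samples=600, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)
labels = two_stage_cluster(X, n_clusters=3, n_nodes=30)
print(labels.shape, np.unique(labels))
```

The point of the design is that the spectral step operates on an `n_nodes x n_nodes` matrix rather than an `n_samples x n_samples` one, which is what makes this family of methods practical on large datasets.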

---

## ✨ Features

- **Spectral Bridges Algorithm**: Integrates k-means and spectral clustering, with efficient affinity matrix computation, for improved clustering results.
- **Scalability**: Designed to handle large datasets through efficient affinity matrix computation.
- **Customizable**: Parameters such as the number of clusters, the number of iterations, and the random state can be configured.
- **Model selection**: Automatic selection of the number of nodes (m) based on a normalized eigengap metric.
- **scikit-learn**: Follows the standard scikit-learn API, so it works with scikit-learn's model selection and evaluation tools.
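
As an illustration of what a normalized eigengap can look like, the sketch below scores candidate cluster counts on a toy affinity matrix. The exact metric behind the package's model selection is not specified here, so this rests on assumptions: the function name `normalized_eigengap` is hypothetical, and the gap is taken to be the relative difference between consecutive eigenvalues of the symmetric normalized graph Laplacian.

```python
import numpy as np


def normalized_eigengap(affinity, n_clusters):
    """Relative gap between the n_clusters-th and (n_clusters + 1)-th Laplacian eigenvalues.

    ASSUMPTION: defined as (lambda_{k+1} - lambda_k) / lambda_{k+1}; the package's
    own metric may differ.
    """
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(laplacian)  # eigenvalues in ascending order
    lower, upper = eigvals[n_clusters - 1], eigvals[n_clusters]
    return (upper - lower) / upper


# Toy affinity: 3 groups of 4 nodes, strong links inside a group (1.0),
# weak links between groups (0.01)
affinity = 0.99 * np.kron(np.eye(3), np.ones((4, 4))) + 0.01

# Score candidate cluster counts; k = 1 is skipped because its gap is trivially large
gaps = {k: normalized_eigengap(affinity, k) for k in range(2, 6)}
best_k = max(gaps, key=gaps.get)
print(best_k, gaps)
```

A large gap after the k-th eigenvalue suggests that the graph splits cleanly into k weakly connected groups, which is why the score peaks at the true number of groups in this toy example.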

---

## ⚡ Speed

Spectral Bridges uses FAISS's efficient k-means implementation and initializes centroids with a clone of a scikit-learn method. This is much faster than using scikit-learn's implementation (over 2x improvement).

---

## 🚀 Installation

```bash
pip install sbcluster
```

## 🔧 Usage

### Example

```python
import numpy as np
from sbcluster import SpectralBridges, ngap_scorer
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import GridSearchCV

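# Load some synthetic data (the last column holds the ground-truth labels)
data = np.genfromtxt("datasets/impossible.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Define the parameter grid
param_grid = {"n_clusters": [2, 3, 4, 5, 6, 7, 8, 9, 10]}

# Clustering is unsupervised, so each "split" fits and scores on the full dataset;
# repeating it 5 times makes GridSearchCV average the score over 5 fits
cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5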

# Perform grid search for optimal parameters
grid_search = GridSearchCV(
    estimator=SpectralBridges(n_clusters=2, n_nodes=250),
    param_grid=param_grid,
    scoring=ngap_scorer,
    cv=cv,
    verbose=1,
)

# Fit the grid search
grid_search.fit(X)

# Print the results
print(grid_search.cv_results_["mean_test_score"])
print(grid_search.best_params_)

# Make predictions with the best model
guess = grid_search.best_estimator_.predict(X)
ari = adjusted_rand_score(y, guess)

# Print the ARI
print(f"Adjusted Rand Index: {ari}")
```

---

## 📖 Learn More

For tutorials and the API reference, visit the official site:  
👉 [sbcluster Documentation](https://felixlaplante0.gitlab.io/sbcluster)
