Metadata-Version: 2.4
Name: haphazard
Version: 0.1.1
Summary: A modular framework for registering and running haphazard datasets and models.
Home-page: https://github.com/theArijitDas/Haphazard-Package/
Author: Arijit Das
Author-email: dasarijitjnv@gmail.com
License: MIT
Project-URL: Bug Tracker, https://github.com/theArijitDas/Haphazard-Package/issues
Project-URL: Source Code, https://github.com/theArijitDas/Haphazard-Package/
Keywords: machine-learning haphazard models datasets registration framework
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: tqdm
Requires-Dist: scikit-learn
Requires-Dist: torch
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Haphazard

A Python package for **haphazard dataset and model management**.  
It provides a standardized interface for loading datasets and models, running experiments, and extending the framework with custom datasets or models.

---

## Table of Contents

- [Installation](#installation)
- [Project Structure](#project-structure)
- [Quick Start](#quick-start)
- [Datasets](#datasets)
- [Models](#models)
- [Contributing](#contributing)
- [License](#license)

---

## Installation

Install via pip:

```bash
pip install haphazard
```

Or for local development:

```bash
git clone <repo_url>
cd haphazard
pip install -e .
```

---

## Project Structure

The Haphazard package has a modular layout:

```
haphazard/
├── __init__.py                    # Top-level package
├── data/                          # Dataset-related modules
│   ├── __init__.py
│   ├── base_dataset.py            # Abstract BaseDataset class
│   └── datasets/                  # All dataset implementations
│       ├── __init__.py
│       └── dummy_dataset/
│           └── __init__.py
├── models/                        # Model-related modules
│   ├── __init__.py
│   ├── base_model.py              # Abstract BaseModel class
│   └── model_zoo/                 # All model implementations
│       ├── __init__.py
│       └── dummy_model/
│           └── __init__.py
└── utils/                         # Optional helper functions
    └── ...
```

**Notes:**

* `data/base_dataset.py` defines `BaseDataset`.
* `data/datasets/` contains registered datasets; each dataset is a submodule with `__init__.py`.
* `models/base_model.py` defines `BaseModel`.
* `models/model_zoo/` contains registered models; each model is a submodule with `__init__.py`.
* `utils/` is optional, for shared helpers.

This layout allows **dynamic registration** of datasets and models via decorators.
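
To make the registration mechanism concrete, here is a minimal, self-contained sketch of how a decorator-based registry can work. This is an illustration of the general pattern, not the package's actual implementation; the names `register_dataset` and `load_dataset` mirror the public API, but `_DATASET_REGISTRY` and the error handling are assumptions.

```python
# Minimal sketch of a decorator-based registry (illustrative, not the
# package's actual source). Registered classes are stored by name and
# instantiated on demand by load_dataset.
_DATASET_REGISTRY = {}

def register_dataset(name):
    """Decorator that records a dataset class under `name`."""
    def decorator(cls):
        _DATASET_REGISTRY[name] = cls
        return cls
    return decorator

def load_dataset(name, **kwargs):
    """Look up the class registered under `name` and instantiate it."""
    if name not in _DATASET_REGISTRY:
        raise KeyError(f"Unknown dataset: {name!r}")
    return _DATASET_REGISTRY[name](**kwargs)
```

Because each dataset submodule applies the decorator at import time, simply importing the package populates the registry, and `load_dataset("...")` can resolve names without any central list of datasets.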

---


## Quick Start

```python
from haphazard import load_dataset, load_model

# Load dataset
dataset = load_dataset("dummy", n_samples=100, n_features=10)

# Load model
model = load_model("dummy")

# Run model
outputs = model(dataset)
print(outputs)
```

---

## Datasets

* All datasets must inherit from `BaseDataset`.
* Example dataset: `DummyDataset`.
* Main interface:

```python
from haphazard import load_dataset

dataset = load_dataset("dummy", base_path="./data")
x, y = dataset.load_data()
mask = dataset.load_mask(scheme="probabilistic", availability_prob=0.5)
```
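
The `"probabilistic"` scheme above suggests that each feature of each sample is observed independently with probability `availability_prob`. A self-contained sketch of such a mask generator (an assumption for illustration, not the package's actual `load_mask` implementation) could look like:

```python
import random

def probabilistic_mask(n_samples, n_features, availability_prob=0.5, seed=42):
    """Hypothetical probabilistic mask: each entry is True (observed)
    independently with probability `availability_prob`."""
    rng = random.Random(seed)  # seeded for reproducible masks
    return [
        [rng.random() < availability_prob for _ in range(n_features)]
        for _ in range(n_samples)
    ]
```

Seeding the generator makes the mask reproducible across runs, which matters when comparing models on the same pattern of missing features.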

### Dataset Attributes

* `name` (str): dataset name.
* `task` (str): `"classification"` or `"regression"`.
* `haphazard_type` (str): `"controlled"` or `"intrinsic"`.
* `n_samples`, `n_features` (int): dataset dimensions.
* `num_classes` (int): number of classes (classification only).

---

## Models

* All models must inherit from `BaseModel`.
* Example model: `DummyModel`.
* Main interface:

```python
from haphazard import load_model

model = load_model("dummy")
outputs = model(dataset)
```

### Output

* **Classification**: `labels`, `preds`, `logits`, `time_taken`, `is_logit`.
* **Regression**: `targets`, `preds`, `time_taken`.
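
A classification output dict with the fields above can be consumed directly to compute metrics. The values below are made up for illustration, not produced by an actual model run:

```python
# Illustrative classification output dict; values are fabricated
# to show the expected fields, not real model results.
outputs = {
    "labels": [0, 1, 1, 0],
    "preds":  [0, 1, 0, 0],
    "logits": [0.1, 0.9, 0.2, 0.3],
    "time_taken": 0.0,
    "is_logit": True,
}

# Accuracy: fraction of predictions that match the ground-truth labels.
correct = sum(l == p for l, p in zip(outputs["labels"], outputs["preds"]))
accuracy = correct / len(outputs["labels"])
print(f"accuracy = {accuracy:.2f}")  # 3 of 4 match -> 0.75
```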

---

## Contributing

Haphazard is designed for **easy extensibility**. You can add new datasets and models.

### Adding a new dataset

1. Create a new folder under `haphazard/data/datasets/`, e.g., `my_dataset/`.
2. Add `__init__.py`:

```python
from ...base_dataset import BaseDataset
from ...datasets import register_dataset
import numpy as np

@register_dataset("my_dataset")
class MyDataset(BaseDataset):
    def __init__(self, base_path="./", **kwargs):
        self.name = "my_dataset"
        self.haphazard_type = "controlled"
        self.task = "classification"
        super().__init__(base_path=base_path, **kwargs)

    def read_data(self, base_path="./"):
        # Load or generate x, y
        x = np.random.random((100, 10))
        y = np.random.randint(0, 2, 100)
        return x, y
```

3. The dataset is automatically registered and can be loaded with `load_dataset("my_dataset")`.

### Adding a new model

1. Create a new folder under `haphazard/models/model_zoo/`, e.g., `my_model/`.
2. Add `__init__.py`:

```python
from ...base_model import BaseModel, BaseDataset
from ...model_zoo import register_model
import numpy as np

@register_model("my_model")
class MyModel(BaseModel):
    def __init__(self, **kwargs):
        self.name = "MyModel"
        self.tasks = {"classification", "regression"}
        self.deterministic = True
        self.hyperparameters = set()
        super().__init__(**kwargs)

    def fit(self, dataset: BaseDataset, mask_params=None, model_params=None, seed=42):
        # Dummy implementation: load data, build a mask, predict at random
        x, y = dataset.load_data()
        mask = dataset.load_mask(**(mask_params or {}))  # guard against None
        preds = np.random.randint(0, 2, size=y.shape[0])
        if dataset.task == "classification":
            return {
                "labels": y,
                "preds": preds,
                "logits": preds.astype(float),
                "time_taken": 0.0,
                "is_logit": True
            }
        elif dataset.task == "regression":
            return {
                "targets": y,
                "preds": preds,
                "time_taken": 0.0,
            }
```

3. The model is automatically registered and can be loaded with `load_model("my_model")`.

---

## License

MIT License.
