Metadata-Version: 2.4
Name: ibbi
Version: 0.2.2b5
Summary: A package for bark and ambrosia beetle identification.
License: MIT
License-File: LICENSE.md
Author: G. Christopher Marais
Requires-Python: >=3.11,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: datasets (>=3.6.0,<4.0.0)
Requires-Dist: hf-xet (>=1.1.3,<2.0.0)
Requires-Dist: huggingface-hub (>=0.34.4,<0.35.0)
Requires-Dist: ipywidgets (>=8.1.7,<9.0.0)
Requires-Dist: lime (>=0.2.0.1,<0.3.0.0)
Requires-Dist: numpy (>=2.2.6,<3.0.0)
Requires-Dist: pandas (>=2.3.0,<3.0.0)
Requires-Dist: pillow (>=11.2.1,<12.0.0)
Requires-Dist: scikit-bio (>=0.7.0,<0.8.0)
Requires-Dist: shap (>=0.48.0,<0.49.0)
Requires-Dist: slicer (>=0.0.8,<0.0.9)
Requires-Dist: tabulate (>=0.9.0,<0.10.0)
Requires-Dist: timm (>=1.0.19,<2.0.0)
Requires-Dist: torch (>=2.1)
Requires-Dist: torchvision (>=0.16.0)
Requires-Dist: transformers (>=4.56.0,<5.0.0)
Requires-Dist: ultralytics (==8.3.139)
Requires-Dist: umap-learn (>=0.5.9.post2,<0.6.0)
Project-URL: Documentation, https://gcmarais.com/IBBI/
Project-URL: Repository, https://github.com/ChristopherMarais/IBBI
Description-Content-Type: text/markdown

# Intelligent Bark Beetle Identifier (IBBI)

[![PyPI version](https://badge.fury.io/py/ibbi.svg)](https://badge.fury.io/py/ibbi)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://gcmarais.com/IBBI/)

**IBBI** is a Python package that provides a simple and unified interface for detecting and classifying bark and ambrosia beetles from images using state-of-the-art computer vision models.

This package is designed to support entomological research by automating the laborious task of beetle identification, enabling high-throughput data analysis for ecological studies, pest management, and biodiversity monitoring.

### Motivation

The ability to accurately detect and identify bark and ambrosia beetles is critical for forest health and pest management. However, traditional methods face significant challenges:

* **They are slow and time-consuming.**
* **They require highly specialized expertise.**
* **They create a bottleneck for large-scale research.**

The IBBI package addresses these obstacles by providing pre-trained, open-source models that automate detection and classification from images, lowering the barrier to entry for researchers. It also gives users access to general zero-shot detection, general feature extraction, model evaluation, and model explainability.

### Key Features

* **Model Access:**
<br>Access powerful models with a single function call `ibbi.create_model()`. The following types of models are available:

  * **Single-Class Bark Beetle Detection:** Detect the presence of *any* bark beetle in an image. These models were trained for single-class object detection on bark and ambrosia beetle-specific data. They do not classify the species of a beetle, only its presence and location in an image.

  * **Multi-Class Species Detection:** Identify the species of a beetle from an image. These models were trained for multi-class object detection on bark and ambrosia beetle-specific data. They classify the species of a beetle in addition to detecting its presence and location in an image.

  * **General Zero-Shot Detection:** Detect objects using a text prompt (e.g., "insect"), without prior training on bark and ambrosia beetle specific data (e.g., GroundingDINO, YoloWorld). These models have been trained on large, diverse datasets and can generalize to detect a wide range of objects based on textual descriptions.

  * **General Pre-trained Feature Extraction:** Extract feature embeddings from images using pre-trained models without additional training for specifically identifying bark and ambrosia beetles (e.g., DINOv3, EVA-02).

* **Model Evaluation:**
<br>This package includes a small set of ~2 000 images of 63 species for benchmarking model performance through the `ibbi.Evaluator()` wrapper. The evaluation wrapper simplifies computing the following types of metrics:

  * **Classification Metrics:** Assesses the model's ability to correctly classify species. The `evaluator.classification()` method returns a comprehensive set of metrics including accuracy, balanced accuracy, F1-score, Cohen's Kappa, precision, and recall alongside a confusion matrix.

  * **Object Detection Metrics:** Evaluates the accuracy of bounding box predictions. The `evaluator.object_detection()` method calculates the mean Average Precision (mAP) over a range of Intersection over Union (IoU) thresholds. You can also get per-class AP scores to understand performance on a species-by-species basis.

  * **Embedding & Clustering Metrics:** Evaluates the quality of the feature embeddings generated by the models. The `evaluator.embeddings()` method performs dimensionality reduction with Uniform Manifold Approximation and Projection (UMAP) and clustering with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), then calculates:

    * **Intrinsic metrics** to assess the quality of the clusters themselves, such as the Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index.
    * **Extrinsic metrics** that compare the clusters to the ground-truth species labels, including the Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Cluster Purity.
    * **Mantel Correlation:** A Mantel test checks whether the distances between species in the embedding space correlate with a known external distance matrix (e.g., taxonomic or phylogenetic distance). By default, the package uses a sample phylogenetic distance matrix that is constrained to the current taxonomy of the species, with branch lengths based on divergence times of the COI gene as an estimate of evolutionary divergence.

* **Model Explainability:**
<br>Gain insights into why a model makes certain predictions with integrated explainability methods. The `ibbi.Explainer()` wrapper provides a simple interface for two popular techniques:

  * **SHapley Additive exPlanations (SHAP):** A powerful, game theory-based approach that attributes a prediction to the features of an input. The `explainer.with_shap()` method generates robust, theoretically-grounded explanations for a set of images, which is ideal for understanding the contribution of each part of an image to a prediction.
  * **Local Interpretable Model-agnostic Explanations (LIME):** A technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction. Use the `explainer.with_lime()` method for a quicker, more intuitive visualization of which parts of a single image were most influential in the model's decision.
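
The object detection metrics listed above are built on Intersection over Union (IoU). As a rough illustration only (this is not the code `ibbi.Evaluator()` runs, and the `(x1, y1, x2, y2)` box layout is an assumption made for this sketch), IoU between two boxes can be computed like this:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2).

    The corner-based box layout is an assumption of this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width: 50 / 150.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A prediction typically counts as correct at a given threshold when its IoU with a ground-truth box meets that threshold; mAP then averages precision over a range of such thresholds.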

---

## Table of Contents

- [Intelligent Bark Beetle Identifier (IBBI)](#intelligent-bark-beetle-identifier-ibbi)
    - [Motivation](#motivation)
    - [Key Features](#key-features)
  - [Table of Contents](#table-of-contents)
  - [Workflow: How the Models Were Built](#workflow-how-the-models-were-built)
  - [Package API and Usage](#package-api-and-usage)
  - [Installation](#installation)
  - [Quick Start](#quick-start)
  - [Available Models](#available-models)
  - [Advanced Usage](#advanced-usage)
  - [How to Contribute](#how-to-contribute)
  - [License](#license)

---

## Workflow: How the Models Were Built

The trained models in `ibbi` are the result of a comprehensive data collection, annotation, and training pipeline by the Forest Entomology Lab at the University of Florida.

<p align="center">
  <img src="docs/assets/images/data_flow_ibbi.png" alt="IBBI Data Flow" width="800">
</p>

1.  **Data Collection and Curation:** The process begins with data collection from various sources. A zero-shot detection model performs initial bark beetle localization, followed by human-in-the-loop verification to ensure accurate bounding box annotations. Species classification is performed by expert taxonomists to provide high-quality species labels.
2.  **Model-Specific Training Data:** The annotated dataset is curated for different model types:
    * **Single-Class Detection:** Trained in a supervised manner for object detection and classification using all images with verified bark beetle localizations.
    * **Multi-Class Species Detection:** Trained in a supervised manner for object detection and classification using images with both verified localizations and species-level labels. To ensure robustness, species with fewer than 50 images are excluded.

    Note: No additional training is performed for the zero-shot detection and general feature extraction models, as they use weights pre-trained on large, diverse datasets.

3.  **Evaluation and Deployment:** A held-out test set is used to evaluate all models. A summary of performance metrics can be viewed with `ibbi.list_models()` or in the [`model summary table`](src/ibbi/data/ibbi_model_summary.csv). Alternatively, evaluation can be run independently with `ibbi.Evaluator()`. The trained models are stored under the [`IBBI-bio Hugging Face Hub community`](https://huggingface.co/IBBI-bio) for easy access.
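
The 50-image threshold used for the multi-class training data can be pictured with a small sketch. This is not the lab's actual curation code; the function name and the example label counts are hypothetical and only illustrate the filtering rule.

```python
from collections import Counter

MIN_IMAGES_PER_SPECIES = 50  # threshold stated in the workflow above


def filter_rare_species(labels, min_count=MIN_IMAGES_PER_SPECIES):
    """Return indices of samples whose species has at least `min_count` images."""
    counts = Counter(labels)
    return [i for i, name in enumerate(labels) if counts[name] >= min_count]


# Hypothetical label list: 60 images of one species, 3 of another.
labels = ["Xyleborus affinis"] * 60 + ["Ips calligraphus"] * 3
kept = filter_rare_species(labels)
print(len(kept))  # 60
```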

---

## Package API and Usage

The `ibbi` package is designed to be simple and intuitive. The following diagram summarizes the main functions, classes and methods including their inputs and outputs.

<p align="center">
  <img src="docs/assets/images/ibbi_inputs_outputs.png" alt="IBBI Inputs and Outputs" width="800">
</p>

---

## Installation

This package requires PyTorch. For compatibility with your specific hardware (e.g., CUDA-enabled GPU), please install PyTorch *before* installing `ibbi`.

**1. Install PyTorch**

Follow the official instructions at **[pytorch.org](https://pytorch.org/get-started/locally/)** to install the correct version for your system.
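
Before moving on, it can help to confirm which PyTorch build Python actually sees. The helper below is a small sketch (the name `torch_status` is made up for this example); it uses only `importlib` from the standard library plus `torch.cuda.is_available()`.

```python
import importlib.util


def torch_status() -> str:
    """Report whether PyTorch is missing, CPU-only, or able to use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "missing"

    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(torch_status())
```

If this prints `missing`, install PyTorch first; if it prints `cpu` on a machine with a CUDA-capable GPU, the installed build probably does not match your hardware.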

**2. Install IBBI**

Once PyTorch is installed, install the package from PyPI:

```bash
pip install ibbi
```
Or install the latest development version directly from GitHub:

```bash
pip install git+https://github.com/ChristopherMarais/IBBI.git
```

---

## Quick Start

Using `ibbi` is straightforward. Load a model and immediately use it for inference.

```python
import ibbi

# --- List Available Models ---
ibbi.list_models()

# --- Create a Model ---
classifier = ibbi.create_model(model_name="species_classifier", pretrained=True)

# --- Perform Inference ---
results = classifier.predict("path/to/your/image.jpg")
```

For more detailed demonstrations, please see the example notebooks located in the [`notebooks/`](notebooks/) folder of the repository.

---

## Available Models

To see a list of available models and their performance metrics directly from Python, run:

```python
ibbi.list_models()
```
Pass any of the listed names as the `model_name` argument of `ibbi.create_model()` to load that model.

**Model Summary Table**

The most detailed version of the table can also be found [`here`](src/ibbi/data/ibbi_model_summary.csv).

| **Model Name**                                      | **Tasks**                                                              | **Pretrained Weights Repository**                                                 | **Paper**                            | **Embedding vector shape** | **# of images fine-tuned** | **mAP@[.5:.95]** | **F1-score (Macro)** | **Silhouette score** | **Cluster Purity** |
|-----------------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------|--------------------------------------|----------------------------|----------------------------|------------------|----------------------|----------------------|--------------------|
| **grounding_dino_detect_model**                     | Zero-shot Object Detection (Prompt: 'bark beetle'), Feature extraction | https://huggingface.co/IDEA-Research/grounding-dino-base                          | https://arxiv.org/abs/2303.05499     | (1, 256)                   | 0                          | 0.673            | 0.969                | -0.239               | 0.997              |
| **yoloworldv2_bb_detect_model**                     | Zero-shot Object Detection (Prompt: 'bark beetle'), Feature extraction | https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8x-worldv2.pt | https://arxiv.org/pdf/2401.17270v2   | ( , 640)                   | 0                          | 0.004            | 0.007                | 0.686                | 0.577              |
| **yolov8x_bb_detect_model**                         | Single-class Object Detection, Feature extraction                      | https://huggingface.co/IBBI-bio/ibbi_yolov8_od                                    | https://arxiv.org/abs/2408.15857     | ( , 640)                   | 46 781                     | 0.985            | 0.988                | 0.558                | 0.554              |
| **yolov9e_bb_detect_model**                         | Single-class Object Detection, Feature extraction                      | https://huggingface.co/IBBI-bio/ibbi_yolov9_od                                    | https://arxiv.org/abs/2402.13616     | ( , 512)                   | 46 781                     | 0.987            | 0.989                | 0.338                | 0.568              |
| **yolov10x_bb_detect_model**                        | Single-class Object Detection, Feature extraction                      | https://huggingface.co/IBBI-bio/ibbi_yolov10_od                                   | https://arxiv.org/abs/2405.14458     | ( , 640)                   | 46 781                     | 0.990            | 0.987                | 0.428                | 0.559              |
| **yolov11x_bb_detect_model**                        | Single-class Object Detection, Feature extraction                      | https://huggingface.co/IBBI-bio/ibbi_yolov11_od                                   | https://www.arxiv.org/abs/2410.17725 | ( , 768)                   | 46 781                     | 0.989            | 0.989                | 0.656                | 0.569              |
| **yolov12x_bb_detect_model**                        | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov12_oc                                   | https://arxiv.org/pdf/2502.12524     | ( , 768)                   | 46 781                     | 0.913            | 0.755                | TBD                  | TBD                |
| **rtdetrx_bb_detect_model**                         | Single-class Object Detection, Feature extraction                      | https://huggingface.co/IBBI-bio/ibbi_rtdetr_od                                    | https://arxiv.org/abs/2304.08069     | ( , 384)                   | 46 781                     | 0.885            | 0.890                | 0.386                | 0.534              |
| **yolov8x_bb_multi_class_detect_model**             | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov8_oc                                    | https://arxiv.org/abs/2408.15857     | ( , 640)                   | 11 507                     | 0.916            | 0.844                | 0.529                | 0.428              |
| **yolov9e_bb_multi_class_detect_model**             | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov9_oc                                    | https://arxiv.org/abs/2402.13616     | ( , 512)                   | 11 507                     | 0.918            | 0.868                | 0.227                | 0.957              |
| **yolov10x_bb_multi_class_detect_model**            | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov10_oc                                   | https://arxiv.org/abs/2405.14458     | ( , 640)                   | 11 507                     | 0.913            | 0.785                | 0.279                | 0.453              |
| **yolov11x_bb_multi_class_detect_model**            | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov11_oc                                   | https://www.arxiv.org/abs/2410.17725 | ( , 768)                   | 11 507                     | 0.917            | 0.762                | 0.386                | 0.403              |
| **yolov12x_bb_multi_class_detect_model**            | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_yolov12_oc                                   | https://arxiv.org/pdf/2502.12524     | ( , 768)                   | 12 507                     | 0.919            | 0.894                | TBD                  | TBD                |
| **rtdetrx_bb_multi_class_detect_model**             | Multi-class Object Detection, Feature extraction                       | https://huggingface.co/IBBI-bio/ibbi_rtdetr_oc                                    | https://arxiv.org/abs/2304.08069     | ( , 384)                   | 11 507                     | 0.885            | 0.890                | -0.030               | 0.940              |
| **dinov2_vitl14_lvd142m_features_model**            | Feature extraction                                                     | https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k      | https://arxiv.org/html/2304.07193v2  | (1, 1024)                  | 0                          | N/A              | N/A                  | 0.262                | 0.475              |
| **dinov3_vitl16_lvd1689m_features_model**           | Feature extraction                                                     | https://huggingface.co/IBBI-bio/dinov3-vitl16-pretrain-lvd1689m                   | https://arxiv.org/pdf/2508.10104     | ( , 384)                   | 0                          | N/A              | N/A                  | 0.543                | 0.487              |
| **eva02_base_patch14_224_mim_in22k_features_model** | Feature extraction                                                     | https://huggingface.co/timm/eva02_base_patch14_224.mim_in22k                      | https://arxiv.org/pdf/2303.11331     | (1, 768)                   | 0                          | N/A              | N/A                  | -0.174               | 0.944              |
| **convformer_b36_features_model**                   | Feature extraction                                                     | https://huggingface.co/timm/caformer_b36.sail_in22k_ft_in1k_384                   | https://arxiv.org/pdf/2210.13452     | (1, 768)                   | 0                          | N/A              | N/A                  | 0.100                | 0.880              |
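
One way to use the table is to rank candidate models by the metric that matters for your task. The sketch below copies the mAP@[.5:.95] values of five rows labelled single-class in the table above into a plain dictionary; it does not call `ibbi` or read the CSV file.

```python
# mAP@[.5:.95] values copied from rows labelled single-class in the table above.
single_class_map = {
    "yolov8x_bb_detect_model": 0.985,
    "yolov9e_bb_detect_model": 0.987,
    "yolov10x_bb_detect_model": 0.990,
    "yolov11x_bb_detect_model": 0.989,
    "rtdetrx_bb_detect_model": 0.885,
}

# Sort from highest to lowest mAP and take the top entry.
ranked = sorted(single_class_map.items(), key=lambda item: item[1], reverse=True)
best_name, best_map = ranked[0]
print(best_name, best_map)  # yolov10x_bb_detect_model 0.99
```

The selected name can then be passed as `model_name` to `ibbi.create_model()`.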

---


## Advanced Usage

For more detailed examples, please see the example notebooks located in the [`notebooks/`](notebooks/) folder of the repository. Additionally, the documentation site at **[gcmarais.com/IBBI](https://gcmarais.com/IBBI/)** contains more in-depth explanations and examples.

**Inference**

Use inference to identify and locate bark and ambrosia beetles in an image.

```python
import ibbi

# --- Create a Model ---
classifier = ibbi.create_model(model_name="species_classifier", pretrained=True)

# --- Perform Inference ---
results = classifier.predict("path/to/your/image.jpg")
```
First, create a model using the `ibbi.create_model()` function. You can specify the `model_name` from the available models listed by `ibbi.list_models()`. Set `pretrained=True` to load the pre-trained weights.

The `predict()` method returns a list of dictionaries with the following keys:

- `labels`: The predicted class label.
- `scores`: The confidence score of the prediction.
- `boxes`: The bounding box coordinates (if applicable).

NOTE: This only works for zero-shot, single-class, and multi-class detection models. Feature extraction models do not have a `predict()` method.
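
A common next step is to discard low-confidence detections. The sketch below assumes that each result dictionary stores `labels`, `scores`, and `boxes` as parallel lists with one entry per detection; that layout, the helper name, and the sample values are assumptions made for this example, so check the actual output of `predict()` before relying on it.

```python
def filter_detections(result, min_score=0.5):
    """Keep only detections whose confidence is at least `min_score`.

    Assumes `labels`, `scores`, and `boxes` are parallel lists.
    """
    keep = [i for i, score in enumerate(result["scores"]) if score >= min_score]
    return {
        "labels": [result["labels"][i] for i in keep],
        "scores": [result["scores"][i] for i in keep],
        "boxes": [result["boxes"][i] for i in keep],
    }


# Hypothetical result for a single image with two detections.
result = {
    "labels": ["Xyleborus affinis", "Ips calligraphus"],
    "scores": [0.91, 0.32],
    "boxes": [[12, 30, 140, 210], [300, 40, 380, 120]],
}
confident = filter_detections(result)
print(confident["labels"])  # ['Xyleborus affinis']
```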

<table style="width: 100%; border: none;">
  <thead>
    <tr>
      <th style="width: 25%; text-align: center;">Input Image</th>
      <th style="width: 25%; text-align: center;">Single-Class Detection<br></th>
      <th style="width: 25%; text-align: center;">Multi-Class Species Detection<br></th>
      <th style="width: 25%; text-align: center;">Zero-Shot Detection<br></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center;"><img src="docs/assets/images/beetles.png" alt="Beetles" style="max-width: 100%;"></td>
      <td style="text-align: center;"><img src="docs/assets/images/beetles_od.png" alt="Object Detection" style="max-width: 100%;"></td>
      <td style="text-align: center;"><img src="docs/assets/images/beetles_oc.png" alt="Object Classification" style="max-width: 100%;"></td>
      <td style="text-align: center;"><img src="docs/assets/images/beetles_zsoc.png" alt="Zero-Shot Classification" style="max-width: 100%;"></td>
    </tr>
  </tbody>
</table>

---

**Feature Extraction**

All models can extract deep feature embeddings from an image. These vectors are useful for downstream tasks like clustering or similarity analysis.

```python
import ibbi

# --- Create a Model ---
feature_extractor = ibbi.create_model(model_name="species_classifier", pretrained=True)

# --- Extract Features ---
results = feature_extractor.extract_features("path/to/your/image.jpg")
```

The `extract_features()` method returns a NumPy array of shape `(1, embedding_dimension)`. The embedding dimension differs between models and is listed in the model summary table above.

NOTE: We recommend creating separate model instances for inference and feature extraction to avoid potential conflicts.
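
As an example of the similarity analysis mentioned above, two embeddings can be compared with cosine similarity. The sketch uses short made-up vectors and plain Python lists instead of real model output so that it runs without `ibbi` installed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; 0.0 is a convention
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "embeddings" standing in for real feature vectors.
emb_a = [1.0, 0.0, 2.0]
emb_b = [2.0, 0.0, 4.0]  # same direction as emb_a
emb_c = [0.0, 3.0, 0.0]  # orthogonal to emb_a

same_direction = cosine_similarity(emb_a, emb_b)  # close to 1
orthogonal = cosine_similarity(emb_a, emb_c)      # 0
```

With real output, a `(1, embedding_dimension)` array would first need to be flattened to a one-dimensional vector.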

---

**Model Evaluation**

Evaluate model performance on a held-out test set using the `ibbi.Evaluator()` wrapper. This wrapper provides methods for computing classification, object detection, and embedding/clustering metrics; the last group estimates the quality of the feature embeddings.

```python
import ibbi

# --- Import Data ---
data = ibbi.get_dataset()

# --- Create a Model ---
model = ibbi.create_model(model_name="species_classifier", pretrained=True)

# --- Create an Evaluator ---
evaluator = ibbi.Evaluator(model=model, dataset=data)

# --- Classification Metrics ---
classification_results = evaluator.classification()

# --- Object Detection Metrics ---
od_results = evaluator.object_detection()

# --- Embedding & Clustering Metrics ---
embedding_results = evaluator.embeddings()
```

You can customize the evaluation by providing your own dataset in our expected format.
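
To make one of the extrinsic clustering metrics concrete, here is a minimal sketch of Cluster Purity. It is an independent illustration rather than the code used by `evaluator.embeddings()`; the function name and toy data are hypothetical, and noise points (for example, an HDBSCAN label of `-1`) are simply treated as their own cluster.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority true label of their cluster."""
    if not true_labels:
        return 0.0

    # Group the true labels by the cluster each sample was assigned to.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        members[cluster_id].append(label)

    # For each cluster, count how many samples share its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Toy data: cluster 0 holds two "A" and one "B"; cluster 1 holds three "B".
clusters = [0, 0, 0, 1, 1, 1]
species = ["A", "A", "B", "B", "B", "B"]
purity = cluster_purity(clusters, species)  # (2 + 3) / 6
```

Purity alone can be misleading, because many tiny clusters push it towards 1, which is one reason to read it alongside ARI and NMI.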

---

**Model Explainability**

Understand *why* a model made a certain prediction. The explainability methods highlight which pixels were most influential, which helps in interpreting the model's decisions.

```python
import ibbi

# --- Create a Model ---
model = ibbi.create_model(model_name="species_classifier", pretrained=True)

# --- Create an Explainer ---
explainer = ibbi.Explainer(model=model)

# --- Explain with SHAP ---
shap_results = explainer.with_shap(explain_dataset=["path/to/your/image1.jpg", "path/to/your/image2.jpg"])

# --- Explain with LIME ---
lime_results = explainer.with_lime(image="path/to/your/image.jpg")
```
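
For intuition about pixel influence, the sketch below implements occlusion sensitivity, a simpler perturbation-based technique than SHAP or LIME. It is not what `ibbi.Explainer()` does internally; the toy image, the toy scoring function, and all names are made up for this illustration.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop observed when each patch x patch block is replaced by `fill`."""
    base = score_fn(image)
    height, width = len(image), len(image[0])
    heat = [[0.0] * width for _ in range(height)]

    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))

            # Copy the image and blank out the current patch.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill

            # A large drop means the patch mattered to the score.
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_score(img):
    """Toy "model" that only responds to the top-left 2x2 corner."""
    return sum(img[r][c] for r in range(2) for c in range(2))


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
print(heat[0][0], heat[3][3])  # 4.0 0.0
```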
---

## How to Contribute

Contributions are welcome! If you would like to improve IBBI, please see the [Contribution Guide](docs/CONTRIBUTING.md).

## License

This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.

