Metadata-Version: 2.4
Name: hallunox
Version: 0.2.2
Summary: A confidence-aware routing system for LLM hallucination detection using multi-signal approach
Home-page: https://github.com/convai-innovations/hallunox
Author: Nandakishor M
Author-email: Nandakishor M <support@convaiinnovations.com>
Maintainer-email: "Convai Innovations Pvt. Ltd." <support@convaiinnovations.com>
License: AGPL-3.0
Project-URL: Homepage, https://convaiinnovations.com
Project-URL: Repository, https://github.com/convai-innovations/hallunox
Project-URL: Documentation, https://hallunox.readthedocs.io
Project-URL: Bug Reports, https://github.com/convai-innovations/hallunox/issues
Project-URL: Source Code, https://github.com/convai-innovations/hallunox
Keywords: hallucination-detection,llm,confidence-estimation,model-reliability,uncertainty-quantification,ai-safety
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.13.0
Requires-Dist: transformers>=4.21.0
Requires-Dist: datasets>=2.0.0
Requires-Dist: FlagEmbedding>=1.2.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: numpy==1.26.4
Requires-Dist: tqdm>=4.64.0
Requires-Dist: pathlib
Requires-Dist: Pillow>=8.0.0
Requires-Dist: bitsandbytes>=0.41.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Provides-Extra: training
Requires-Dist: wandb>=0.12.0; extra == "training"
Requires-Dist: tensorboard>=2.8.0; extra == "training"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# HalluNox

**Confidence-Aware Routing for Large Language Model Reliability Enhancement**

A Python package implementing a multi-signal approach to pre-generation hallucination mitigation for Large Language Models. HalluNox combines semantic alignment measurement, internal convergence analysis, and learned confidence estimation to produce unified confidence scores for proactive routing decisions.

## ✨ Features

- **🎯 Pre-generation Hallucination Detection**: Assess model reliability before generation begins
- **🔄 Confidence-Aware Routing**: Automatically route queries based on estimated confidence
- **🧠 Multi-Signal Approach**: Combines semantic alignment, internal convergence, and learned confidence
- **⚡ Multi-Model Support**: Llama-3.2-3B-Instruct and MedGemma-4B-IT architectures
- **🏥 Medical Domain Specialization**: Enhanced MedGemma 4B-IT support with medical-grade confidence thresholds
- **🖼️ Multimodal Capabilities**: Image analysis and response generation for MedGemma models
- **📊 Comprehensive Evaluation**: Built-in metrics and routing strategy analysis
- **🚀 Easy Integration**: Simple API for both training and inference
- **🏃‍♂️ Performance Optimizations**: Optional LLM loading for faster initialization and lower memory usage
- **📝 Enhanced Query-Context**: Improved accuracy with structured prompt formatting
- **🎛️ Adaptive Thresholds**: Dynamic confidence thresholds based on model type (0.62 for medical, 0.65 for general)
- **💬 Response Generation**: Built-in response generation with confidence-gated output
- **🔧 Automatic Model Management**: Auto-download and configuration for supported models

## 🔬 Research Foundation

Based on the research paper "Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation" by Nandakishor M (Convai Innovations).

The approach implements deterministic routing to appropriate response pathways:

### General Models (Llama-3.2-3B)
- **High Confidence (≥0.65)**: Local generation
- **Medium Confidence (0.60-0.65)**: Retrieval-augmented generation
- **Low Confidence (0.40-0.60)**: Route to larger models
- **Very Low Confidence (<0.40)**: Human review required

### Medical Models (MedGemma-4B-IT)
- **High Medical Confidence (≥0.62)**: Local generation with medical validation
- **Medium Medical Confidence (0.55-0.62)**: Medical literature verification required
- **Low Medical Confidence (0.50-0.55)**: Professional medical verification required
- **Very Low Medical Confidence (<0.50)**: Seek professional medical advice
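The two routing tables above can be sketched as a simple threshold lookup. This is an illustrative sketch only: the function and action label names (`route_query`, `LARGER_MODEL`, etc.) are hypothetical, not part of the HalluNox API.

```python
# Hypothetical sketch of the routing tables above; function and label
# names are illustrative, not part of the HalluNox API.
GENERAL_ROUTES = [
    (0.65, "LOCAL_GENERATION"),
    (0.60, "RAG_RETRIEVAL"),
    (0.40, "LARGER_MODEL"),
    (0.0,  "HUMAN_REVIEW"),
]

MEDICAL_ROUTES = [
    (0.62, "LOCAL_GENERATION_WITH_VALIDATION"),
    (0.55, "LITERATURE_VERIFICATION"),
    (0.50, "PROFESSIONAL_VERIFICATION"),
    (0.0,  "SEEK_PROFESSIONAL_ADVICE"),
]

def route_query(confidence: float, medical: bool = False) -> str:
    """Map a confidence score to a routing action per the tables above."""
    routes = MEDICAL_ROUTES if medical else GENERAL_ROUTES
    for threshold, action in routes:
        if confidence >= threshold:
            return action
    return routes[-1][1]
```

For example, `route_query(0.62)` falls in the medium band for the general model (retrieval-augmented generation), while the same score clears the medical high-confidence threshold.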

## 🚀 Installation

### Requirements

- Python 3.8+
- PyTorch 1.13+
- CUDA-compatible GPU (recommended)
- At least 8GB GPU memory for training
- 16GB RAM minimum (32GB recommended)

### Install from PyPI

```bash
pip install hallunox
```

### Install from Source

```bash
git clone https://github.com/convai-innovations/hallunox.git
cd hallunox
pip install -e .
```

### Core Dependencies

HalluNox automatically installs:

- `torch>=1.13.0` - PyTorch framework
- `transformers>=4.21.0` - Hugging Face Transformers
- `FlagEmbedding>=1.2.0` - BGE-M3 embedding model
- `datasets>=2.0.0` - Dataset loading utilities
- `scikit-learn>=1.0.0` - Evaluation metrics
- `numpy==1.26.4` - Numerical computations (pinned for compatibility)
- `tqdm>=4.64.0` - Progress bars
- `Pillow>=8.0.0` - Image processing for multimodal capabilities
- `bitsandbytes>=0.41.0` - 4-bit quantization for memory optimization

## 📖 Quick Start

### Basic Usage (Llama-3.2-3B)

```python
from hallunox import HallucinationDetector

# Initialize detector (downloads pre-trained model automatically)
detector = HallucinationDetector()

# Analyze text for hallucination risk
results = detector.predict([
    "The capital of France is Paris.",  # High confidence
    "Your password is 12345678.",       # Low confidence  
    "The Moon is made of cheese."       # Very low confidence
])

# View results
for pred in results["predictions"]:
    print(f"Text: {pred['text']}")
    print(f"Confidence: {pred['confidence_score']:.3f}")
    print(f"Risk Level: {pred['risk_level']}")
    print(f"Routing Action: {pred['routing_action']}")
    print()
```

### 🏥 MedGemma Medical Domain Usage

For medical applications using MedGemma 4B-IT with multimodal capabilities:

```python
from hallunox import HallucinationDetector
from PIL import Image

# Initialize MedGemma detector (auto-downloads medical model)
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    confidence_threshold=0.62,  # Medical-grade threshold
    enable_response_generation=True,  # Enable response generation
    enable_inference=True,
    mode="auto"  # Auto-detects multimodal capabilities (default)
)

# Medical text analysis
medical_results = detector.predict([
    "Aspirin can help reduce heart attack risk when prescribed by a doctor.",
    "Drinking bleach will cure COVID-19.",  # Dangerous misinformation
    "Type 2 diabetes requires insulin injections in all cases.",  # Partially incorrect
])

for pred in medical_results["predictions"]:
    print(f"Medical Text: {pred['text'][:60]}...")
    print(f"Confidence: {pred['confidence_score']:.3f}")
    print(f"Risk Level: {pred['risk_level']}")
    print(f"Medical Action: {pred['routing_action']}")
    print(f"Description: {pred['description']}")
    print("-" * 50)

# Response generation with confidence checking
question = "What are the symptoms of pneumonia?"
response = detector.generate_response(question, check_confidence=True)

if response["should_generate"]:
    print(f"✅ Medical Response Generated (confidence: {response['confidence_score']:.3f})")
    print(f"Response: {response['response']}")
    print(f"Meets threshold: {response['meets_threshold']}")
else:
    print(f"⚠️ Response blocked (confidence: {response['confidence_score']:.3f})")
    print(f"Reason: {response['reason']}")
    print(f"Recommendation: {response['recommendation']}")

# Multimodal image analysis (MedGemma 4B-IT only)
if detector.is_multimodal:
    print("\n🖼️ Multimodal Image Analysis")
    
    # Load medical image (replace with actual medical image)
    try:
        image = Image.open("chest_xray.jpg")
    except OSError:
        # Fall back to a demo image if the file is missing or unreadable
        image = Image.new('RGB', (224, 224), color='lightgray')
    
    # Analyze image confidence
    image_results = detector.predict_images([image], ["Chest X-ray"])
    
    for pred in image_results["predictions"]:
        print(f"Image: {pred['image_description']}")
        print(f"Confidence: {pred['confidence_score']:.3f}")
        print(f"Interpretation: {pred['interpretation']}")
        print(f"Risk Level: {pred['risk_level']}")
    
    # Generate image description
    description = detector.generate_image_response(
        image, 
        "Describe the findings in this chest X-ray."
    )
    print(f"Generated Description: {description}")
```

### 🔧 Advanced Configuration

```python
from hallunox import HallucinationDetector

# Full configuration example
detector = HallucinationDetector(
    # Model selection
    llm_model_id="google/medgemma-4b-it",  # or "unsloth/Llama-3.2-3B-Instruct"
    embed_model_id="BAAI/bge-m3",
    
    # Custom model weights (optional)
    model_path="/path/to/custom/model.pt",  # None = auto-download
    
    # Hardware configuration
    device="cuda",  # or "cpu"
    use_fp16=True,  # Mixed precision for faster inference
    
    # Sequence lengths
    max_length=512,      # LLM context length
    bge_max_length=512,  # BGE-M3 context length
    
    # Feature toggles
    load_llm=True,                    # Load LLM for embeddings
    enable_inference=True,            # Enable LLM inference
    enable_response_generation=True,  # Enable response generation
    
    # Confidence settings
    confidence_threshold=0.62,  # Custom threshold (auto-detected by model type)
    
    # Operation mode
    mode="auto",  # "auto", "text", "image", or "both"
)

# Check model capabilities
print(f"Model type: {'Medical' if detector.is_medgemma_4b else 'General'}")
print(f"Multimodal support: {detector.is_multimodal}")
print(f"Operation mode: {detector.effective_mode} (requested: {detector.mode})")
print(f"Confidence threshold: {detector.confidence_threshold}")
```

### 🎛️ Operation Mode Configuration

The `mode` parameter controls what types of input the detector can process:

```python
from hallunox import HallucinationDetector

# Auto mode (recommended) - detects capabilities from model
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    mode="auto"  # Auto: MedGemma->both, others->text
)

# Text-only mode - processes text inputs only
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    mode="text"  # Forces text-only, even for multimodal models
)

# Image-only mode - processes images only (requires MedGemma 4b-it)
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    mode="image"  # Image processing only
)

# Both mode - processes text and images (requires MedGemma 4b-it)
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    mode="both"  # Explicit multimodal mode
)
```

#### Mode Validation

- **Text mode**: Available for all models
- **Image mode**: Requires MedGemma 4b-it model
- **Both mode**: Requires MedGemma 4b-it model
- **Auto mode**: Automatically selects based on model capabilities
  - MedGemma 4b-it → `effective_mode = "both"`
  - Other models → `effective_mode = "text"`
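The validation rules above amount to a small resolution step from the requested `mode` to the `effective_mode`. The helper below is a hypothetical sketch of that logic; the real checks live inside `HallucinationDetector`.

```python
# Illustrative sketch of the mode-validation rules above; the real
# checks live inside HallucinationDetector, and this helper is hypothetical.
MULTIMODAL_MODELS = {"google/medgemma-4b-it"}

def resolve_mode(llm_model_id: str, mode: str = "auto") -> str:
    """Return the effective mode, or raise for unsupported combinations."""
    multimodal = llm_model_id in MULTIMODAL_MODELS
    if mode == "auto":
        return "both" if multimodal else "text"
    if mode in ("image", "both") and not multimodal:
        raise ValueError(f"Mode '{mode}' requires MedGemma 4b-it, got {llm_model_id}")
    return mode
```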

#### Error Examples

```python
# This will raise an error - image mode requires MedGemma
detector = HallucinationDetector(
    llm_model_id="unsloth/Llama-3.2-3B-Instruct",
    mode="image"  # ❌ Error: Image mode requires MedGemma 4b-it
)

# This will raise an error - calling image methods in text mode
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    mode="text"
)
detector.predict_images([image])  # ❌ Error: Current mode is 'text'
```

### ⚡ Performance Optimized Usage

For faster initialization when only doing embedding comparisons:

```python
from hallunox import HallucinationDetector

# Option 1: Factory method for embedding-only usage
detector = HallucinationDetector.for_embedding_only(
    device="cuda",
    use_fp16=True
)

# Option 2: Explicit parameter control
detector = HallucinationDetector(
    load_llm=False,         # Skip expensive LLM loading
    enable_inference=False, # Disable inference capabilities
    use_fp16=True          # Use mixed precision
)

# Note: This configuration cannot perform predictions
# Use for preprocessing or embedding extraction only
```

### 🧠 Memory Optimization with Quantization

For GPUs with limited VRAM (8-16GB), use 4-bit quantization:

```python
from hallunox import HallucinationDetector

# Option 1: Auto-optimized for low memory (recommended)
detector = HallucinationDetector.for_low_memory(
    llm_model_id="google/medgemma-4b-it",  # Or any supported model
    device="cuda",
    enable_response_generation=True
)

# Option 2: Manual quantization configuration
detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    use_quantization=True,  # Enable 4-bit quantization
    enable_response_generation=True,
    device="cuda"
)

# Option 3: Custom quantization settings
from transformers import BitsAndBytesConfig
import torch

custom_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NF4 quantization type
    bnb_4bit_use_double_quant=True,     # Double quantization for extra savings
    bnb_4bit_compute_dtype=torch.bfloat16  # Compute in bfloat16
)

detector = HallucinationDetector(
    llm_model_id="google/medgemma-4b-it",
    quantization_config=custom_quant_config,
    device="cuda"
)

print(f"✅ Memory optimized: {detector.use_quantization}")
print(f"🔧 Quantization: 4-bit NF4 with double quantization")
```

### 💾 Memory Usage Comparison

| Configuration | Model Size | VRAM Usage | Performance |
|--------------|------------|------------|-------------|
| **Full Precision** | ~16GB | ~14GB | 100% speed |
| **FP16 Mixed Precision** | ~8GB | ~7GB | 95% speed |
| **4-bit Quantization** | ~4GB | ~3.5GB | 85-90% speed |
| **4-bit + Double Quant** | ~3.5GB | ~3GB | 85-90% speed |
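The model sizes in the table follow from simple bits-per-parameter arithmetic. Below is a back-of-envelope estimate assuming a ~4B-parameter model (the parameter count is an assumption); real VRAM usage adds activation and framework overhead, which is why the table's numbers are somewhat higher than the 4-bit weight size alone.

```python
# Back-of-envelope weight-memory estimate for an assumed 4B-parameter model.
# The table above also includes activation and framework overhead,
# so real VRAM usage is higher than the raw weight size.
def weight_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

n = 4e9  # ~4B parameters (assumed)
print(f"fp32 : {weight_gb(n, 32):.1f} GB")  # 16.0 GB
print(f"fp16 : {weight_gb(n, 16):.1f} GB")  # 8.0 GB
print(f"4-bit: {weight_gb(n, 4):.1f} GB")   # 2.0 GB before quantization overhead
```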

**Recommendation**: Use `HallucinationDetector.for_low_memory()` for GPUs with 8GB or less VRAM.

### 📝 Enhanced Query-Context Formatting

For better accuracy with contextual information:

```python
from hallunox import HallucinationDetector

detector = HallucinationDetector()

# Use query-context pairs for improved embedding accuracy
query_context_pairs = [
    {
        "query": "What is the capital of France?",
        "context": "France is a European country with rich history and culture."
    },
    {
        "query": "The Moon is made of green cheese",
        "context": "The Moon is Earth's natural satellite composed of rock and metal."
    }
]

# Method 1: Direct query-context prediction
results = detector.predict_with_query_context(query_context_pairs)

# Method 2: Using the predict method with context parameter
texts = [pair["query"] for pair in query_context_pairs]
results = detector.predict(texts, query_context_pairs=query_context_pairs)

# Enhanced accuracy for contextual queries
for pred in results["predictions"]:
    print(f"Query: {pred['text']}")
    print(f"Enhanced Confidence: {pred['confidence_score']:.3f}")
```

## 🖥️ Command Line Interface

HalluNox provides a comprehensive CLI for various use cases:

### Interactive Mode
```bash
# General model interactive mode
hallunox-infer --interactive

# MedGemma medical interactive mode
hallunox-infer --llm_model_id google/medgemma-4b-it --interactive --show_generated_text
```

### Batch Processing
```bash
# Process file with general model
hallunox-infer --input_file medical_texts.txt --output_file results.json

# Process with MedGemma and medical settings
hallunox-infer \
    --llm_model_id google/medgemma-4b-it \
    --input_file medical_texts.txt \
    --output_file medical_results.json \
    --show_routing \
    --show_generated_text
```

### Image Analysis (MedGemma only)
```bash
# Single image analysis
hallunox-infer \
    --llm_model_id google/medgemma-4b-it \
    --image_path chest_xray.jpg \
    --show_generated_text

# Batch image analysis
hallunox-infer \
    --llm_model_id google/medgemma-4b-it \
    --image_folder /path/to/medical/images \
    --output_file image_analysis.json
```

### Demo Mode
```bash
# General demo
hallunox-infer --demo --show_routing

# Medical demo with MedGemma
hallunox-infer \
    --llm_model_id google/medgemma-4b-it \
    --demo \
    --mode both \
    --show_routing

# Text-only demo (faster initialization)
hallunox-infer \
    --llm_model_id google/medgemma-4b-it \
    --demo \
    --mode text \
    --show_routing
```

## 🔨 Training Your Own Model

### Quick Training

```python
from hallunox import Trainer, TrainingConfig

# Configure training
config = TrainingConfig(
    # Model selection
    model_id="google/medgemma-4b-it",  # or "unsloth/Llama-3.2-3B-Instruct"
    embed_model_id="BAAI/bge-m3",
    
    # Training parameters
    batch_size=8,
    learning_rate=5e-4,
    max_epochs=6,
    warmup_steps=300,
    
    # Dataset configuration
    use_truthfulqa=True,
    use_halueval=True,
    use_fever=True,
    max_samples_per_dataset=3000,
    
    # Output
    output_dir="./models/my_medical_model"
)

# Train model
trainer = Trainer(config)
trainer.train()
```

### Command Line Training
```bash
# Train general model
hallunox-train --batch_size 8 --learning_rate 5e-4 --max_epochs 6

# Train medical model
hallunox-train \
    --model_id google/medgemma-4b-it \
    --batch_size 4 \
    --learning_rate 3e-4 \
    --max_epochs 8 \
    --output_dir ./models/custom_medgemma
```

## 🏗️ Model Architecture

HalluNox supports two main architectures:

### General Architecture (Llama-3.2-3B)
1. **LLM Component**: Llama-3.2-3B-Instruct
   - Extracts internal hidden representations (3072D)
   - Supports any Llama-architecture model
   
2. **Embedding Model**: BGE-M3 (fixed)
   - Provides reference semantic embeddings
   - 1024-dimensional dense vectors

3. **Projection Network**: Standard ProjectionHead
   - Maps LLM hidden states to embedding space
   - 3-layer MLP with ReLU activations and dropout
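A shape-level sketch of that projection step: a 3-layer MLP mapping the 3072-dimensional LLM hidden state into the 1024-dimensional BGE-M3 embedding space. The hidden widths and the use of NumPy here are illustrative assumptions; the shipped `ProjectionHead` is a trained PyTorch module with dropout active only during training.

```python
import numpy as np

# Shape-level NumPy sketch of the 3-layer projection MLP described above:
# LLM hidden states (3072D) -> BGE-M3 embedding space (1024D).
# Hidden widths (2048, 1536) are illustrative assumptions.
rng = np.random.default_rng(0)
dims = [3072, 2048, 1536, 1024]  # input -> two hidden layers -> output
weights = [rng.standard_normal((a, b)) * 0.02 for a, b in zip(dims, dims[1:])]

def project(hidden_state: np.ndarray) -> np.ndarray:
    """Project an LLM hidden state into the reference embedding space."""
    x = hidden_state
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

projected = project(rng.standard_normal(3072))
print(projected.shape)  # (1024,)
```

The projected vector can then be compared (e.g. by cosine similarity) against the BGE-M3 reference embedding to produce the semantic-alignment signal.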

### Medical Architecture (MedGemma-4B-IT)
1. **Unified Multimodal Model**: 
   - **Single Model**: AutoModelForImageTextToText handles both text and images
   - **Memory Optimized**: Avoids double loading (saves ~8GB VRAM)
   - **Fallback Support**: Graceful degradation to text-only if needed
   
2. **Embedding Model**: BGE-M3 (same as general)
   - Enhanced with medical context formatting
   
3. **Projection Network**: UltraStableProjectionHead
   - Ultra-stable architecture with heavy normalization
   - Conservative weight initialization for medical precision
   - Tanh activations for stability
   - Enhanced dropout and layer normalization

4. **Multimodal Processor**: AutoProcessor
   - Handles image + text inputs
   - Supports chat template formatting

5. **Quantization Support**: 4-bit NF4 with double quantization
   - Reduces memory usage by ~75%
   - Maintains 85-90% performance
   - Automatic fallback for CPU

## 📊 API Reference

### HallucinationDetector

#### Constructor Parameters

```python
HallucinationDetector(
    model_path: str = None,                    # Path to trained model (None = auto-download)
    llm_model_id: str = "unsloth/Llama-3.2-3B-Instruct",  # LLM model ID
    embed_model_id: str = "BAAI/bge-m3",      # Embedding model ID
    device: str = None,                        # Device (None = auto-detect)
    max_length: int = 512,                     # LLM sequence length
    bge_max_length: int = 512,                # BGE-M3 sequence length
    use_fp16: bool = True,                     # Mixed precision
    load_llm: bool = True,                     # Load LLM
    enable_inference: bool = False,            # Enable LLM inference
    confidence_threshold: float = None,        # Custom threshold (auto-detected)
    enable_response_generation: bool = False,  # Enable response generation
    use_quantization: bool = False,            # Enable 4-bit quantization for memory savings
    quantization_config: BitsAndBytesConfig = None,  # Custom quantization config
    mode: str = "auto",                        # Operation mode: "auto", "text", "image", "both"
)
```

#### Core Methods

**Text Analysis:**
- `predict(texts, query_context_pairs=None)` - Analyze texts for hallucination confidence
- `predict_with_query_context(query_context_pairs)` - Query-context prediction
- `batch_predict(texts, batch_size=16)` - Efficient batch processing

**Response Generation:**
- `generate_response(prompt, max_length=512, check_confidence=True)` - Generate responses with confidence checking

**Multimodal (MedGemma only):**
- `predict_images(images, image_descriptions=None)` - Analyze image confidence
- `generate_image_response(image, prompt, max_length=200)` - Generate image descriptions

**Analysis:**
- `evaluate_routing_strategy(texts)` - Analyze routing decisions

**Factory Methods:**
- `for_embedding_only()` - Create embedding-only detector
- `for_low_memory()` - Create memory-optimized detector with 4-bit quantization

#### Response Format

```python
{
    "predictions": [
        {
            "text": "input text",
            "confidence_score": 0.85,           # 0.0 to 1.0
            "similarity_score": 0.92,          # Cosine similarity
            "interpretation": "HIGH_CONFIDENCE", # or HIGH_MEDICAL_CONFIDENCE
            "risk_level": "LOW_RISK",          # or LOW_MEDICAL_RISK
            "routing_action": "LOCAL_GENERATION",
            "description": "This response appears to be factual and reliable."
        }
    ],
    "summary": {
        "total_texts": 1,
        "avg_confidence": 0.85,
        "high_confidence_count": 1,
        "medium_confidence_count": 0,
        "low_confidence_count": 0,
        "very_low_confidence_count": 0
    }
}
```
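The `summary` block can be derived by bucketing the per-prediction confidence scores. The sketch below is a hypothetical reconstruction, with bucket boundaries following the general routing thresholds (0.65 / 0.60 / 0.40); it is not the library's own code.

```python
# Hypothetical reconstruction of the "summary" block from a predictions
# list; bucket boundaries follow the general routing thresholds
# (0.65 / 0.60 / 0.40) and are an assumption, not the library's code.
def summarize(predictions):
    scores = [p["confidence_score"] for p in predictions]
    return {
        "total_texts": len(scores),
        "avg_confidence": sum(scores) / len(scores) if scores else 0.0,
        "high_confidence_count": sum(s >= 0.65 for s in scores),
        "medium_confidence_count": sum(0.60 <= s < 0.65 for s in scores),
        "low_confidence_count": sum(0.40 <= s < 0.60 for s in scores),
        "very_low_confidence_count": sum(s < 0.40 for s in scores),
    }

print(summarize([{"confidence_score": 0.85}]))
```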

#### Response Generation Format

```python
{
    "response": "Generated response text",
    "confidence_score": 0.85,
    "should_generate": True,
    "meets_threshold": True,
    # Or when blocked:
    "reason": "Confidence 0.45 below threshold 0.62",
    "recommendation": "RAG_RETRIEVAL"
}
```
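A confidence gate producing the shape above can be sketched as follows. The field values and the fixed `RAG_RETRIEVAL` recommendation are assumptions for illustration; the library chooses recommendations based on its routing tables.

```python
# Illustrative confidence gate producing the response shapes above;
# the fixed RAG_RETRIEVAL recommendation is an assumption for the sketch.
def gate_response(generated: str, confidence: float, threshold: float = 0.62):
    """Return the generated text only when confidence clears the threshold."""
    if confidence >= threshold:
        return {
            "response": generated,
            "confidence_score": confidence,
            "should_generate": True,
            "meets_threshold": True,
        }
    return {
        "confidence_score": confidence,
        "should_generate": False,
        "meets_threshold": False,
        "reason": f"Confidence {confidence:.2f} below threshold {threshold:.2f}",
        "recommendation": "RAG_RETRIEVAL",
    }
```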

### Training Classes

- **`TrainingConfig`**: Configuration dataclass for training parameters
- **`Trainer`**: Main training class with dataset loading and model training
- **`MultiDatasetLoader`**: Loads and combines multiple hallucination detection datasets

### Utility Functions

- **`download_model()`**: Download general pre-trained model
- **`download_medgemma_model(model_name)`**: Download MedGemma medical model
- **`setup_logging(level)`**: Configure logging
- **`check_gpu_availability()`**: Check CUDA compatibility
- **`validate_model_requirements()`**: Verify dependencies

## 📈 Performance

Our confidence-aware routing system demonstrates:

- **74% hallucination detection rate** (vs 42% baseline)
- **9% false positive rate** (vs 15% baseline)  
- **40% reduction in computational cost** vs post-hoc methods
- **1.6x average cost multiplier**, versus 4.2x when always routing to expensive operations

### Medical Domain Performance (MedGemma)
- **Enhanced medical accuracy** with 0.62 confidence threshold
- **Multimodal capability** for medical image analysis
- **Safety-first approach** with conservative thresholds
- **Professional verification workflow** for low-confidence cases

## 🖥️ Hardware Requirements

### Minimum (Inference Only)
- **CPU**: Modern multi-core processor
- **RAM**: 16GB system memory
- **GPU**: 8GB VRAM (RTX 3070, RTX 4060 Ti+)
- **Storage**: 15GB free space
- **Models**: ~5GB each (Llama/MedGemma)

### Recommended (Inference)
- **CPU**: Intel i7/AMD Ryzen 7+
- **RAM**: 32GB system memory  
- **GPU**: 12GB+ VRAM (RTX 4070, RTX 3080+)
- **Storage**: NVMe SSD, 25GB+ free
- **CUDA**: 11.8+ compatible driver

### Training Requirements
- **CPU**: High-performance multi-core (i9/Ryzen 9)
- **RAM**: 64GB+ system memory
- **GPU**: 24GB+ VRAM (RTX 4090, A100, H100)
- **Storage**: 200GB+ NVMe SSD
  - Model checkpoints: ~10GB per epoch
  - Training datasets: ~30GB
  - Logs and outputs: ~50GB
- **Network**: High-speed internet for downloads

### MedGemma Specific
- **Additional storage**: +10GB for multimodal models
- **Image processing**: PIL/Pillow for image capabilities
- **Memory**: +4GB RAM for image processing pipeline

### CPU-Only Mode
- **RAM**: 32GB minimum (64GB recommended)
- **Performance**: 10-50x slower than GPU
- **Not recommended** for production medical applications

## 🔒 Safety Considerations

### Medical Applications
- **Professional oversight required**: HalluNox is a research tool, not medical advice
- **Validation needed**: All medical outputs should be verified by qualified professionals
- **Conservative thresholds**: 0.62 threshold ensures high precision for medical content
- **Clear disclaimers**: Always include appropriate medical disclaimers in applications

### General Use
- **Confidence-based routing**: Use routing recommendations for appropriate escalation
- **Human oversight**: Very low confidence predictions require human review
- **Regular evaluation**: Monitor performance on your specific use cases

## 🛠️ Troubleshooting

### Common Issues and Solutions

#### CUDA Out of Memory Error
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB...
```
**Solution**: Use 4-bit quantization
```python
detector = HallucinationDetector.for_low_memory()
```

#### Deprecated torch_dtype Warning
```
`torch_dtype` is deprecated! Use `dtype` instead!
```
**Solution**: Already fixed in HalluNox v0.2.2+ - the package now uses the correct `dtype` parameter.

#### Double Model Loading (MedGemma)
```
Loading checkpoint shards: 100% 2/2 [00:37<00:00, 18.20s/it]
Loading checkpoint shards: 100% 2/2 [00:36<00:00, 17.88s/it]
```
**Solution**: Already optimized in HalluNox v0.2.2+ - MedGemma now uses a unified model approach that avoids double loading.

#### Accelerate Warning
```
WARNING:accelerate.big_modeling:Some parameters are on the meta device...
```
**Solution**: This is normal with quantization - parameters are automatically moved to GPU during inference.

#### Environment Optimization
For better memory management, set:
```bash
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```

### Memory Requirements by Configuration

| GPU VRAM | Recommended Configuration | Expected Performance |
|----------|--------------------------|---------------------|
| **4-6GB** | `for_low_memory()` + reduce batch size | Basic functionality |
| **8-12GB** | `for_low_memory()` | Full functionality |
| **16GB+** | Standard configuration | Optimal performance |
| **24GB+** | Multiple models + training | Development/research |

## 📄 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

## 📚 Citation

If you use HalluNox in your research, please cite:

```bibtex
@article{nandakishor2024hallunox,
    title={Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation},
    author={Nandakishor M},
    journal={AI Safety Research},
    year={2024},
    organization={Convai Innovations}
}
```

## 🤝 Contributing

We welcome contributions! Please see our contributing guidelines and submit pull requests to our repository.

### Development Setup
```bash
git clone https://github.com/convai-innovations/hallunox.git
cd hallunox
pip install -e ".[dev]"
```

## 📞 Support

For technical support and questions:
- **Email**: support@convaiinnovations.com  
- **Issues**: [GitHub Issues](https://github.com/convai-innovations/hallunox/issues)
- **Documentation**: Full API docs available online

## 👨‍💻 Author

**Nandakishor M**  
AI Safety Research  
Convai Innovations Pvt. Ltd.  
Email: support@convaiinnovations.com

---

**Disclaimer**: HalluNox is a research tool for hallucination detection and should not be used as the sole basis for critical decisions, especially in medical contexts. Always seek professional advice for medical applications.
