# ROTA - Real-time Offensive Threat Assessment

[![PyPI version](https://badge.fury.io/py/rota.svg)](https://badge.fury.io/py/rota)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**ROTA** is a research framework for predicting zero-day vulnerabilities using behavioral signals, clustering analysis, and temporal validation. It combines multiple data sources to identify high-risk vulnerabilities before they are actively exploited.

## 🎯 What is ROTA?

ROTA uses a wheel metaphor to represent its architecture:

```
                    ┌─────────────┐
                    │   ORACLE    │
                    │ (Prediction)│
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
   ┌────▼────┐        ┌────▼────┐       ┌────▼────┐
   │  WHEEL  │        │   HUB   │       │  AXLE   │
   │(Cluster)│◄───────┤ (Neo4j) │──────►│  (Eval) │
   └────▲────┘        └────▲────┘       └─────────┘
        │                  │
        └──────────────────┼──────────────────┘
                           │
                      ┌────▼────┐
                      │ SPOKES  │
                      │ (Data)  │
                      └─────────┘
```

- **Spokes**: Collect data from multiple sources (CVE, EPSS, KEV, etc.)
- **Hub**: Central Neo4j graph database for data integration
- **Wheel**: Clustering and pattern discovery
- **Oracle**: Prediction and risk assessment
- **Axle**: Evaluation and temporal validation

## 🚀 Quick Start

### Installation

```bash
pip install rota
```

### Basic Usage

```bash
# Collect CVE data
rota spokes collect-cve --start-date 2025-01-01 --end-date 2025-01-31

# Collect EPSS scores
rota spokes collect-epss --cve-ids CVE-2025-1234

# Collect CISA KEV catalog
rota spokes collect-kev

# Load data into Neo4j (example file names; substitute your own collected files)
rota hub load-cve data/raw/cve/cves_20250127.jsonl
rota hub load-epss data/raw/epss/epss_20250127.jsonl
rota hub load-kev data/raw/kev/kev_20250127.jsonl

# Check hub status
rota hub status
```
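
Before loading a collected file into the hub, it can help to sanity-check it. The sketch below assumes the `.jsonl` files are JSON Lines (one JSON object per line), as the extension suggests. The `iter_jsonl` and `count_records` helpers are hypothetical and not part of ROTA.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated download is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc


def count_records(path: Path) -> int:
    """Count the records in a JSON Lines file without holding them all in memory."""
    return sum(1 for _ in iter_jsonl(path))
```

For example, `count_records(Path("data/raw/cve/cves_20250127.jsonl"))` would report how many records are about to be loaded.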

## 📊 Data Sources

ROTA integrates multiple vulnerability data sources:

| Source | Description | Coverage |
|--------|-------------|----------|
| **CVE/NVD** | National Vulnerability Database | All published CVEs |
| **EPSS** | Exploit Prediction Scoring System | Daily probability scores |
| **KEV** | CISA Known Exploited Vulnerabilities | Vulnerabilities confirmed as exploited in the wild |
| **GitHub Advisory** | Package-level security advisories | npm, PyPI, Maven, etc. |
| **Exploit-DB** | Public exploit database | Proof-of-concept exploits |
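
To illustrate how these sources complement each other, here is a small sketch that orders CVEs by EPSS score and places KEV-listed entries first. This is not ROTA's scoring logic: `rank_cves`, the ordering rule, and the sample values are all hypothetical.

```python
def rank_cves(epss_scores: dict[str, float], kev_ids: set[str]) -> list[str]:
    """Order CVE IDs: KEV-listed entries first, then by descending EPSS score."""
    return sorted(
        epss_scores,
        # False sorts before True, so KEV members come first.
        key=lambda cve_id: (cve_id not in kev_ids, -epss_scores[cve_id]),
    )


# Made-up sample data for illustration only.
epss_scores = {"CVE-2025-0001": 0.12, "CVE-2025-0002": 0.91, "CVE-2025-0003": 0.40}
kev_ids = {"CVE-2025-0003"}

ranking = rank_cves(epss_scores, kev_ids)
print(ranking)  # ['CVE-2025-0003', 'CVE-2025-0002', 'CVE-2025-0001']
```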

## 🏗️ Architecture

### Spokes (Data Collection)

```python
from rota.spokes import CVECollector, EPSSCollector, KEVCollector

# Collect CVE data
cve_collector = CVECollector()
stats = cve_collector.collect(
    start_date="2025-01-01",
    end_date="2025-01-31"
)

# Collect EPSS scores
epss_collector = EPSSCollector()
stats = epss_collector.collect(cve_ids=["CVE-2025-1234"])

# Collect KEV catalog
kev_collector = KEVCollector()
stats = kev_collector.collect()
```

### Hub (Data Integration)

```python
from rota.hub import Neo4jConnection, DataLoader
from pathlib import Path

# Connect to Neo4j
with Neo4jConnection() as conn:
    loader = DataLoader(conn)
    
    # Load CVE data
    stats = loader.load_cve_data(Path("data/raw/cve/cves.jsonl"))
    
    # Load EPSS data
    stats = loader.load_epss_data(Path("data/raw/epss/epss.jsonl"))
    
    # Load KEV data
    stats = loader.load_kev_data(Path("data/raw/kev/kev.jsonl"))
```

### Wheel (Clustering)

```python
from rota.wheel import VulnerabilityClusterer, FeatureExtractor

# Extract features
extractor = FeatureExtractor()
features = extractor.extract_from_neo4j()

# Cluster vulnerabilities
clusterer = VulnerabilityClusterer(method="dbscan")
clusterer.fit(features)
clusters = clusterer.predict(features)
```
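
The `method="dbscan"` option refers to density-based clustering. The following is a minimal pure-Python DBSCAN sketch showing what the algorithm does with feature vectors. It is not ROTA's implementation, and the `dbscan` function, the `eps` and `min_samples` values, and the toy vectors are assumptions for illustration.

```python
import math

UNVISITED = -2
NOISE = -1


def dbscan(points: list[tuple[float, ...]], eps: float, min_samples: int) -> list[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""

    def neighbors(index: int) -> list[int]:
        # All points within eps of points[index], including the point itself.
        return [j for j, other in enumerate(points)
                if math.dist(points[index], other) <= eps]

    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            elif labels[j] == UNVISITED:
                labels[j] = cluster_id
                j_neighbors = neighbors(j)
                if len(j_neighbors) >= min_samples:
                    queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Two dense groups and one isolated point (toy feature vectors).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters up front and leaves isolated points unassigned, which may suit vulnerability data where many records have no close neighbors.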

### Oracle (Prediction)

```python
from rota.oracle import VulnerabilityPredictor

# Predict exploitation risk
predictor = VulnerabilityPredictor()
result = predictor.predict("CVE-2025-1234")

print(f"Risk Score: {result['risk_score']}")
print(f"Confidence: {result['confidence']}")
print(f"Recommendation: {result['recommendation']}")
```

### Axle (Evaluation)

```python
from rota.axle import TemporalValidator
from datetime import datetime

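# `predictions` and `ground_truth` are not defined in this snippet; they are
# assumed to be prepared beforehand (model outputs and known exploitation labels).
# Validate predictions with temporal awareness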
validator = TemporalValidator(cutoff_date=datetime(2025, 1, 1))
metrics = validator.validate(predictions, ground_truth)

print(f"Precision: {metrics['precision']}")
print(f"Recall: {metrics['recall']}")
print(f"Lead Time: {metrics['lead_time_days']} days")
```
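
A small worked sketch may help in interpreting these metrics. It assumes predictions map each CVE ID to the date it was flagged, and ground truth maps each CVE ID to the date exploitation was first confirmed. These data shapes and the `temporal_metrics` function are assumptions, not the `TemporalValidator` implementation.

```python
from datetime import date


def temporal_metrics(flagged: dict[str, date], exploited: dict[str, date]) -> dict[str, float]:
    """Precision, recall, and mean lead time for flags raised before exploitation."""
    # A hit only counts if the flag was raised before exploitation was confirmed.
    hits = {
        cve: exploited[cve] - flagged[cve]
        for cve in flagged
        if cve in exploited and flagged[cve] < exploited[cve]
    }
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(exploited) if exploited else 0.0
    lead_time = sum(delta.days for delta in hits.values()) / len(hits) if hits else 0.0
    return {"precision": precision, "recall": recall, "lead_time_days": lead_time}


# Made-up dates for illustration only.
flagged = {
    "CVE-2025-0001": date(2025, 1, 5),
    "CVE-2025-0002": date(2025, 1, 10),
    "CVE-2025-0003": date(2025, 2, 1),
}
exploited = {
    "CVE-2025-0001": date(2025, 1, 15),  # flagged 10 days earlier: a hit
    "CVE-2025-0003": date(2025, 1, 20),  # flagged after exploitation: not a hit
    "CVE-2025-0004": date(2025, 1, 25),  # never flagged: a miss
}
metrics = temporal_metrics(flagged, exploited)
print(metrics["lead_time_days"])  # 10.0
```

Here one of three flags is a timely hit and one of three exploited CVEs is caught in time, so precision and recall are both 1/3.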

## 🔧 Configuration

Create a `config.yaml` file:

```yaml
# Data directories
data_dir: data
raw_dir: data/raw
processed_dir: data/processed

# Neo4j configuration
neo4j_uri: bolt://localhost:7687
neo4j_user: neo4j
neo4j_password: your_password

# Collection settings
request_timeout: 30.0
rate_limit_sleep: 1.0

# Clustering settings
clustering_method: dbscan
min_cluster_size: 5

# Prediction settings
risk_threshold: 0.7
confidence_threshold: 0.6
```

Load configuration:

```python
from rota.config import load_config
from pathlib import Path

config = load_config(Path("config.yaml"))
```
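
The prediction thresholds above can be checked after loading. This sketch is one possible approach: `Thresholds` and `needs_attention` are hypothetical names, and the rule that both thresholds must be met is an assumption rather than documented ROTA behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Prediction thresholds mirroring the keys in config.yaml."""

    risk_threshold: float = 0.7
    confidence_threshold: float = 0.6

    def __post_init__(self) -> None:
        # Both values are treated as probabilities, so they must lie in [0, 1].
        for name in ("risk_threshold", "confidence_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def needs_attention(risk_score: float, confidence: float, thresholds: Thresholds) -> bool:
    """Flag a CVE only when both the risk score and the confidence clear their thresholds."""
    return (risk_score >= thresholds.risk_threshold
            and confidence >= thresholds.confidence_threshold)


thresholds = Thresholds()
print(needs_attention(0.85, 0.65, thresholds))  # True
print(needs_attention(0.85, 0.40, thresholds))  # False
```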

## 📚 Documentation

- [Architecture Overview](docs/architecture.md)
- [Data Collection Guide](docs/guides/data-collection.md)
- [Clustering Guide](docs/guides/clustering.md)
- [Prediction Guide](docs/guides/prediction.md)
- [Evaluation Guide](docs/guides/evaluation.md)
- [API Reference](docs/api/)

## 🔬 Research

ROTA is designed for security research, with a focus on:

- **Temporal Validation**: Prevent data leakage in historical analysis
- **Behavioral Signals**: GitHub activity, commit patterns, issue discussions
- **Multi-Modal Learning**: Code, text, graph, and time-series signals
- **Explainable Predictions**: Understand why vulnerabilities are flagged
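
The temporal validation point can be made concrete: only records available before a cutoff date should feed feature extraction and training, while later records are held out for evaluation. A minimal sketch, assuming each record carries a `published` timestamp (the field name and the `split_by_cutoff` helper are hypothetical):

```python
from datetime import datetime


def split_by_cutoff(records: list[dict], cutoff: datetime) -> tuple[list[dict], list[dict]]:
    """Split records into (published before cutoff, published on or after cutoff)."""
    train = [r for r in records if r["published"] < cutoff]
    holdout = [r for r in records if r["published"] >= cutoff]
    return train, holdout


# Made-up records for illustration only.
records = [
    {"id": "CVE-2024-0001", "published": datetime(2024, 11, 3)},
    {"id": "CVE-2024-0002", "published": datetime(2024, 12, 20)},
    {"id": "CVE-2025-0001", "published": datetime(2025, 1, 8)},
]
train, holdout = split_by_cutoff(records, cutoff=datetime(2025, 1, 1))
print([r["id"] for r in train])    # ['CVE-2024-0001', 'CVE-2024-0002']
print([r["id"] for r in holdout])  # ['CVE-2025-0001']
```

Splitting by time instead of at random keeps information from the future (for example, a later KEV listing) out of the training set.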

### Research Directions

1. **LLM-based Causal Reasoning**: Explain why vulnerabilities occur
2. **Temporal Knowledge Graphs**: Model vulnerability evolution over time
3. **Active Learning**: Efficient project selection for analysis
4. **Multi-Modal Fusion**: Combine code, text, graph, and temporal signals

See [Research Directions](docs/research/directions.md) for details.

## 🤝 Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

## 📧 Contact

- **Author**: Susie Choi
- **GitHub**: [susie-Choi/rota](https://github.com/susie-Choi/rota)
- **Issues**: [GitHub Issues](https://github.com/susie-Choi/rota/issues)

## 🙏 Acknowledgments

- **NVD**: National Vulnerability Database
- **FIRST**: Forum of Incident Response and Security Teams (EPSS)
- **CISA**: Cybersecurity and Infrastructure Security Agency (KEV)
- **Exploit-DB**: Offensive Security

## 📊 Citation

If you use ROTA in your research, please cite:

```bibtex
@software{rota2025,
  title = {ROTA: Real-time Offensive Threat Assessment},
  author = {Choi, Susie},
  year = {2025},
  url = {https://github.com/susie-Choi/rota}
}
```

---

**ROTA v0.1.2** - Real-time Offensive Threat Assessment
