# Release Monitoring Guide

This guide explains how to monitor Riveter releases, download statistics, and build status using the automated monitoring system.

## Overview

The monitoring system provides:

- **Download Statistics**: Track binary download counts by platform and version
- **Build Status Monitoring**: Monitor success/failure rates of binary builds and releases
- **Automated Alerting**: Send notifications when builds fail or issues are detected
- **Dashboard Generation**: Create visual summaries of release metrics

## Components

### Monitoring Script

The main monitoring functionality is provided by `scripts/monitor_releases.py`:

```bash
# Get download statistics
python scripts/monitor_releases.py download-stats --version 1.2.3

# Check build status
python scripts/monitor_releases.py check-builds --days 7

# Generate dashboard
python scripts/monitor_releases.py dashboard

# Send alerts
python scripts/monitor_releases.py alert-failures --webhook-url https://hooks.slack.com/...
```

### Automated Workflow

The monitoring workflow (`.github/workflows/monitoring.yml`) runs automatically:

- **Schedule**: Every 6 hours
- **Manual Trigger**: Can be triggered manually with custom parameters
- **Artifacts**: Saves monitoring data and dashboard files

### Data Storage

Monitoring data is stored in the `monitoring/` directory:

- `download_stats.json` - Historical download statistics
- `build_status.json` - Build status history
- `dashboard.json` - Latest dashboard data
- `dashboard.html` - Generated HTML dashboard

## Download Statistics

### Collecting Stats

```bash
# Get stats for latest release
python scripts/monitor_releases.py download-stats

# Get stats for specific version
python scripts/monitor_releases.py download-stats --version 1.2.3

# Save stats to file
python scripts/monitor_releases.py download-stats --save
```

### Stats Format

```json
{
  "version": "v1.2.3",
  "published_at": "2024-01-15T10:30:00Z",
  "total_downloads": 1250,
  "platforms": {
    "macos-intel": {
      "total_downloads": 450,
      "assets": [
        {
          "name": "riveter-1.2.3-macos-intel.tar.gz",
          "downloads": 400,
          "size": 15728640
        },
        {
          "name": "riveter-1.2.3-macos-intel.tar.gz.sha256",
          "downloads": 50,
          "size": 89
        }
      ]
    }
  }
}
```
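
As a sanity check on this format, the following minimal sketch recomputes each platform's total from its asset list and compares it with the reported `total_downloads`. The helper name `platform_totals` is hypothetical and is not part of `monitor_releases.py`; the sample data is the record shown above.

```python
import json

# Sample record copied from the stats format shown above.
SAMPLE = json.loads("""
{
  "version": "v1.2.3",
  "published_at": "2024-01-15T10:30:00Z",
  "total_downloads": 1250,
  "platforms": {
    "macos-intel": {
      "total_downloads": 450,
      "assets": [
        {"name": "riveter-1.2.3-macos-intel.tar.gz", "downloads": 400, "size": 15728640},
        {"name": "riveter-1.2.3-macos-intel.tar.gz.sha256", "downloads": 50, "size": 89}
      ]
    }
  }
}
""")


def platform_totals(record):
    """Return {platform: sum of asset downloads} for one stats record (hypothetical helper)."""
    return {
        name: sum(asset["downloads"] for asset in info.get("assets", []))
        for name, info in record.get("platforms", {}).items()
    }


# Compare the recomputed totals with the totals reported in the record.
for name, count in platform_totals(SAMPLE).items():
    reported = SAMPLE["platforms"][name]["total_downloads"]
    status = "ok" if count == reported else "mismatch"
    print(f"{name}: {count} (reported {reported}, {status})")
```

Note that in the sample record the checksum file's downloads count toward the platform total (400 + 50 = 450).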

### Key Metrics

- **Total Downloads**: Aggregate downloads across all platforms
- **Platform Breakdown**: Downloads by platform (macOS Intel, macOS ARM64, Linux x86_64)
- **Asset Details**: Individual file download counts and sizes
- **Growth Tracking**: Historical comparison of download trends
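
Growth tracking can be sketched as a comparison between consecutive history entries. The example below assumes the history is a list of records that each carry `version` and `total_downloads`, as in the stats format above; the function name `download_growth` and the sample figure for `v1.2.2` are illustrative only.

```python
def download_growth(entries):
    """Return (version, total, percent change vs. previous entry) tuples.

    The change is None for the first entry or when the previous total was zero.
    """
    rows = []
    previous = None
    for entry in entries:
        total = entry["total_downloads"]
        if previous:  # falsy for both None and 0, which avoids division by zero
            change = (total - previous) / previous * 100
        else:
            change = None
        rows.append((entry["version"], total, change))
        previous = total
    return rows


# Illustrative history; the v1.2.2 figure is made up for this example.
history = [
    {"version": "v1.2.2", "total_downloads": 1000},
    {"version": "v1.2.3", "total_downloads": 1250},
]

for version, total, change in download_growth(history):
    trend = "n/a" if change is None else f"{change:+.1f}%"
    print(f"{version}: {total:,} downloads ({trend})")
```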

## Build Status Monitoring

### Checking Build Status

```bash
# Check last 7 days
python scripts/monitor_releases.py check-builds --days 7

# Check last 30 days
python scripts/monitor_releases.py check-builds --days 30

# Save status to file
python scripts/monitor_releases.py check-builds --save
```

### Status Categories

- **Successful**: Builds that completed successfully
- **Failed**: Builds that failed due to errors
- **In Progress**: Currently running builds
- **Cancelled**: Manually cancelled builds
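
One plausible way to derive these categories is from the `status` and `conclusion` fields that the GitHub Actions API reports for each workflow run. The mapping below is a sketch under that assumption, not necessarily how `monitor_replacement` logic is written in `monitor_releases.py`; the function names and the extra `other` bucket are hypothetical.

```python
from collections import Counter

# Conclusions treated as failures in this sketch (an assumption).
FAILED_CONCLUSIONS = {"failure", "timed_out"}


def categorize_run(run):
    """Map a workflow run dict to one of the status categories."""
    # Queued and running runs are both counted as in progress here.
    if run.get("status") != "completed":
        return "in_progress"
    conclusion = run.get("conclusion")
    if conclusion == "success":
        return "successful"
    if conclusion == "cancelled":
        return "cancelled"
    if conclusion in FAILED_CONCLUSIONS:
        return "failed"
    # Skipped, neutral, and similar conclusions fall outside the four categories.
    return "other"


def summarize(runs):
    """Count runs per category."""
    return Counter(categorize_run(run) for run in runs)


sample_runs = [
    {"status": "completed", "conclusion": "success"},
    {"status": "completed", "conclusion": "failure"},
    {"status": "in_progress", "conclusion": None},
    {"status": "completed", "conclusion": "cancelled"},
]
print(dict(summarize(sample_runs)))
```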

### Monitored Workflows

- **Binary Builds** (`release-binaries.yml`): Multi-platform binary creation
- **Releases** (`release.yml`): Full release process including PyPI publishing

### Failure Tracking

Each failed build record includes:
- Run ID and timestamp
- Commit SHA
- Direct link to workflow run
- Failure categorization
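
A failure record with these fields could be modeled as below. This is a sketch only: the `FailureRecord` class and `failure_from_run` helper are hypothetical, and the input keys (`id`, `created_at`, `head_sha`, `html_url`) assume the run data comes from the GitHub Actions API. The categorization logic is not specified in this guide, so the sketch leaves it as a placeholder value.

```python
from dataclasses import asdict, dataclass


@dataclass
class FailureRecord:
    """One failed workflow run, mirroring the fields listed above."""
    run_id: int
    timestamp: str
    commit_sha: str
    url: str
    category: str = "uncategorized"  # placeholder; real categorization is not shown here


def failure_from_run(run):
    """Build a FailureRecord from a workflow run dict (assumed GitHub API shape)."""
    return FailureRecord(
        run_id=run["id"],
        timestamp=run["created_at"],
        commit_sha=run["head_sha"],
        url=run["html_url"],
    )


# Illustrative run; the commit SHA is made up.
example_run = {
    "id": 123456,
    "created_at": "2024-01-15T10:30:00Z",
    "head_sha": "0123abc",
    "html_url": "https://github.com/riveter/riveter/actions/runs/123456",
}
print(asdict(failure_from_run(example_run)))
```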

## Dashboard System

### Generating Dashboard

```bash
# Generate dashboard with current data
python scripts/monitor_releases.py dashboard
```

### Dashboard Contents

1. **Release Summary**
   - Latest version information
   - Total download counts
   - Platform distribution

2. **Build Status**
   - Success/failure rates
   - Recent build trends
   - Failure details

3. **Active Alerts**
   - Build failures
   - Download anomalies
   - System issues

### HTML Dashboard

The system generates an HTML dashboard (`monitoring/dashboard.html`) with:
- Interactive charts and graphs
- Real-time data updates
- Mobile-responsive design
- Export capabilities

## Alerting System

### Alert Types

1. **Build Failures** (High Severity)
   - Binary build failures
   - Release process failures
   - Workflow errors

2. **Download Anomalies** (Medium Severity)
   - Zero downloads for new releases
   - Significant download drops
   - Platform-specific issues

3. **System Issues** (Medium Severity)
   - API access problems
   - Data collection failures
   - Dashboard generation errors
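
To make the download anomaly checks concrete, here is a minimal sketch that flags a release with zero downloads and platforms with zero downloads. It assumes records in the stats format from the Download Statistics section; the function name, the alert dictionary keys, and the rules themselves are illustrative assumptions rather than the script's actual logic.

```python
def detect_download_anomalies(record):
    """Return a list of medium-severity alert dicts for one stats record."""
    version = record.get("version", "unknown")
    alerts = []

    # Rule 1: the whole release has no downloads yet.
    if record.get("total_downloads", 0) == 0:
        alerts.append({
            "type": "download_anomaly",
            "severity": "medium",
            "message": f"{version} has zero downloads",
        })
        return alerts

    # Rule 2: a single platform has no downloads while the release has some.
    for platform, info in record.get("platforms", {}).items():
        if info.get("total_downloads", 0) == 0:
            alerts.append({
                "type": "download_anomaly",
                "severity": "medium",
                "message": f"{platform} has zero downloads for {version}",
            })
    return alerts


# Illustrative record: one platform with downloads, one without.
sample = {
    "version": "v1.2.3",
    "total_downloads": 450,
    "platforms": {
        "macos-intel": {"total_downloads": 450},
        "linux-x86_64": {"total_downloads": 0},
    },
}
for alert in detect_download_anomalies(sample):
    print(f"[{alert['severity']}] {alert['message']}")
```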

### Webhook Integration

#### Slack Integration

```bash
# Send alerts to Slack
python scripts/monitor_releases.py alert-failures \
  --webhook-url https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK
```

#### Discord Integration

```bash
# Send alerts to Discord
python scripts/monitor_releases.py alert-failures \
  --webhook-url https://discord.com/api/webhooks/YOUR/DISCORD/WEBHOOK
```

#### Custom Webhooks

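The system sends JSON payloads with a single `text` field, the shape Slack incoming webhooks accept. Discord's native webhook format uses `content` instead, although Discord also accepts Slack-style payloads when `/slack` is appended to the webhook URL. The payload looks like this: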

```json
{
  "text": "🚨 Riveter Release Monitoring Alert\n🔴 2 binary build(s) failed in the last 7 days\n  • Run ID: 123456 - https://github.com/riveter/riveter/actions/runs/123456"
}
```
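
If you need to produce or forward a payload of this shape from your own tooling, a standard-library sketch could look like the following. The function names and the structure of the `failures` list are assumptions made for illustration; only the resulting `{"text": ...}` shape comes from the example above.

```python
import json
import urllib.request


def build_payload(failures, days=7):
    """Format failed runs into a {"text": ...} payload like the example above."""
    lines = [
        "🚨 Riveter Release Monitoring Alert",
        f"🔴 {len(failures)} binary build(s) failed in the last {days} days",
    ]
    for item in failures:
        lines.append(f"  • Run ID: {item['run_id']} - {item['url']}")
    return {"text": "\n".join(lines)}


def post_payload(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Build (but do not send) a payload for one failed run.
payload = build_payload(
    [{"run_id": 123456, "url": "https://github.com/riveter/riveter/actions/runs/123456"}]
)
print(payload["text"])
```

Calling `post_payload(YOUR_WEBHOOK_URL, payload)` would then deliver the message; it is left out of the example so the snippet runs without network access.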

### Setting Up Alerts

1. **Create Webhook URL**:
   - Slack: Create an incoming webhook in your Slack workspace
   - Discord: Create a webhook in your Discord server
   - Custom: Set up your webhook endpoint

2. **Configure Repository Secret**:
   - Go to repository Settings → Secrets and variables → Actions
   - Add `MONITORING_WEBHOOK_URL` with your webhook URL

3. **Test Alerts**:
   ```bash
   # Manual test
   python scripts/monitor_releases.py alert-failures \
     --webhook-url YOUR_WEBHOOK_URL
   ```

## Automated Monitoring

### Workflow Configuration

The monitoring workflow runs automatically with these settings:

```yaml
on:
  schedule:
    # Run every 6 hours
    - cron: '0 */6 * * *'
  workflow_dispatch:
    # Manual trigger with options
```
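
The cron expression `0 */6 * * *` fires at minute 0 of hours 0, 6, 12, and 18. GitHub Actions evaluates schedules in UTC, and scheduled runs can start late when the service is under load. The sketch below computes the next nominal run time for this specific expression; `next_run` is a hypothetical helper for illustration, not a general cron parser.

```python
from datetime import datetime, timedelta, timezone


def next_run(now):
    """Return the next time matching '0 */6 * * *' strictly after `now` (UTC)."""
    # Start from the next full hour, then advance until the hour is a multiple of 6.
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while candidate.hour % 6 != 0:
        candidate += timedelta(hours=1)
    return candidate


example = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(f"Now:      {example:%Y-%m-%d %H:%M} UTC")
print(f"Next run: {next_run(example):%Y-%m-%d %H:%M} UTC")
```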

### Manual Triggers

You can manually trigger monitoring with custom parameters:

1. Go to Actions → Release Monitoring
2. Click "Run workflow"
3. Configure options:
   - Send alerts: Force alert sending
   - Webhook URL: Override default webhook
   - Days to check: Custom time range

### Workflow Outputs

Each run produces:
- **Download Stats Artifact**: Latest download statistics
- **Build Status Artifact**: Recent build status data
- **Dashboard Artifact**: Generated dashboard files
- **Summary**: Workflow execution summary

## Data Analysis

### Historical Trends

```bash
# Analyze download trends
python -c "
import json
from pathlib import Path

stats_file = Path('monitoring/download_stats.json')
if stats_file.exists():
    with open(stats_file, 'r') as f:
        data = json.load(f)

    print('Download Trends:')
    for entry in data[-5:]:  # Last 5 entries
        print(f'{entry[\"version\"]}: {entry[\"total_downloads\"]:,} downloads')
"
```

### Build Success Rates

```bash
# Calculate success rates
python -c "
import json
from pathlib import Path

build_file = Path('monitoring/build_status.json')
if build_file.exists():
    with open(build_file, 'r') as f:
        data = json.load(f)

    if data:
        latest = data[-1]
        binary = latest['binary_builds']
        total = binary['total']
        success = binary['successful']

        if total > 0:
            rate = (success / total) * 100
            print(f'Binary Build Success Rate: {rate:.1f}% ({success}/{total})')
"
```

### Platform Analysis

```bash
# Analyze platform distribution
python -c "
import json
from pathlib import Path

stats_file = Path('monitoring/download_stats.json')
if stats_file.exists():
    with open(stats_file, 'r') as f:
        data = json.load(f)

    if data:
        latest = data[-1]
        platforms = latest.get('platforms', {})
        total = latest.get('total_downloads', 0)

        print('Platform Distribution:')
        for platform, stats in platforms.items():
            downloads = stats['total_downloads']
            percentage = (downloads / total * 100) if total > 0 else 0
            print(f'{platform}: {downloads:,} ({percentage:.1f}%)')
"
```

## Troubleshooting

### Common Issues

#### "Failed to fetch release info"

1. **Check GitHub API limits**: Unauthenticated requests are limited to 60 per hour, so ensure you have a valid GitHub token, which raises the limit substantially
2. **Verify repository access**: Confirm the repository exists and is accessible
3. **Network connectivity**: Check internet connection and firewall settings
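
To check whether rate limiting is the cause, you can query GitHub's `GET /rate_limit` endpoint directly. The sketch below uses only the standard library; the function names are hypothetical, and reading the token from a `GITHUB_TOKEN` environment variable is an assumption based on the CI example in this guide.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_rate_limit(token=None):
    """Call the GitHub rate limit endpoint and return the decoded JSON."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_rate_limit(payload):
    """Describe the remaining core API quota from a rate limit response."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return (
        f"{core['remaining']}/{core['limit']} requests left, "
        f"resets at {reset_at:%Y-%m-%d %H:%M:%S} UTC"
    )
```

For example, `print(summarize_rate_limit(fetch_rate_limit(os.environ.get("GITHUB_TOKEN"))))` (after `import os`) prints the current quota. A remaining count of zero means requests will fail until the reset time.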

#### "No download stats available"

1. **Release exists**: Verify the specified version exists
2. **Assets present**: Confirm the release has binary assets
3. **API permissions**: Ensure token has read access to releases

#### "Build status check failed"

1. **Workflow permissions**: Verify the token has the `actions: read` permission
2. **Workflow names**: Confirm workflow file names match expectations
3. **Date range**: Check if the specified date range includes any builds

### Debug Mode

Enable debug mode for detailed logging:

```bash
# Debug download stats
python scripts/monitor_releases.py download-stats --debug

# Debug build status
python scripts/monitor_releases.py check-builds --debug

# Debug alerts
python scripts/monitor_releases.py alert-failures --webhook-url URL --debug
```

### Log Analysis

Check workflow logs for detailed error information:

1. Go to Actions → Release Monitoring
2. Click on the failed run
3. Expand the failed job
4. Review the step logs

## Best Practices

### Monitoring Frequency

- **Download Stats**: Every 6-12 hours (API rate limit considerations)
- **Build Status**: Every 1-6 hours (more frequent during active development)
- **Alerts**: Immediate for failures, daily summaries for trends

### Data Retention

- **Raw Data**: Keep 30-90 days of detailed statistics
- **Summaries**: Keep 1 year of aggregated data
- **Alerts**: Keep 30 days of alert history
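
A retention policy like this can be applied by pruning history entries older than a cutoff. The sketch below assumes each entry carries an ISO 8601 timestamp under a `collected_at` key; that field name is hypothetical, since the stats format in this guide only shows `published_at`, so adjust `key` to whatever your data actually stores.

```python
from datetime import datetime, timedelta, timezone


def prune_entries(entries, max_age_days, now, key="collected_at"):
    """Keep only entries whose timestamp is within `max_age_days` of `now`."""
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for entry in entries:
        # Convert a trailing 'Z' so datetime.fromisoformat accepts it on older Pythons.
        stamp = datetime.fromisoformat(entry[key].replace("Z", "+00:00"))
        if stamp >= cutoff:
            kept.append(entry)
    return kept


# Illustrative history with made-up collection timestamps.
history = [
    {"version": "v1.2.2", "collected_at": "2024-01-01T00:00:00Z"},
    {"version": "v1.2.3", "collected_at": "2024-03-15T00:00:00Z"},
]
now = datetime(2024, 4, 1, tzinfo=timezone.utc)
recent = prune_entries(history, max_age_days=90, now=now)
print([entry["version"] for entry in recent])
```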

### Alert Management

1. **Severity Levels**: Use appropriate severity for different alert types
2. **Alert Fatigue**: Avoid sending repeated alerts for the same issue
3. **Escalation**: Set up escalation paths for critical failures
4. **Documentation**: Document common alert causes and resolutions
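
One common way to limit alert fatigue is a cooldown: the same alert is suppressed until a set interval has passed since it was last sent. The `AlertThrottle` class below is a hypothetical sketch of that idea, and the 6-hour default is chosen only to match the monitoring schedule.

```python
from datetime import datetime, timedelta, timezone


class AlertThrottle:
    """Suppress repeats of the same alert key within a cooldown window."""

    def __init__(self, cooldown=timedelta(hours=6)):
        self.cooldown = cooldown
        self._last_sent = {}  # alert key -> time the alert was last sent

    def should_send(self, key, now):
        """Return True and record the send time if the alert is not in cooldown."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_sent[key] = now
        return True


throttle = AlertThrottle()
start = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(throttle.should_send("build-failure:123456", start))
print(throttle.should_send("build-failure:123456", start + timedelta(hours=1)))
```

This version keeps its state in memory, so it resets with every process. In a scheduled workflow the last-sent times would need to be persisted between runs, for example in a file under `monitoring/`.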

### Performance Optimization

1. **API Efficiency**: Use GitHub tokens to increase rate limits
2. **Data Caching**: Cache frequently accessed data
3. **Batch Operations**: Combine multiple API calls when possible
4. **Error Handling**: Implement robust error handling and retries
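
For the retry part of error handling, a small exponential backoff wrapper is often enough. The `retry` helper below is a generic sketch, not code from `monitor_releases.py`; it retries on `OSError`, which is the base class of `urllib.error.URLError`, and doubles the delay after each failed attempt.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep, retry_on=(OSError,)):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The final failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a function that fails twice before succeeding.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError("temporary network error")
    return "ok"


# A no-op sleep keeps the demonstration instant.
print(retry(flaky_fetch, sleep=lambda seconds: None))
```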

## Integration Examples

### CI/CD Integration

```yaml
# Add monitoring to release workflow
- name: Update monitoring data
  run: |
    python scripts/monitor_releases.py download-stats --save
    python scripts/monitor_releases.py check-builds --save
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

### Custom Dashboards

```python
# Custom dashboard integration
import json
from pathlib import Path

def get_latest_stats():
    stats_file = Path('monitoring/download_stats.json')
    if stats_file.exists():
        with open(stats_file, 'r') as f:
            data = json.load(f)
            return data[-1] if data else None
    return None

# Use in your dashboard application
stats = get_latest_stats()
if stats:
    print(f"Latest release: {stats['version']}")
    print(f"Total downloads: {stats['total_downloads']:,}")
```

### Notification Integration

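```python
# Custom notification handler
def send_critical_alert(alert):
    # Placeholder: replace with your PagerDuty, email, etc. integration
    print(f"[CRITICAL] {alert.get('message', alert)}")  # 'message' key is assumed


def send_info_alert(alert):
    # Placeholder: replace with your Slack, Discord, etc. integration
    print(f"[INFO] {alert.get('message', alert)}")  # 'message' key is assumed


def send_custom_notification(alert_data):
    alerts = alert_data.get('alerts', [])

    for alert in alerts:
        # Route high-severity alerts separately; anything else is non-critical
        if alert.get('severity') == 'high':
            send_critical_alert(alert)
        else:
            send_info_alert(alert)
```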

## Related Documentation

- [Token Management Guide](TOKEN_MANAGEMENT.md)
- [Release Workflow](RELEASE_WORKFLOW.md)
- [Security Setup](SECURITY_SETUP.md)
- [Homebrew Tap Setup](HOMEBREW_TAP_SETUP.md)
