Uncertainty Analysis Report
| Property | Value |
|---|---|
| Type | {{ model_type|default('Unknown') }} |
| Features | {{ features|length|default(0) }} |
| Primary Metric | {{ metric|default('Accuracy')|upper }} |
| Sensitive Features | {{ sensitive_features|length|default(0) }} |
| Alternative Models | {{ report_data.alternative_models|length|default(0) }} |
| Uncertainty Score | {{ (uncertainty_score if uncertainty_score is not none else 0)|round(4) }} |
| Coverage | {{ (coverage if coverage is not none else 0)|round(4) }} |
| Mean Width | {{ (mean_width if mean_width is not none else 0)|round(4) }} |
| Calibration Size | {{ cal_size }} |
| Generation Time | {{ timestamp }} |
| Setting | Value |
|---|---|
| Sensitive Features | {{ sensitive_features|join(', ') }} |
| Metric | {{ metric|default('Accuracy') }} |
| Report Type | Static (non-interactive) |
| Model | Uncertainty Score | Coverage | Mean Width | {% if metrics %} {% for metric_name in metrics|sort %} {% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %}{{ metric_name|title }} | {% endif %} {% endfor %} {% endif %}
|---|---|---|---|{% if metrics %}{% for metric_name in metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %}---|{% endif %}{% endfor %}{% endif %}
| {{ model_name }} | {{ "%.4f"|format(uncertainty_score if uncertainty_score is not none else 0) }} | {{ "%.4f"|format(coverage if coverage is not none else 0) }} | {{ "%.4f"|format(mean_width if mean_width is not none else 0) }} | {% if metrics %}{% for metric_name in metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %}{{ "%.4f"|format(metrics[metric_name] if metrics[metric_name] is not none else 0) }} | {% endif %}{% endfor %}{% endif %}
| {{ alt_model_name }} | {{ "%.4f"|format(alt_model_data.uncertainty_score if alt_model_data.uncertainty_score is not none else 0) }} | {{ "%.4f"|format(alt_model_data.coverage if alt_model_data.coverage is not none else 0) }} | {{ "%.4f"|format(alt_model_data.mean_width if alt_model_data.mean_width is not none else 0) }} | {% if alt_model_data.metrics %}{% for metric_name in alt_model_data.metrics|sort %}{% if metric_name not in ['uncertainty_score', 'coverage', 'mean_width'] %}{{ "%.4f"|format(alt_model_data.metrics[metric_name] if alt_model_data.metrics[metric_name] is not none else 0) }} | {% endif %}{% endfor %}{% endif %}
Compares uncertainty metrics across different models or model configurations.
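For reference, the Coverage and Mean Width columns can be reproduced from per-row prediction intervals. A minimal NumPy sketch (the function and variable names here are illustrative, not part of the report generator):

```python
import numpy as np

def interval_metrics(y_true, lower, upper):
    """Empirical coverage and mean width of prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)
    coverage = covered.mean()            # fraction of targets inside their interval
    mean_width = (upper - lower).mean()  # average interval width
    return float(coverage), float(mean_width)

# Toy example: 4 targets, 3 of which fall inside their interval
y = np.array([1.0, 2.0, 3.0, 10.0])
lo = np.array([0.5, 1.5, 2.5, 3.0])
hi = np.array([1.5, 2.5, 3.5, 4.0])
cov, width = interval_metrics(y, lo, hi)  # cov = 0.75, width = 1.0
```

Good uncertainty estimates push coverage toward the nominal level while keeping the mean width as small as possible; the two numbers only mean something together.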
{% if charts.coverage_vs_expected %}Compares actual coverage with expected coverage at different alpha (confidence) levels. Results closer to the diagonal line indicate better calibration.{% endif %}
{% if charts.width_vs_coverage %}Shows the relationship between interval width and coverage. Efficient uncertainty estimates achieve higher coverage with narrower intervals.{% endif %}
{% if charts.performance_gap_by_alpha %}Shows gaps between expected and actual coverage at different alpha levels. Values close to zero indicate well-calibrated uncertainty.{% endif %}
{% if charts.uncertainty_metrics %}Shows key uncertainty metrics for the model, including uncertainty score, coverage, and mean width.{% endif %}
Shows the most important features affecting model uncertainty. Features with higher importance have greater impact on prediction intervals.
Shows feature reliability scores, indicating which features are most consistent in their impact on uncertainty quantification.
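The expected-vs-actual coverage comparison presumes intervals rebuilt at each alpha level. A split-conformal sketch of how empirical coverage can be checked against its 1 − alpha target (all names are hypothetical; this is not the report generator's code):

```python
import numpy as np

def split_conformal_coverage(cal_resid, test_resid, alphas):
    """Empirical test coverage of split-conformal intervals at several alphas."""
    cov_by_alpha = {}
    n = len(cal_resid)
    for a in alphas:
        # Conformal quantile of the absolute calibration residuals
        level = min(1.0, np.ceil((n + 1) * (1 - a)) / n)
        q = np.quantile(cal_resid, level)
        # A test point is covered when its residual fits inside the interval
        cov_by_alpha[a] = float((test_resid <= q).mean())  # expected near 1 - a
    return cov_by_alpha

rng = np.random.default_rng(0)
cal = np.abs(rng.normal(size=2000))   # calibration residuals
test = np.abs(rng.normal(size=2000))  # held-out residuals
cov_by_alpha = split_conformal_coverage(cal, test, [0.1, 0.2])
```

Plotting `1 - alpha` against these empirical values gives exactly the diagonal comparison the chart describes.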
Shows how well calibrated the model's predicted probabilities are by comparing predicted probabilities with actual observed frequencies. The diagonal line represents perfect calibration. Bands show confidence intervals.
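A reliability curve of this kind can be sketched by binning predicted probabilities and comparing each bin's mean prediction with its observed positive rate. A NumPy-only illustration on synthetic, deliberately calibrated data (names are illustrative):

```python
import numpy as np

def reliability_curve(y_true, y_prob, n_bins=10):
    """Mean predicted probability vs. observed frequency, per probability bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    populated = [b for b in range(n_bins) if np.any(idx == b)]
    mean_pred = np.array([y_prob[idx == b].mean() for b in populated])
    frac_pos = np.array([y_true[idx == b].mean() for b in populated])
    return mean_pred, frac_pos

# Synthetic outcomes drawn at exactly the predicted rates, so the
# curve should hug the diagonal (perfect calibration).
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 5000)
y = (rng.uniform(0, 1, 5000) < p).astype(int)
mean_pred, frac_pos = reliability_curve(y, p)
```

For a well-calibrated model the two returned arrays track each other closely; systematic gaps between them are what the chart's confidence bands are meant to flag.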
Shows how model performance varies with prediction confidence levels. This helps identify regions where the model is more or less reliable. Higher confidence should correlate with better performance.
Shows the distribution of prediction interval widths across the dataset. Narrower intervals with proper coverage indicate more efficient uncertainty quantification.
Compares the distribution of prediction interval widths across different confidence levels (alpha values). Shows both boxplot and violin plot representations for detailed analysis of interval width distributions.
Shows the PSI values for each feature, indicating distribution shifts between reliable and unreliable predictions. Features with PSI ≥ 0.25 show significant distribution changes that may impact model reliability.
Compares the distributions of features with highest PSI values between reliable and unreliable predictions. Shows histograms with KDE overlays to visualize how these critical features differ between the two groups.
Population Stability Index (PSI) scores measure the stability of feature distributions between calibration and test sets.
| Feature | PSI Score |
|---|---|
| {{ feature }} | {{ "%.4f"|format(psi) }} |
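A common way to compute PSI is to bin the reference sample into quantiles and compare bin fractions against the comparison sample, summing (actual − expected) × ln(actual / expected) over bins. A sketch under those assumptions (names illustrative, not the report's implementation):

```python
import numpy as np

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one feature."""
    # Bin edges from quantiles of the expected (reference) sample
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
base = rng.normal(0, 1, 10_000)
same = rng.normal(0, 1, 10_000)     # no shift: PSI near 0
shifted = rng.normal(1, 1, 10_000)  # one-sigma mean shift: large PSI
```

With the conventional reading used above, values near 0 indicate a stable feature while values at or above 0.25 indicate a significant distribution change.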
| Feature | Importance |
|---|---|
| {{ feature }} | {{ "%.4f"|format(importance) }} |
Shows the distribution of residuals (prediction errors) across different datasets, helping identify biases under stress conditions.
Shows which features are most correlated with model errors, helping identify potential areas for model improvement.
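One simple version of such a feature-error association is the Pearson correlation of each feature with the absolute prediction error (illustrative names; the report's exact definition may differ):

```python
import numpy as np

def error_correlations(X, y_true, y_pred, feature_names):
    """Pearson correlation of each feature with the absolute prediction error."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return {name: float(np.corrcoef(X[:, j], abs_err)[0, 1])
            for j, name in enumerate(feature_names)}

# Toy data: the error scale grows with x1 but is unrelated to x0
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
noise = rng.normal(size=5000) * np.exp(X[:, 1])
corrs = error_correlations(X, X[:, 0] + noise, X[:, 0], ["x0", "x1"])
```

Features that rank high on this chart are natural candidates for closer inspection, since the model's errors concentrate where those features take particular values.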
Compares different distance metrics (PSI, WD1, KS, etc.) across alpha levels, showing how distribution shift is captured by different metrics.
Shows the distribution shift of each feature as measured by different metrics, visualizing which features are most affected by different types of distribution shifts.
Compares resilience performance across different models under increasing stress levels. Models with more gradual decline are more resilient.
Compares the overall resilience score for each model. Higher scores indicate better performance under distribution shifts.