# ResPSANN Under Compute Parity — Adapted Experiment Plan (Datasets: EAF, Beijing Air, Jena Climate, HAR, Rossmann)

## Scope & Changes

This revision aligns the original plan to the datasets described in the companion data brief. We anchor flagship robustness work on the Industrial Electric Arc Furnace (EAF) tables, use Beijing + Jena for mid‑scale multivariate forecasting and seasonality probes, deploy HAR for classification/representation tests, and include Rossmann for structured business forecasting. Synthetic families remain for stress testing but are de‑emphasized in this pass.

## Datasets & Targets

### 1) Industrial Data from the Electric Arc Furnace (EAF)

**Targets**

* Temperature forecasting: next‑step and short‑horizon TEMP.
* Oxidation forecasting: VALO2_PPM regression; optionally, a detection task on whether a reading is present (VALO2_PPM > 0).
* Final chemical composition after tapping: multi‑output regression on available chemistry columns (through VALNI).

**Notes**

* Eleven linked CSVs spanning ~2015‑01‑01 → 2018‑07‑30; join on `HEATID`.
* Very large high‑frequency logs for gas/oxygen/carbon; temperature table ~85k rows.
* Decimal commas in numeric fields and timestamps; some duplicate TEMP rows; transformer durations string‑encoded.
* Carbon/gas usage counters accumulate and reset around heat boundaries. The final‑composition file stops at VALNI, so any downstream features that expect columns such as VALV or VALTI must be revised.

### 2) Beijing Multi‑Site Air‑Quality

**Targets**

* PM2.5 (primary), optionally PM10/NO2; 1h–6h ahead.

**Notes**

* Hourly data across 12 stations (2013‑03‑01 → 2017‑02‑28); station‑segregated files.
* Hundreds of NA gaps per station, which require imputation or masking. The 12‑station layout is well suited to train/held‑out station generalization.

### 3) Jena Climate 2009–2016

**Targets**

* 6h–24h ahead temperature; optionally multivariate (humidity, pressure).

**Notes**

* 420k ten‑minute records (2009‑01‑01 → 2017‑01‑01) with standard decimals; day‑first timestamps.
* Clean seasonal structure suitable for spectral diagnostics and distribution‑shift splits.

### 4) Human Activity Recognition (HAR) — Smartphones

**Targets**

* 6‑class activity classification (Walking, Upstairs, Downstairs, Sitting, Standing, Laying).

**Notes**

* Two input options: engineered 561‑feature windows (official split), or raw 50 Hz sequences from Inertial Signals (128 time steps × 9 channels, i.e. 2.56 s windows).
* Respect provided train/test splits by subject to avoid leakage.

### 5) Rossmann Store Sales

**Targets**

* Next‑day sales per store; optional multi‑horizon.

**Notes**

* ~1.0M training rows (2013‑01‑01 → 2015‑07‑31) + test period (2015‑08‑01 → 2015‑09‑17). Join with store metadata; encode holidays; reconcile missing `Open`.

## Preprocessing & Feature Engineering

### EAF

* **Locale normalization:** convert comma decimals in numerics and timestamps.
* **Integrity:** drop exact duplicate TEMP rows; sanitize transformer `DURATION` (parse `HH:MM` variants); align clocks across logs.
* **Heat segmentation:** use `HEATID` to reset cumulative counters; compute per‑heat features (e.g., total oxygen/carbon/gas, average flows, transformer stage time‑shares, mean/max MW, time‑to‑tap).
* **Temporal features:** within‑heat elapsed time, lag features on TEMP/O2, and exponential moving averages (EMAs) of recent flow/usage.
* **Targets:** (a) next‑step ΔTEMP; (b) next‑step VALO2_PPM; (c) final chemistry vector at tap, with inputs aggregated over the pre‑tap window.
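A minimal pandas sketch of the locale and counter handling above. The helper names, the `DATETIME` timestamp column, and whichever counter column is passed to `per_heat_usage` are hypothetical. The reset rule (a negative step means the counter restarted, so the new raw value is counted as usage) is an assumption to check against the actual logs. Exact duplicate TEMP rows can be removed beforehand with `DataFrame.drop_duplicates()`.

```python
import pandas as pd


def parse_decimal_comma(s: pd.Series) -> pd.Series:
    """Convert strings like "1234,5" to floats; unparseable values become NaN.

    Assumes no thousands separators are present (not verified against the files).
    """
    cleaned = s.astype(str).str.strip().str.replace(",", ".", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


def parse_duration_minutes(s: pd.Series) -> pd.Series:
    """Parse "HH:MM" / "H:MM" strings into minutes; any other variant becomes NaN."""
    parts = s.astype(str).str.extract(r"^\s*(\d+):(\d{1,2})\s*$")
    return pd.to_numeric(parts[0]) * 60 + pd.to_numeric(parts[1])


def per_heat_usage(df: pd.DataFrame, counter_col: str,
                   heat_col: str = "HEATID", time_col: str = "DATETIME") -> pd.Series:
    """Total usage per heat from a cumulative counter that may reset.

    Assumption: a negative step is a counter reset, so the post-reset raw
    value is taken as that step's increment.
    """
    df = df.sort_values([heat_col, time_col])
    inc = df.groupby(heat_col)[counter_col].diff().fillna(0.0)  # first row of a heat adds 0
    reset = inc < 0
    inc = inc.where(~reset, df[counter_col])
    return inc.groupby(df[heat_col]).sum()
```

Differencing within `HEATID` groups keeps one heat's counter from leaking into the next heat's totals.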

### Beijing

* **Station‑wise normalization** (mean/var per station).
* **Missingness:** forward‑fill short gaps and add a binary observed‑mask channel; leave long gaps masked; evaluate the imputed and masked pipelines side by side.
* **Calendar features:** hour‑of‑day, day‑of‑week, month, holiday if available.
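A sketch of station‑wise normalization and short‑gap filling. The station column name `station` is assumed to match the PRSA files, and the `max_gap` threshold is a hypothetical tuning knob. pandas `ffill(limit=k)` on its own would still fill the first k points of a long gap, so the sketch measures each NaN run explicitly and only fills runs of length ≤ `max_gap`.

```python
import numpy as np
import pandas as pd


def normalize_per_station(df: pd.DataFrame, cols, station_col: str = "station") -> pd.DataFrame:
    """Z-score each column within its station (NaNs are ignored by mean/std)."""
    g = df.groupby(station_col)[cols]
    return (df[cols] - g.transform("mean")) / g.transform("std")


def ffill_short_gaps(s: pd.Series, max_gap: int):
    """Forward-fill NaN runs of length <= max_gap; leave longer runs as NaN.

    Returns (filled, observed_mask); the mask is 1.0 only at originally
    observed points, so imputed values stay distinguishable downstream.
    """
    isna = s.isna()
    run_id = (~isna).cumsum()                        # a NaN run shares the id of the preceding observation
    run_len = isna.groupby(run_id).transform("sum")  # length of the NaN run each point belongs to
    short = isna & (run_len <= max_gap)
    filled = s.where(~short, s.ffill())
    mask = (~isna).astype(float)
    return filled, mask
```

For the held‑out‑station folds, whether the test station's own statistics may be used for normalization is a design decision the plan should state explicitly.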

### Jena

* **Windowing:** 72–288 step windows for 12–48 h context (ten‑minute sampling).
* **Splits by year** to enforce temporal non‑leakage.
* **Seasonal encodings:** sine/cosine of hour‑of‑day/day‑of‑year; avoid global average pooling in spines when phase matters.
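A NumPy/pandas sketch of the Jena windowing and cyclic encodings. The timestamp format string assumes values such as `01.01.2009 00:10:00`, and the function names are hypothetical. With ten‑minute sampling, a 6 h horizon is 36 steps and 24 h is 144 steps. Each window `x[t : t+context]` is paired with the target `horizon` steps after its last row.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def parse_jena_time(s: pd.Series) -> pd.DatetimeIndex:
    # Day-first timestamps, e.g. "01.01.2009 00:10:00" (format assumed)
    return pd.DatetimeIndex(pd.to_datetime(s, format="%d.%m.%Y %H:%M:%S"))


def cyclic_features(ts: pd.DatetimeIndex) -> np.ndarray:
    """sin/cos of hour-of-day and day-of-year, shape (T, 4)."""
    hod = np.asarray((ts.hour + ts.minute / 60.0) / 24.0, dtype=float)
    doy = np.asarray((ts.dayofyear - 1) / 365.25, dtype=float)
    return np.column_stack([np.sin(2 * np.pi * hod), np.cos(2 * np.pi * hod),
                            np.sin(2 * np.pi * doy), np.cos(2 * np.pi * doy)])


def make_windows(x: np.ndarray, context: int, horizon: int, target_col: int):
    """Pair x[t:t+context] with target x[t+context-1+horizon, target_col]."""
    n = x.shape[0] - context - horizon + 1
    windows = sliding_window_view(x, context, axis=0)  # (T-context+1, F, context)
    X = windows[:n].transpose(0, 2, 1)                  # (n, context, F)
    y = x[context - 1 + horizon:, target_col]           # length n
    return X, y
```

`sliding_window_view` returns a read‑only view, so copy `X` before any in‑place augmentation.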

### HAR

* **Option A (engineered):** standardize the 561‑D feature vectors; light PSANN head for classification.
* **Option B (raw):** temporal spine (strided Conv1d or single‑head attention) → PSANN head; label smoothing and class‑balanced sampling.
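Label smoothing and class‑balanced sampling from Option B, sketched framework‑agnostically in NumPy. The smoothing strength `eps=0.1` is an illustrative default. HAR's `y_train.txt` labels run 1–6, so shift them to 0–5 before using these helpers. Balanced weights give every class equal total sampling mass.

```python
import numpy as np


def smooth_labels(y: np.ndarray, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """Targets (1 - eps) * one_hot + eps / K; each row still sums to 1."""
    onehot = np.eye(num_classes)[y]
    return (1.0 - eps) * onehot + eps / num_classes


def balanced_sample_weights(y: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-sample probabilities proportional to 1 / class_count, normalized to sum to 1."""
    counts = np.bincount(y, minlength=num_classes).astype(float)
    w = 1.0 / counts[y]
    return w / w.sum()
```

Usage: `idx = np.random.default_rng(0).choice(len(y), size=64, p=balanced_sample_weights(y, 6))` draws one class‑balanced minibatch of indices.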

### Rossmann

* **Joins/encodings:** merge `store.csv`; encode `StateHoliday` strings; derive promo recency and competition distance buckets; handle missing `Open`.
* **Temporal CV:** calendar‑aware splits and synthetic hold‑outs for shift testing.
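A pandas sketch of the Rossmann encodings. In `train.csv`, `StateHoliday` mixes the integer `0` with the strings `"0"`, `"a"`, `"b"` and `"c"`, so it is normalized to strings before mapping. Filling a missing `Open` with 1 (treat the store as open) is an assumption, not a documented rule. The promo‑recency feature name `DaysSincePromo` is hypothetical.

```python
import pandas as pd


def encode_rossmann(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["Store", "Date"]).copy()
    # Normalize mixed int/str StateHoliday values, then map to integer codes
    df["StateHoliday"] = df["StateHoliday"].astype(str).map({"0": 0, "a": 1, "b": 2, "c": 3})
    # Assumption: a missing Open flag means the store was open
    df["Open"] = df["Open"].fillna(1).astype(int)
    # Promo recency: days since the latest Promo == 1 day for the same store (NaN before the first promo)
    last_promo = df["Date"].where(df["Promo"] == 1).groupby(df["Store"]).ffill()
    df["DaysSincePromo"] = (df["Date"] - last_promo).dt.days
    return df
```

The store metadata join is a plain `df.merge(store, on="Store", how="left")` applied before this step.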

## Splits & Validation

* **EAF:** Heat‑aware time splits (e.g., 2015–2016 train, 2017 val, early‑2018 test) to test regime drift; assign each heat to a single split (e.g., by its start time) so that no heat straddles a boundary.
* **Beijing:** Train on 10 stations, validate on 1, test on 1 (rotate folds across stations); auxiliary time‑based split within train.
* **Jena:** Train 2009–2014, validate 2015, test 2016 (OOD year); use rolling‑origin windows as an alternative split for sensitivity analysis.
* **HAR:** Use official train/test by subject; optional 5‑fold CV inside train for tuning under wall‑clock cap.
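* **Rossmann:** The Kaggle test period (2015‑08‑01 → 2015‑09‑17) has no public `Sales` labels, so carve validation and test windows from the labeled range (e.g., train 2013‑01 → 2015‑05, validate 2015‑06, test 2015‑07); consider rolling‑origin CV for robustness.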
* **Seeds:** ≥5 per config; paired Wilcoxon signed‑rank tests for significance.

## Models, Baselines & Compute Parity

* **ResPSANN** (primary): parameterized sine activation with residual links; optional **temporal spine** (small strided Conv1d or single‑head attention) that preserves compute budget.
* **Baselines:** matched‑param MLP (ReLU), light 1D‑CNN/TCN, LSTM/GRU (1–2 layers), Transformer‑lite (very shallow, single head).
* **Fairness constraints:**

  * Wall‑clock cap per training run.
  * Parameter (and approximate flop) budget matched across models.
  * Throughput calibration to equalize total optimization work (samples processed = steps per epoch × batch size × epochs, all within the time cap).
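A pure‑Python sketch of parameter matching and time‑cap step budgeting. It counts weights and biases for a plain MLP baseline and searches for the widest hidden layer that fits a budget. Any extra per‑unit sine parameters in ResPSANN are not modeled here and would have to be added to its own count. The per‑step time is assumed to be measured in a short calibration run.

```python
def mlp_params(d_in: int, width: int, depth: int, d_out: int) -> int:
    """Weights + biases of an MLP with `depth` (>= 1) hidden layers of equal width."""
    first = d_in * width + width
    hidden = (depth - 1) * (width * width + width)
    head = width * d_out + d_out
    return first + hidden + head


def match_width(budget: int, d_in: int, depth: int, d_out: int) -> int:
    """Largest hidden width whose parameter count does not exceed `budget`."""
    width = 1
    while mlp_params(d_in, width + 1, depth, d_out) <= budget:
        width += 1
    return width


def steps_under_cap(time_cap_s: float, sec_per_step: float) -> int:
    """Optimizer steps that fit in the wall-clock cap, given a measured seconds/step."""
    return int(time_cap_s // sec_per_step)
```

Multiplying the step budget by the batch size gives the total samples each model sees, which is the quantity to compare across architectures.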

## Experiments by Hypothesis

### H1 — Generalization under Nonstationarity/High‑Dimensionality

* **EAF‑E1 (TEMP/O2):** one‑step and short‑horizon regression. Compare PSANN vs MLP/CNN/LSTM/TCN; with/without temporal spine.
* **Beijing‑E2 (PM2.5):** cross‑station generalization; multi‑horizon (1–6h) with per‑station normalization.
* **Jena‑E3 (Temp 6–24h):** seasonality‑aware forecasting; test sensitivity to seasonal phase.

### H2 — Information Usage

* **PSD/SHAP:**

  * EAF: permute exogenous (flows/usages/transformer stages) vs endogenous (past TEMP/O2) to quantify reliance.
  * Beijing: permute meteorology vs pollutant history groups.
  * Rossmann: permute promotions/holidays vs past sales.
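The grouped permutation comparisons above can be sketched as follows. All columns in a group are shuffled with the same row permutation, which preserves correlations within the group while breaking its link to the target. The RMSE increase over the unpermuted baseline measures how much the model relies on that group. The `predict` callable and the group index lists are placeholders for whichever model and feature layout are used.

```python
import numpy as np


def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean RMSE increase when each column group is permuted jointly.

    predict: callable mapping an (n, d) array to (n,) predictions.
    groups: dict of group name -> list of column indices.
    """
    rng = np.random.default_rng(seed)

    def rmse(p):
        return float(np.sqrt(np.mean((y - p) ** 2)))

    base = rmse(predict(X))
    out = {}
    for name, cols in groups.items():
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(X))
            Xp[:, cols] = X[perm][:, cols]  # same row shuffle for every column in the group
            deltas.append(rmse(predict(Xp)) - base)
        out[name] = float(np.mean(deltas))
    return out
```

For lagged time‑series features, row permutation also destroys temporal ordering inside the group, so block permutation is a reasonable variant to report alongside.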

### H3 — Spectral/Geometry Diagnostics

* Jacobian/NTK spectra, participation ratio over training for ResPSANN vs MLP on Jena/EAF subsets.
* Frequency response probe with sinusoidal inputs (Jena) to contrast band‑pass characteristics.
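A NumPy sketch of the two diagnostics. The participation ratio PR = (Σλ)² / Σλ² ranges from 1 (one dominant mode) to n (flat spectrum). The empirical NTK is computed for a one‑hidden‑layer sine network f(x) = aᵀ sin(Wx + b). That network is a hypothetical stand‑in for a PSANN block, not the ResPSANN parameterization. The frequency probe feeds a unit sinusoid, measured in cycles per step, through any window‑to‑scalar model and reports the RMS gain.

```python
import numpy as np


def participation_ratio(eigvals) -> float:
    """(sum λ)^2 / sum λ^2 over non-negative eigenvalues (tiny negatives clipped)."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)
    return float(lam.sum() ** 2 / np.sum(lam ** 2))


def sine_net_ntk(X, W, b, a):
    """Empirical NTK J J^T of f(x) = a . sin(W x + b); J is w.r.t. (a, b, W)."""
    Z = X @ W.T + b                      # (n, h) pre-activations
    dA = np.sin(Z)                       # df/da_j
    dB = a * np.cos(Z)                   # df/db_j
    dW = dB[:, :, None] * X[:, None, :]  # df/dW_jk = a_j cos(z_j) x_k, shape (n, h, d)
    J = np.concatenate([dA, dB, dW.reshape(len(X), -1)], axis=1)
    return J @ J.T


def gain_at_frequency(model, freq, context, n_steps=20000) -> float:
    """Output/input RMS ratio for a unit sinusoid fed through `model`.

    `model` maps a (context,) window to a scalar one-step output.
    """
    t = np.arange(n_steps)
    x = np.sin(2 * np.pi * freq * t)
    y = np.array([model(x[i:i + context]) for i in range(n_steps - context + 1)])
    return float(np.std(y) / np.std(x))
```

As a sanity check, a 12‑step moving average passes a daily cycle at ten‑minute sampling (1/144 cycles/step) almost unchanged and nulls a 12‑step period. A band‑pass model would instead show a peak at intermediate frequencies.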

### H4 — Robustness to Shift & Missingness

* **EAF:** early vs late heats; simulate sensor dropouts; heavy‑tailed noise injection on TEMP.
* **Beijing:** masked intervals and seasonal distribution shifts (winter vs summer hold‑outs).
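A NumPy sketch of the two corruption processes. Heavy‑tailed noise is drawn from a Student‑t distribution (`df=3` is an illustrative choice). Sensor dropouts blank contiguous segments per channel: starts occur with probability `rate / mean_len` per step and lengths are geometric with mean `mean_len`. When segments rarely overlap, roughly a `rate` fraction of each channel is removed. These generative choices are assumptions, not a model of the real sensors.

```python
import numpy as np


def inject_heavy_tailed_noise(x, scale, df=3.0, seed=0):
    """Add Student-t noise (heavy-tailed for small df) multiplied by `scale`."""
    rng = np.random.default_rng(seed)
    return x + scale * rng.standard_t(df, size=x.shape)


def simulate_dropouts(x, rate, mean_len, seed=0):
    """Blank contiguous segments in each channel of a (T, C) array.

    Returns (x_with_nan, observed_mask).
    """
    rng = np.random.default_rng(seed)
    T, C = x.shape
    mask = np.ones((T, C), dtype=bool)
    for c in range(C):
        starts = np.flatnonzero(rng.random(T) < rate / mean_len)
        for s in starts:
            mask[s:s + rng.geometric(1.0 / mean_len), c] = False
    return np.where(mask, x, np.nan), mask
```

Sweeping `rate` produces the "% missing" axis for the ΔR²/ΔMASE robustness curves, and the returned mask can feed the masked pipeline directly.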

### H5 — Limits & Tiny Temporal Inductive Bias

* Hypothesis: without a spine, PSANN lags RNN/TCN on longer‑memory tasks, and adding a tiny spine under the same compute budget recovers the gap.

## Metrics & Reporting

* **Forecasting:** RMSE/MAE/R²; plus sMAPE/MASE for comparability.
* **Classification (HAR):** Accuracy, F1‑macro; Expected Calibration Error.
* **Robustness:** ΔR² or ΔMASE vs % missing; noise‑stratified breakdowns.
* **Resources:** wall‑time, params, peak memory; report CIs and paired significance tests.
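Reference implementations of the less standard metrics, as a NumPy sketch. sMAPE skips points where both actual and forecast are zero. This matters for Rossmann, whose closed days have zero sales; the Kaggle competition's RMSPE likewise ignored zero‑sales days. MASE scales by the in‑sample MAE of a seasonal‑naive forecast with period `m`. ECE uses equal‑width confidence bins. The percentile bootstrap provides CIs over per‑seed values. Bin count and bootstrap size are illustrative defaults.

```python
import numpy as np


def smape(y, yhat) -> float:
    """Symmetric MAPE in percent; points where y and yhat are both 0 are skipped."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    denom = np.abs(y) + np.abs(yhat)
    keep = denom > 0
    return float(100.0 * np.mean(2.0 * np.abs(y - yhat)[keep] / denom[keep]))


def mase(y, yhat, y_train, m=1) -> float:
    """Test MAE divided by the in-sample MAE of a seasonal-naive forecast (period m)."""
    y, yhat, y_train = (np.asarray(v, float) for v in (y, yhat, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)


def expected_calibration_error(probs, labels, n_bins=15) -> float:
    """sum_b (n_b / N) * |accuracy_b - mean confidence_b| over equal-width bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)


def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, e.g. of per-seed paired differences."""
    rng = np.random.default_rng(seed)
    v = np.asarray(values, float)
    means = rng.choice(v, size=(n_boot, len(v)), replace=True).mean(axis=1)
    return float(np.quantile(means, alpha / 2)), float(np.quantile(means, 1 - alpha / 2))
```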

## Execution Order (Pragmatic Path)

1. Implement locale‑robust loaders + validators for EAF; build per‑heat aggregation pipeline.
2. Run EAF‑E1 baseline sweep under compute parity; prototype PSD on grouped features.
3. Stand up Beijing station‑generalization with missingness handling; run parity sweep.
4. Run Jena seasonal forecasting and geometry diagnostics.
5. HAR classification (engineered first; then raw + spine).
6. Rossmann business case; finalize PSD framework on categorical+temporal mix.
7. Aggregate results, stats, and figures; write‑up.

## Artifacts & Reproducibility

* Versioned notebooks/scripts per dataset with checksums and schema asserts.
* Saved splits, seeds, model configs, and result bundles (CSV/JSON).
* Figure scripts for PSD/SHAP, geometry, and robustness plots.
* Environment snapshot and wall‑clock calibration sheet (per‑model).
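A small sketch of the checksum and schema‑assert utilities. It streams a SHA‑256 over each raw file for a checksum manifest and fails fast when a required column is missing or has an unexpected NumPy dtype kind. The `required` mapping format is a hypothetical convention. Explicit exceptions are used instead of `assert` so the checks survive `python -O`.

```python
import hashlib
from pathlib import Path

import pandas as pd


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Streaming SHA-256 hex digest of a file."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def assert_schema(df: pd.DataFrame, required: dict) -> None:
    """Raise ValueError if a column is missing or has the wrong dtype kind.

    `required` maps column -> numpy dtype kind: 'f' float, 'i' int, 'M' datetime, 'O' object.
    """
    for col, kind in required.items():
        if col not in df.columns:
            raise ValueError(f"missing column {col!r}")
        if df[col].dtype.kind != kind:
            raise ValueError(f"{col!r}: expected kind {kind!r}, got {df[col].dtype}")
```

For EAF, a check such as `assert_schema(temp_df, {"HEATID": "i", "TEMP": "f"})` run after locale normalization catches any comma‑decimal column that silently stayed as text.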

---

**Outcome:** This plan maps the original compute‑parity PSANN agenda directly onto the available datasets, emphasizing EAF for robustness/nonstationarity, Beijing+Jena for multivariate temporal structure, HAR for classification, and Rossmann for structured tabular forecasting. It preserves fairness constraints and the hypothesis‑driven diagnostics (PSD/SHAP, spectral geometry) while tailoring preprocessing and splits to each dataset’s quirks.
