Jena Climate Dataset
Context
Jena Climate is a weather time-series dataset recorded at the weather station of the Max Planck Institute for Biogeochemistry in Jena, Germany.

Content
The Jena Climate dataset comprises 14 quantities (such as air temperature, atmospheric pressure, humidity, and wind direction) recorded every 10 minutes over several years. It covers the period from January 1st, 2009 to December 31st, 2016.


Human Activity Recognition

Human Activity Recognition (HAR) on a smartphone sensor dataset using an LSTM RNN. The task is to classify the type of movement into one of six categories:

WALKING,
WALKING_UPSTAIRS,
WALKING_DOWNSTAIRS,
SITTING,
STANDING,
LAYING.

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used.
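
The windowing described above can be sketched as follows. This is a minimal illustration, assuming a 50 Hz sampling rate (so that 2.56 s corresponds to 128 readings); the function name `sliding_windows` and the dummy readings are hypothetical and not part of the dataset's tooling. The noise filtering and Butterworth separation steps are not reproduced here.

```python
def sliding_windows(signal, window_size=128, overlap=0.5):
    """Split a 1-D sequence into fixed-width windows with fractional overlap."""
    step = int(window_size * (1 - overlap))  # 64 readings for 50% overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than 1")
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


SAMPLING_RATE_HZ = 50  # assumption: 128 readings / 2.56 s
window_size = round(2.56 * SAMPLING_RATE_HZ)  # 128 readings per window

# Hypothetical input: 10 s of dummy readings (500 samples).
readings = list(range(10 * SAMPLING_RATE_HZ))
windows = sliding_windows(readings, window_size)
```

With 50% overlap, the second half of each window is identical to the first half of the next one, so every reading (except at the edges) appears in two windows.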

Industrial Data from the Electric Arc Furnace

1. General design of an Electric Arc Furnace (EAF)
Basic components of modern electric arc furnaces include:

mechanical frame;
electric circuits;
equipment to deliver process gases, powdery and bulk materials into the working chamber;
process waste removal and gas scrubbing system;
automated process control system.
Modern electric arc furnaces are built with the following structural elements:

foundation;
tilt platform;
furnace body;
roof;
graphite electrodes;
electrode arms;
lifting and rotating mechanism for the roof and electrodes;
operating door.
2. EAF basic specifications
2.1 Electric arc furnace includes the following components:
refractory-lined lower casing with an eccentric bottom tapping system;
upper casing with water-cooled panels;
roof (large water-cooled and small uncooled) with refractory concrete lining.
2.2 Basic technical and performance specifications of DSP-120 electric arc furnace are as follows:
Rated furnace size — 142 m3
Furnace capacity — 140 tons
Rated tapping heat size — 125 tons
Furnace heel — 15-20 tons (± 3)
Transformer power — 120 MVA
Heat tapping type — eccentric bottom tapping
Tapping hole diameter — 180 mm
Operating door size — 1,000x1,200 mm
Transformer operating stages — 17
Primary voltage — 35 kV ± 10%
Roof lift — 400 mm
Roof speed — 40 mm/s
Furnace tilt angle for steel tapping — Max 15°
Furnace tilt angle for slag skimming — Max 8°
Maximum secondary voltage — 1,250 V
Rated secondary current — 70 kA
Electrode pitch circle diameter — 1,300 mm
Inner hearth diameter — 7,100 mm
Inner casing diameter — 7,300 mm
Electrode diameter — 610 mm
Electrode length — 2,400 mm
2.3 EAF gas-oxygen modules.
The system of gas-oxygen modules (burners) includes 4 multi-fuel gas-oxygen burners positioned inside vertical water-cooled panels.
The multi-fuel gas-oxygen burners can operate in burner mode to heat and melt the bulk charge, as well as in supersonic oxygen injection mode for bath lancing to trim the charge and foam the slag during the refining period.

2.4 Arrangement for injection of powdery carbon-containing materials into the EAF.
This arrangement enables:

partial deoxidation of furnace slag;
slag foaming and foam maintenance to protect the lining from arc heat and stabilize the lining.
Three carbon injectors are built into the side panels of the furnace to inject carbon-containing materials (CCM). The injectors are installed above the threshold level to keep them safe from any damage. Continuous purging with compressed air prevents the injectors from clogging with slag or metal splash.

Technical specifications of the arrangement for injection of powdery carbon-containing materials (fine coke, 0-3 mm crushed graphite) are as follows:

Loading hopper size — 50 m3
Flow rate per injector — 1 × 15-25 kg/min
Delivery medium — compressed air
Carrier gas pressure — 4-6 MPa
2.5 Equipment for feeding ferrous alloys and bulk materials. Equipment used to store and feed materials into the EAF and pouring ladle includes a set of receiving hoppers, storage hoppers, intermediate weighing hoppers, feeders, and conveyor belts.
3. EAF semi-product smelting process
First, the charge is loaded into the EAF. Melting of the charge begins at the lowest voltage levels with a minimum arc length. The voltage level is then increased. The arc energy melts the metal charge and slagging materials, heating the metal to tapping temperature and balancing the heat losses. Arc power and length are adjusted by selecting the appropriate transformer stage.
Slagging materials, additions, deoxidizers, and other materials are delivered to the furnaces by a power-driven transport hopper system.
To enable dephosphorization and make the furnace lining more durable, the slag in the EAF must be highly basic and magnesial. Magnesium-lime flux is added to the furnace mix to obtain a 7-10% range of MgO content in the slag.
Carbon-containing material is fed into the furnace using a hopper system to stabilize the arc, carbonize the metal in the furnace, and foam the slag (if carbon injectors malfunction).
To protect the walls and roof of the arc furnace from the arc heat, maximum shielding of the arc with slag is used during the charge melting and oxidation period. Slagging materials are added in measured portions through the top (roof) feeding hole to maintain the desired composition of the slag in accordance with the melting energy process conditions.

4. Oxidation period
Goals of the oxidation period:

oxidize carbon and generate additional chemical energy followed by heat release (in addition to supplied electricity) for smelting operations due to exothermic reactions;
remove phosphorus down to values that ensure desired chemical composition due to slag introduction;
ensure boiling and mixing of metal thanks to production of carbon monoxide, homogenize metal over temperature, and prevent saturation of metal with nitrogen and hydrogen thanks to foamed slag and arc shielding.
The oxidation period begins once the charge has melted completely and a “flat” bath has been achieved. During this period, a sample is drawn to check the chemical composition of the metal, and the temperature is measured with the MORE automatic unit.
Oxidation of impurities in the molten mass is achieved by purging the bath with gaseous oxygen through the multi-fuel gas-oxygen modules.
During the oxidation period, the slag must be foamed by injecting carbon-containing material (CCM) through the carbon injectors. This shields the arc, reduces saturation of the metal with gases, and ensures complete transfer of the arc energy into the metal “bath”.
At the end of the oxidation period, the shop-floor manager decides whether to tap the melt heat. The decision is based on the carbon content of the metal (determined either by chemical analysis in the express test lab or by the Multi-Lab III Celox device), the oxidation of the metal, and whether the metal has been heated to the required tapping temperature.
Oxidation of the metal should be measured before tapping the melt heat.

5. Tapping of the melt heat
By the end of the tapping, the weight of metal in the ladle varies from 120 to 125 tons (as measured by ladle car scales).
The metal is tapped into the ladle with maximum cut-off of oxidizing furnace slag. If furnace slag gets into the steel ladle when the melt heat is tapped, the foaming slag should be deposited by releasing aluminum pellets or "ingots".
Once the heat is tapped and the ladle with metal is removed from under the furnace, the metal in the ladle must be subject to a compulsory “soft” purging with argon for 1-3 minutes.

6. Deoxidation and alloying of metal
Deoxidizers and ferrous alloys are released into the ladle from the supply bins during the melt heat tapping process.
The delivery rate of deoxidizers and alloying agents (A), tons, is calculated based on the average content of the element in the finished steel using the following formula:

     A = ((B – C) × D × 100) / (E × R),                 (1)
where:
A – weight of ferrous alloy, tons;
B – mean content of element in finished steel, %;
C – content of element in steel prior to deoxidation, %;
D – weight of metal, including metal from previous melt, tons;
E – content of deoxidizer element in ferrous alloys, %;
R – recovery of deoxidizer element, %.
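
Formula (1) can be checked with a short calculation. The sketch below is only illustrative: the function name `ferroalloy_weight` and all input values are hypothetical and do not come from the dataset or from plant practice.

```python
def ferroalloy_weight(b_target_pct, c_initial_pct, d_metal_tons,
                      e_alloy_pct, r_recovery_pct):
    """Formula (1): A = ((B - C) * D * 100) / (E * R), result in tons."""
    if e_alloy_pct <= 0 or r_recovery_pct <= 0:
        raise ValueError("E and R must be positive percentages")
    return ((b_target_pct - c_initial_pct) * d_metal_tons * 100) / (
        e_alloy_pct * r_recovery_pct
    )


# Hypothetical inputs: raise an element from 0.10% to 0.50% in 125 t of metal
# using an alloy that contains 75% of the element, with 90% recovery.
alloy_tons = ferroalloy_weight(0.50, 0.10, 125, 75, 90)
print(f"{alloy_tons:.3f} t")  # 0.741 t
```

The factor of 100 appears because E and R are both percentages: dividing by E × R in percent units removes two factors of 100, while (B – C) in percent contributes only one.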

7. Metal temperature and oxidation degree control
The initial measurement of metal temperature and oxidation degree is carried out immediately after a flat bath is achieved and 42-46 MW of power is used up.
Readings of metal temperature and oxidation degree before the melt heat tapping are recorded in the melting chart. The time from the latest temperature measurement to the start of melt heat tapping should not exceed 3 minutes.
Intermediate measurement of the metal temperature is recommended (depending on the heating stage).

8. Description of data files
1 Chemical measurements in EAF (eaf_final_chemical_measurements.csv)

HEATID – heat identification number
POSITIONROW – measurement number
DATETIME – measurement date and time
VALC, VALSI, VALMN, VALP, VALS, VALCU, VALCR, VALMO, VALNI, VALAS, VALSN, VALN, VALZN – values of chemical elements, %.
Note: not all heats have details on chemical composition in EAF.
2 Measurements of temperature and oxidation degree in EAF (eaf_temp.csv)

HEATID – heat identification number
DATETIME – measurement date and time
TEMP – temperature, °C
VALO2_PPM – oxidation degree, ppm
Note: an oxidation degree of 0 means that no oxidation measurement was taken (temperature measurement only).
3 Initial chemical measurement at ladle furnace (lf_initial_chemical_measurements.csv)

HEATID – heat identification number
POSITIONROW – measurement number
DATETIME – measurement date and time
VALC, VALSI, VALMN, VALP, VALS, VALAL, VALCU, VALCR, VALMO, VALNI, VALV, VALTI, VALNB, VALCA, VALW, VALB, VALAS, VALSN, VALN – values of chemical elements, %.
4 Additions at EAF tapping (ladle_tapping.csv)

HEATID – heat identification number
MAT_CODE – material code
MAT_DEC – material name
CHARGE_AMOUNT – weight of addition
DATETIME – date and time of addition
5 Loading furnace from the basket (basket_charged.csv)

HEATID – heat identification number
MAT_CODE – material code
MAT_DEC – material name
CHARGE_AMOUNT – weight of addition
DATETIME – date and time of addition
6 Additional charge and loading of furnace (eaf_added_materials.csv)

HEATID – heat identification number
MAT_CODE – material code
MAT_DEC – material name
CHARGE_AMOUNT – weight of addition
DATETIME – date and time of addition
7 Additions to ladle furnace before initial chemical measurement (lf_added_materials.csv)

DATETIME – date and time of addition
HEATID – heat identification number
MAT_CODE – material code
DESCR – material name
MAT_CHARGED – weight of addition
8 EAF transformer data (eaf_transformer.csv)

TAP – transformer stage
HEATID – heat identification number
STARTTIME – electrode operation start time
DURATION – electrode operation duration
MW – electricity consumption
9 Usage of injected carbon in EAF (inj_mat.csv)

REVTIME – date and time
INJ_AMOUNT_CARBON – carbon usage amount
INJ_FLOW_CARBON – carbon injection flow rate
HEATID – heat identification number
Note: Carbon usage for smelting is a cumulative counter that starts at zero. Be aware that the counter may take some time to reset at the start of a new melt heat.
10 Gas and oxygen usage in EAF (eaf_gaslance_mat.csv)

REVTIME – date and time
O2_AMOUNT – oxygen usage amount
GAS_AMOUNT – gas usage amount
O2_FLOW – oxygen flow rate
GAS_FLOW – gas flow rate
HEATID – heat identification number
Note: Oxygen and gas usage for smelting are cumulative counters that start at zero. Be aware that the counters may take some time to reset at the start of a new melt heat.
11 Materials description with the values of chemical elements in it (ferro.csv)
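
The cumulative usage counters noted for inj_mat.csv and eaf_gaslance_mat.csv can be turned into per-reading increments before modeling. The sketch below is one possible approach, not the dataset authors' method. It assumes that a decrease in the counter marks a reset, and that the first reading of a heat may still carry the previous heat's total and is therefore used only as a baseline. The function name and sample values are hypothetical.

```python
def usage_increments(readings):
    """Convert cumulative counter readings of one heat into increments.

    Assumptions (not confirmed by the dataset description):
    - the first reading may be stale, so it contributes no usage;
    - a decrease means the counter was reset, so the new reading is the
      usage accumulated since that reset.
    """
    increments = []
    previous = None
    for value in readings:
        if previous is None:
            increments.append(0.0)   # baseline only
        elif value >= previous:
            increments.append(value - previous)
        else:
            increments.append(value)  # counter was reset
        previous = value
    return increments


# Hypothetical heat: the counter still shows the previous heat's total
# (250.0) for two readings before it resets and starts counting again.
cumulative = [250.0, 250.0, 0.0, 4.0, 10.0]
increments = usage_increments(cumulative)
heat_total = sum(increments)  # 10.0
```

A heat whose first reading is already non-zero after a reset will be slightly under-counted under these assumptions, so the rule should be validated against the actual files.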

9. Main problems
Temperature forecasting (target is in the "eaf_temp.csv" file);
Oxidation of steel forecasting (target is in the "eaf_temp.csv" file);
Chemical composition of steel after tapping steel from an electric arc furnace (target is in the "eaf_final_chemical_measurements.csv" file).

Beijing Multi-Site Air-Quality Data Set

Context
PM2.5 readings are often included in air quality reports from environmental authorities and companies. PM2.5 refers to atmospheric particulate matter (PM) with a diameter of less than 2.5 micrometers. In other words, it is used as a measure of pollution.

Content
This data set includes hourly air pollutants data from 12 nationally-controlled air-quality monitoring sites. The air-quality data are from the Beijing Municipal Environmental Monitoring Center. The meteorological data in each air-quality site are matched with the nearest weather station from the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017.


Beijing Air Quality

12 station files under datasets/Beijing Air Quality/ each hold 35 064 hourly samples (2013‑03‑01 → 2017‑02‑28) with the standard 18-column pollutant & meteorology schema.
PM2.5 shows 382–953 'NA' gaps per station (similar patterns across the other pollutants), so we’ll need imputation or masking before forecasting.
Station-segregated layout is ideal for cross-site generalization studies (train vs held-out stations) to exercise H1/H4 on moderate data volume (~420k rows total).
Plan on per-station normalization and possibly a lightweight temporal spine to capture hourly/seasonal structure while keeping compute parity.
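
A minimal sketch of the masking and per-station normalization mentioned above, using only the standard library. The function names and sample values are hypothetical. Filling missing entries with 0.0 (the post-normalization mean) alongside a boolean mask is one option among several, not a settled choice.

```python
import statistics


def parse_value(token):
    """Convert a raw CSV token to float, mapping 'NA' or empty to None."""
    token = token.strip()
    return None if token in ("NA", "") else float(token)


def normalize_station(values):
    """Z-score one station's series using only its observed values.

    Returns (normalized, mask); missing entries become 0.0 with mask False.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed) or 1.0  # guard against constant series
    normalized = [(v - mean) / std if v is not None else 0.0 for v in values]
    mask = [v is not None for v in values]
    return normalized, mask


# Hypothetical PM2.5 tokens for one station.
raw = ["10", "NA", "30", "20"]
values = [parse_value(token) for token in raw]
normalized, mask = normalize_station(values)
```

For held-out-station experiments, the statistics of a test station would need to come from its own history or from the training stations, depending on what the protocol allows.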
Human Activity Recognition

train/X_train.txt and test/X_test.txt provide 7 352 / 2 947 windowed samples with 561 engineered features; labels live in y_*.txt, subjects in subject_*.txt.
Raw 50 Hz sequences (128 steps × 9 channels) are in */Inertial Signals/*.txt, giving us the option to test PSANN temporal spines against the fixed features.
Dataset already split by volunteer per the README, so we should respect the supplied train/test partitions to avoid leakage.
Strong candidate for H1/H2 classification benchmarks and PSD/SHAP attribution comparisons under high-dimensional, noisy inputs.
Industrial Data from the Electric Arc Furnace

Eleven linked tables in datasets/Industrial Data from the Electric Arc Furnace/ cover 2015‑01‑01 → 2018‑07‑30; key sizes: eaf_gaslance_mat.csv 5 748 194 rows, inj_mat.csv 4 011 646, eaf_temp.csv 85 104, chemistry tables 3 709–20 827 rows.
Numeric fields use comma decimals (e.g. "0,0545") and timestamps carry decimal commas ("2016-01-01 18:31:46,003"); everything must be locale-normalized before modeling.
Join key is HEATID; note column quirks (lf_added_materials.csv uses FILTER_KEY_DATE, eaf_transformer.csv encodes durations as " 10: 40", eaf_temp.csv contains duplicate measurement rows).
Final composition file only includes columns through VALNI, so plan variants that expect VALV/VALTI/etc need revision.
This is the flagship dataset for H1/H4/H5 (nonstationary, high-noise forecasting); we’ll likely aggregate per heat or down-sample to stay within Colab GPU memory and compute budgets.
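
The locale quirks listed above can be handled with a few small parsers. This is a sketch using the example strings quoted in these notes; the function names are hypothetical. Reading " 10: 40" as minutes:seconds is an assumption that should be checked against the data.

```python
from datetime import datetime, timedelta


def parse_decimal(text):
    """Parse a comma-decimal number such as '0,0545'."""
    return float(text.strip().replace(",", "."))


def parse_timestamp(text):
    """Parse '2016-01-01 18:31:46,003' (comma before the milliseconds)."""
    return datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S,%f")


def parse_duration(text):
    """Parse a duration like ' 10: 40'.

    Assumption: the two fields are minutes and seconds.
    """
    minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(minutes=minutes, seconds=seconds)


value = parse_decimal("0,0545")
stamp = parse_timestamp("2016-01-01 18:31:46,003")
duration = parse_duration(" 10: 40")
```

When loading with a dataframe library instead, the same effect is usually available through a decimal-separator option on the CSV reader, with the timestamps still needing an explicit format.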
Jena Climate 2009–2016

datasets/Jena Climate 2009–2016/jena_climate_2009_2016.csv has 420 551 ten-minute measurements (2009‑01‑01 00:10 → 2017‑01‑01 00:00) across 14 continuous weather variables (15 columns including the timestamp).
Uses day-first timestamps but standard decimal points, so parsing is straightforward once the format is specified.
Great medium-scale multivariate series for seasonal forecasting, spectral diagnostics (H3), and distribution-shift splits (e.g. early vs late years).
Data volume is light enough to run multiple baselines per compute-parity budget in Colab.
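
A small sketch of the timestamp handling, assuming the day-first layout is `DD.MM.YYYY HH:MM:SS` (for example "01.01.2009 00:10:00"); the format string is an assumption to confirm against the file. It also computes how many rows a gap-free ten-minute grid would contain over the stated range.

```python
from datetime import datetime, timedelta

# Assumed day-first layout, e.g. '01.01.2009 00:10:00'.
JENA_FORMAT = "%d.%m.%Y %H:%M:%S"


def parse_jena_timestamp(text):
    """Parse a day-first Jena timestamp string into a datetime."""
    return datetime.strptime(text, JENA_FORMAT)


start = parse_jena_timestamp("01.01.2009 00:10:00")
end = parse_jena_timestamp("01.01.2017 00:00:00")

# Number of ten-minute steps on a gap-free grid, both endpoints included.
expected_rows = (end - start) // timedelta(minutes=10) + 1
print(expected_rows)  # 420768
```

The 420 551 rows reported above are 217 fewer than this gap-free count, which suggests that some intervals are missing and that the series should be reindexed or checked for gaps before windowing.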
Kaggle Rossmann Store Sales

train.csv (1 017 209 rows, 2013‑01‑01 → 2015‑07‑31) includes sales, customers, promos; test.csv adds 41 088 rows (2015‑08‑01 → 2015‑09‑17) with 11 missing Open.
store.csv (1 115 stores) supplies static descriptors—3 missing CompetitionDistance, 571 stores flagged Promo2=1; join on Store.
Supports structured regression experiments for H1/H2 with rich categorical + temporal covariates; remember to encode StateHoliday strings and reconcile missing Open.
Recommend calendar-based cross-validation or synthetic holdouts to induce distribution shifts for robustness testing.
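
One way to sketch the calendar-based cross-validation recommended above is an expanding-window split by date. The function name, the cutoff dates, and the synthetic daily dates are hypothetical. The 48-day horizon is chosen to match the length of the test.csv period (2015‑08‑01 → 2015‑09‑17).

```python
from datetime import date, timedelta


def calendar_folds(dates, cutoffs, horizon_days=48):
    """Yield (train_idx, valid_idx) pairs for expanding-window validation.

    Training rows fall strictly before the cutoff; validation rows fall in
    the horizon_days starting at the cutoff.
    """
    horizon = timedelta(days=horizon_days)
    for cutoff in cutoffs:
        train_idx = [i for i, d in enumerate(dates) if d < cutoff]
        valid_idx = [
            i for i, d in enumerate(dates) if cutoff <= d < cutoff + horizon
        ]
        yield train_idx, valid_idx


# Hypothetical daily dates for one store, 2015-01-01 through 2015-07-31.
dates = [date(2015, 1, 1) + timedelta(days=i) for i in range(212)]
cutoffs = [date(2015, 5, 1), date(2015, 6, 14)]
folds = list(calendar_folds(dates, cutoffs))
```

Because every validation window lies entirely after its training rows, the folds avoid leaking future sales into training, which a random row-level split would not guarantee.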
Planning Notes

Anchor H1/H4/H5 around the industrial furnace data, using per-heat aggregation and careful decimal parsing to keep runs tractable in Colab.
Use Beijing + Jena for mid-scale multivariate forecasting (seasonality, cross-station transfer) and for spectral/geometry probes tied to H2/H3.
Deploy HAR as the primary classification benchmark to stress-test information usage metrics (PSD/SHAP) and optional temporal spines.
Rossmann offers a structured business-forecasting case; align compute budgets across PSANN vs MLP/CNN/sequence-lite baselines and address the few missing fields before training.