# Postmortem: Biased Model Training with EEG Signals
## Summary
A persistent model bias was observed while training on an EEG dataset of wheelchair navigation commands:
- Models identified `stop/neutral` most reliably, performed moderately on `forward/left`, and consistently failed on `right`
- The pattern held across MATLAB Classification Learner models and custom models
- Best outcomes: ~50% accuracy for problematic classes ("right") with XGBoost/Bayesian
## Root Cause
1. **Imbalanced Command Representation**
The dataset captured identical durations and sessions for every command, ignoring natural variance in EEG patterns
2. **Signal Similarity**
The `forward`, `left`, and `right` commands generate similar frontal-lobe (FP1/FP2) activation
3. **Feature Limitation**
Frontal electrodes (FP1/FP2) alone are insufficient to capture directional intent
4. **Data Validation Gap**
No EEG-specific verification methods were applied (e.g., ERP visualization, spectral comparison)
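
The spectral comparison named in item 4 can be sketched as follows. This is a minimal illustration, not the pipeline used in this project: it assumes each command's recordings are available as an array of single-channel epochs (trials × samples), and the 256 Hz sampling rate, the 8-13 Hz band, the helper name `band_power`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import welch

FS = 256  # assumed sampling rate in Hz (not stated in the postmortem)


def band_power(epochs, fs, band):
    """Mean Welch power density inside `band` (Hz), averaged over epochs.

    epochs: array of shape (n_trials, n_samples) for a single channel.
    """
    freqs, psd = welch(epochs, fs=fs, nperseg=fs, axis=-1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(psd[:, mask].mean())


# Synthetic stand-in data: two "commands" that differ only in 10 Hz amplitude.
rng = np.random.default_rng(0)
t = np.arange(2 * FS) / FS  # two-second epochs


def synthetic_epochs(amplitude, n_trials=20):
    noise = rng.normal(0.0, 1.0, size=(n_trials, t.size))
    return noise + amplitude * np.sin(2 * np.pi * 10 * t)


epochs_by_command = {
    "left": synthetic_epochs(2.0),
    "right": synthetic_epochs(0.5),
}

# Compare 8-13 Hz power between commands. On real recordings, nearly equal
# values would suggest the channel carries little class information.
mu_power = {
    cmd: band_power(ep, FS, (8, 13)) for cmd, ep in epochs_by_command.items()
}
for cmd, power in mu_power.items():
    print(f"{cmd}: 8-13 Hz power = {power:.3f}")
```

Running a check like this per channel before training would show early whether `forward`, `left`, and `right` are separable at all with the available electrodes.
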
## Why This Happens in Real Systems
- **Physiological Constraints**
Motor-imagery EEG signals exhibit high inter-subject variability and low spatial resolution
- **Sensor Placement Bias**
Frontal electrodes pick up facial and muscle artifacts more readily than motor cortex activity
- **Protocol Flaws**
Fixed-duration sessions ignore that complex commands take longer to differentiate
- **Preprocessing Blind Spots**
Frequency-domain conversion (FFT) loses temporal dependencies that are critical for decoding motion
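
The preprocessing point can be illustrated with a small sketch. It assumes the pipeline keeps only the FFT magnitude of a whole epoch (the postmortem does not say which FFT output is used). Under that assumption, two signals containing the same bursts in opposite order produce identical features, while a simple windowed analysis keeps the order. The signals, the 256 Hz rate, and the helper `windowed_peak_freqs` are hypothetical.

```python
import numpy as np

FS = 256  # assumed sampling rate in Hz
t = np.arange(FS) / FS  # one second per burst

burst_10hz = np.sin(2 * np.pi * 10 * t)
burst_20hz = np.sin(2 * np.pi * 20 * t)

# Same content, opposite temporal order.
ten_then_twenty = np.concatenate([burst_10hz, burst_20hz])
twenty_then_ten = np.concatenate([burst_20hz, burst_10hz])

# Whole-epoch FFT magnitude cannot tell the two signals apart.
mag_a = np.abs(np.fft.rfft(ten_then_twenty))
mag_b = np.abs(np.fft.rfft(twenty_then_ten))
same_magnitude = np.allclose(mag_a, mag_b)
same_signal = np.allclose(ten_then_twenty, twenty_then_ten)
print(same_magnitude, same_signal)  # True False


def windowed_peak_freqs(x, fs, n_windows):
    """Dominant frequency (Hz) in each of `n_windows` consecutive segments."""
    peaks = []
    for segment in np.array_split(x, n_windows):
        spectrum = np.abs(np.fft.rfft(segment))
        freqs = np.fft.rfftfreq(segment.size, d=1.0 / fs)
        peaks.append(float(freqs[np.argmax(spectrum)]))
    return peaks


print(windowed_peak_freqs(ten_then_twenty, FS, 2))  # [10.0, 20.0]
print(windowed_peak_freqs(twenty_then_ten, FS, 2))  # [20.0, 10.0]
```
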
## Real-World Impact
If deployed, the wheelchair would exhibit:
- Erratic navigation behaviors in 52% of
- Catastrophic failure modes:
  - Ignoring "stop" commands → collision
  - Right-turn misinterpretation → navigational
- Erosion of user trust after repeated command failures
## Example or Code
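Problematic feature aggregation approach:

```matlab
% Typical feature extraction pipeline (simplified)
% num_files: number of recordings for this command (assumed name; the bound is not given)
features = [];
for i = 1:num_files
    data = readtable(sprintf('command%d/file%d.csv', command_id, i));
    signal = data.Channel1;
    % Uses only aggregated statistics of the whole recording
    features(i,:) = [mean(signal), std(signal), kurtosis(signal)];  % ... (further statistics elided)
end
% Result: directionally similar commands yield near-identical feature vectors
```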
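Proper spatial feature enhancement:

```python
# Baseline correction + topographic mapping (hypothetical improvement)
import numpy as np
from scipy.signal import detrend


def enhance_features(signal, simulated_motor_cortex_template):
    # The template is passed in explicitly; it must be no longer than the signal
    corrected = detrend(signal)  # Remove DC offset and linear drift
    # Emulate multi-electrode spatial correlation (requires additional sensors)
    spatial_corr = np.correlate(corrected, simulated_motor_cortex_template)
    # Peak correlation and its position; further features are elided (...)
    return [np.max(spatial_corr), np.argmax(spatial_corr)]
```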
## How Senior Engineers Fix It
1. **Sensor Expansion**
Add electrodes at C3/C4 (motor cortex) and validate with topographical maps
2. **Temporal Augmentation**
Record directional commands for 2x the duration of neutral ones
3. **Artifact Subtraction**
Implement EMG/EOG rejection algorithms during preprocessing
4. **Feature Engineering**
Introduce time-lagged features (e.g., cross-correlation peaks)
5. **Class Reweighting**
Apply cost-sensitive learning with a 5x penalty on `right` misclassifications
6. **Domain Validation**
Generate ERP plots per command to verify neural differentiation
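
Two of the fixes above, time-lagged features (item 4) and class reweighting (item 5), can be sketched in a few lines. This is an illustration under assumptions rather than the project's implementation: the helper names, the two-channel input, the lag limit, and the synthetic data are hypothetical. Only the 5x penalty on `right` comes from the list above.

```python
import numpy as np


def xcorr_peak_features(ch_a, ch_b, max_lag):
    """Peak normalized cross-correlation between two channels and its lag.

    A positive lag means ch_b trails ch_a by that many samples.
    """
    a = (ch_a - ch_a.mean()) / (ch_a.std() * ch_a.size)
    b = (ch_b - ch_b.mean()) / ch_b.std()
    full = np.correlate(b, a, mode="full")   # lags -(N-1) .. (N-1)
    lags = np.arange(-(a.size - 1), b.size)
    keep = np.abs(lags) <= max_lag           # restrict to plausible lags
    best = np.argmax(full[keep])
    return float(full[keep][best]), int(lags[keep][best])


def sample_weights(labels, penalties):
    """Per-sample weights: the class penalty if listed, otherwise 1.0."""
    return np.array([penalties.get(label, 1.0) for label in labels])


# Synthetic check: channel B is channel A delayed by 5 samples.
rng = np.random.default_rng(1)
channel_a = rng.normal(size=512)
channel_b = np.roll(channel_a, 5)
peak, lag = xcorr_peak_features(channel_a, channel_b, max_lag=32)
print(f"peak={peak:.2f}, lag={lag}")

# 5x penalty on the class the models keep missing.
labels = ["stop", "forward", "left", "right", "right"]
weights = sample_weights(labels, {"right": 5.0})
print(weights)
```

Weights of this form can be passed to estimators that accept per-sample weights at fit time, for example through a `sample_weight` argument in scikit-learn-style APIs. Reweighting only shifts the decision boundary, so it cannot compensate for features that do not separate the classes in the first place.
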
## Why Juniors Miss It
- **Data Assumption Gap**
Assumes uniform command difficulty → neglects neurological differences between commands
- **Toolchain Dependency**
Over-relies on MATLAB's automated workflows (Classification Learner App)
- **Feature Myopia**
Focuses on textbook statistical metrics (mean, kurtosis) while ignoring EEG temporal structure
- **Signal Naivety**
Treats all EEG channels as equally informative → misses regional differences
- **Validation Oversimplification**
Prioritizes aggregate accuracy over per-class performance
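
The last point is easy to demonstrate. The sketch below uses made-up labels (not results from this project) in which every `right` trial is misread as `left`: aggregate accuracy still looks acceptable while recall on `right` is zero. The helper `per_class_recall` is a hypothetical name.

```python
import numpy as np


def per_class_recall(y_true, y_pred, classes):
    """Fraction of each class's trials that were predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recall = {}
    for c in classes:
        mask = y_true == c
        recall[c] = float((y_pred[mask] == c).mean()) if mask.any() else float("nan")
    return recall


classes = ["stop", "forward", "left", "right"]
y_true = np.repeat(classes, 10)   # 10 illustrative trials per command
y_pred = y_true.copy()
y_pred[30:40] = "left"            # every "right" trial misread as "left"

accuracy = float((y_true == y_pred).mean())
recall = per_class_recall(y_true, y_pred, classes)

print(f"aggregate accuracy: {accuracy:.2f}")  # aggregate accuracy: 0.75
for c in classes:
    print(f"recall[{c}] = {recall[c]:.2f}")
```
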