FGVC-Aircraft: small custom CNN improves, then collapses after stronger augmentation (ColorJitter + RandomErasing). How should I tune next?

Summary

Your CNN improved through moderate augmentation but collapsed once augmentation strength exceeded model capacity. The final run shows classic underfitting due to overly destructive transforms, causing the network to latch onto whatever residual patterns remain (runway edges, borders), which explains the “hotter” Grad‑CAM maps.

Root Cause

The performance drop in Run 5 is driven by augmentation–capacity mismatch:

  • ColorJitter + RandomErasing + aggressive RandomResizedCrop removed too much discriminative signal.
  • A small 3‑block CNN lacks the representational power to recover from heavy perturbations.
  • Weight decay + dropout + strong augmentation created compounded regularization, pushing the model into underfitting.
  • FGVC‑Aircraft is a fine‑grained dataset, where subtle texture/shape cues matter; heavy augmentation destroys these cues.

Why This Happens in Real Systems

Real ML systems often fail when regularization is increased without considering model capacity:

  • Fine‑grained tasks rely on high‑frequency details that aggressive augmentation removes.
  • Small models cannot learn invariances that larger backbones (ResNet‑50, EfficientNet) handle easily.
  • Over‑regularization causes networks to default to background heuristics, producing hotter Grad‑CAM on irrelevant regions.
  • When both train and validation accuracy drop together, the system is capacity‑limited, not overfitting.
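The last point can be turned into a quick sanity check on logged accuracies. A minimal sketch; the function name and thresholds are illustrative heuristics, not canonical values:

```python
def training_regime(train_acc, val_acc, gap_threshold=0.10):
    """Rough heuristic: classify a run from its final train/val accuracy.

    A large train-val gap suggests overfitting; low train accuracy with a
    small gap suggests the model is capacity-limited or over-regularized.
    """
    if train_acc - val_acc > gap_threshold:
        return "overfitting"
    if train_acc < 0.5:  # model cannot even fit the training set
        return "capacity-limited / over-regularized"
    return "healthy"

print(training_regime(0.95, 0.60))  # large gap -> "overfitting"
print(training_regime(0.35, 0.30))  # both low -> "capacity-limited / over-regularized"
```

Applied to the runs described above, a Run 5 where train and validation accuracy fall together would land in the second branch, confirming that pulling back regularization (not adding more) is the right move.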

Real-World Impact

This pattern leads to:

  • Unstable training where improvements reverse after certain augmentations.
  • Misleading Grad‑CAM that highlights background because the model cannot extract aircraft‑specific features.
  • Confusion matrices with weak diagonals, indicating the model is guessing.
  • Wasted compute on augmentations that harm rather than help.

Example

Below is an example of a safer augmentation pipeline for fine‑grained classification:

import torchvision.transforms as T

# ImageNet statistics; swap in dataset-specific values if you have them
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

train_tf = T.Compose([
    T.RandomResizedCrop(224, scale=(0.85, 1.0)),  # gentle crops keep the airframe in view
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.05),  # mild jitter only
    T.ToTensor(),
    T.Normalize(mean, std),
])

How Senior Engineers Fix It

Experienced engineers follow a capacity‑first, augmentation‑second strategy:

  • Increase model capacity before increasing augmentation strength
    • Move to ResNet‑18/34 or MobileNetV3 first.
  • Dial back augmentation until the model learns stable features
    • Reduce ColorJitter intensity
    • Remove RandomErasing temporarily
    • Use less aggressive crop scales (≥0.8)
  • Tune regularization one dimension at a time
    • Freeze weight decay at a small value (1e‑4)
    • Adjust dropout only after model capacity is adequate
  • Use LR scheduling
    • Cosine annealing or ReduceLROnPlateau stabilizes training on fine‑grained tasks.
  • Validate augmentations visually
    • Ensure aircraft shape and texture remain recognizable.
  • Check Grad‑CAM early
    • If attention shifts to borders/background, augmentation is too strong.
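On the LR-scheduling point above: PyTorch provides this as torch.optim.lr_scheduler.CosineAnnealingLR, but the schedule itself is a single formula, sketched here in plain Python (lr_max and lr_min are illustrative values, not recommendations):

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate: smooth decay from lr_max to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0, 50))   # lr_max at the start
print(cosine_lr(25, 50))  # halfway between lr_max and lr_min at the midpoint
print(cosine_lr(50, 50))  # lr_min at the end
```

The smooth decay avoids the sudden LR drops of step schedules, which matters when the model is balancing on subtle fine-grained cues.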

Why Juniors Miss It

New practitioners often:

  • Assume more augmentation = better, without considering task sensitivity.
  • Underestimate how small CNNs saturate quickly on fine‑grained datasets.
  • Add multiple regularizers simultaneously, making it impossible to isolate effects.
  • Misinterpret Grad‑CAM heatmaps, thinking “more red = better,” when it often means the model is confused.
  • Forget that fine‑grained classification is fundamentally a high‑capacity problem.
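One way to avoid the "multiple regularizers at once" trap is a one-factor-at-a-time ablation: start from a minimal baseline and enable exactly one knob per run. A minimal sketch, with hypothetical config keys rather than a real API:

```python
def ofat_configs(baseline, changes):
    """Generate one run config per change, each differing from the
    baseline in exactly one knob (one-factor-at-a-time ablation)."""
    configs = [dict(baseline)]  # run the untouched baseline first
    for knob, value in changes:
        cfg = dict(baseline)
        cfg[knob] = value
        configs.append(cfg)
    return configs

baseline = {"color_jitter": 0.0, "random_erasing": False, "crop_scale_min": 1.0}
changes = [("color_jitter", 0.1), ("random_erasing", True), ("crop_scale_min", 0.85)]
for cfg in ofat_configs(baseline, changes):
    print(cfg)
```

Comparing each run against the baseline tells you which single augmentation helped or hurt, which is exactly the information that was lost when ColorJitter, RandomErasing, and aggressive cropping were added together.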

If you want, I can outline a concrete next‑run plan tailored to your exact dataset and compute budget.
