Category: Machine Learning
Summary: Testing whether label smoothing changes the gain from applying weight decay late rather than early in training.
Weight decay timing and label smoothing can both change how confident a network becomes, and they may act on the same underlying training dynamics. This experiment asks whether label smoothing weakens, strengthens, or otherwise reshapes the previously observed benefit of turning weight decay on later in training.
The design compares matched conditions with late weight decay, label smoothing, and their combination on the same learning task. By focusing on interaction effects rather than only main effects, the experiment can tell whether these two interventions are partly redundant or whether they alter training in distinct ways.
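One way to quantify that interaction is the standard two-way contrast over the four cells of the 2x2 factorial. A minimal sketch, where the function and the placeholder accuracy values are illustrative assumptions, not results from the experiment:

```python
def interaction_effect(acc):
    """Two-way interaction from a 2x2 factorial of a scalar metric.

    acc maps (late_wd, label_smoothing) booleans to e.g. test accuracy.
    A value near zero suggests the interventions are roughly additive;
    a negative value suggests partial redundancy (the combined gain is
    smaller than the sum of the individual gains).
    """
    return (acc[(True, True)] - acc[(True, False)]
            - acc[(False, True)] + acc[(False, False)])

# Illustrative placeholder numbers (hypothetical, not measured):
acc = {
    (False, False): 0.90,   # baseline
    (True,  False): 0.92,   # late weight decay only
    (False, True):  0.91,   # label smoothing only
    (True,  True):  0.925,  # both combined
}
print(interaction_effect(acc))  # negative here: combined gain < sum of gains
```

With these placeholder numbers the interaction is negative, the signature of partial redundancy the experiment is designed to detect.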
That distinction matters for mechanism as much as for tuning. If label smoothing already controls the confidence or curvature changes that late weight decay acts on, then the extra benefit of delayed regularization should shrink.
Method: Factorial neural-network training sweeps crossing late-onset weight decay with label smoothing under otherwise matched hyperparameters.
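The two manipulated factors can be sketched as follows; the onset fraction, decay coefficient, and smoothing strength below are assumed hyperparameters chosen for illustration, not values specified by the experiment:

```python
from itertools import product

def wd_at_step(step, total_steps, wd=5e-4, onset_frac=0.5, late=True):
    """Weight-decay coefficient at a training step: zero before the
    onset point when decay is delayed, constant otherwise.
    onset_frac is an assumed hyperparameter."""
    if late and step < onset_frac * total_steps:
        return 0.0
    return wd

def smooth_labels(num_classes, true_class, eps=0.1):
    """Standard label smoothing: redistribute eps of the probability
    mass from the one-hot target uniformly across all classes."""
    uniform = eps / num_classes
    return [uniform + (1.0 - eps if c == true_class else 0.0)
            for c in range(num_classes)]

# The 2x2 factorial grid of matched conditions:
# (late_wd, label_smoothing) -> four runs per seed, same hyperparameters.
conditions = list(product([False, True], [False, True]))
```

Each condition shares the same architecture, data, and schedule, so any difference between cells is attributable to the two factors and their interaction.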
What is measured: In-distribution and out-of-distribution accuracy, generalization gap, confidence or curvature-related behavior, and the interaction effect between label smoothing and weight-decay timing.
