Experiment: Weight Decay Timing x Mixup Interaction


Category: Machine Learning

Summary: Testing whether mixup reduces the extra out-of-distribution benefit of applying weight decay late in training.


Mixup and weight decay both tend to smooth learned decision rules, but they act through different training mechanisms. This experiment asks whether mixup already provides some of the same regularizing benefit that late-onset weight decay would otherwise add, reducing the marginal value of turning on weight decay later in training.

The script compares four matched conditions: baseline training, late weight decay, mixup only, and mixup combined with late weight decay. By measuring in-distribution and out-of-distribution performance across repeated trials, it estimates whether the late-weight-decay gain is smaller in the presence of mixup.
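The interaction described above is a difference-in-differences: the late-weight-decay gain with mixup minus the same gain without it. A minimal sketch of that estimate (the function name and the toy per-trial accuracies are illustrative, not the experiment's actual code or results):

```python
import numpy as np

def interaction_effect(ood_base, ood_wd, ood_mix, ood_mix_wd):
    """Difference-in-differences estimate of the mixup x late-weight-decay
    interaction on out-of-distribution accuracy.

    Each argument is an array of per-trial OOD accuracies for one of the
    four matched conditions. A negative value means mixup shrinks the
    marginal gain from adding late weight decay.
    """
    gain_without_mixup = np.mean(ood_wd) - np.mean(ood_base)
    gain_with_mixup = np.mean(ood_mix_wd) - np.mean(ood_mix)
    return gain_with_mixup - gain_without_mixup

# Toy per-trial accuracies (made-up numbers for illustration only)
base = np.array([0.70, 0.71, 0.69])
wd = np.array([0.74, 0.75, 0.73])
mix = np.array([0.75, 0.74, 0.76])
mix_wd = np.array([0.76, 0.75, 0.77])
print(interaction_effect(base, wd, mix, mix_wd))
```

With these toy numbers the gain from late weight decay is about 0.04 without mixup but only about 0.01 with it, so the estimated interaction is negative, the substitution pattern the experiment is probing for.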

That framing turns a practical tuning question into a mechanistic one. If the interaction is strongly negative, the two interventions may be partly substituting for one another rather than addressing independent weaknesses of the model.

Method: Repeated NumPy MLP training runs with a four-condition factorial design crossing late-onset weight decay with mixup augmentation.

What is measured: In-distribution accuracy, out-of-distribution accuracy, generalization gap, mean prediction confidence, mean mixup lambda, and the interaction effect between mixup and late weight decay.
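To make the two interventions concrete, here is a minimal NumPy sketch of a mixup batch and a late-onset weight-decay update. The alpha value, decay coefficient, and onset step are common illustrative choices, not the experiment's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y_onehot, alpha=0.2):
    """Mixup: convex-combine a batch with a shuffled copy of itself.

    The mixing weight lam is drawn from Beta(alpha, alpha); averaging
    lam over batches gives the 'mean mixup lambda' statistic."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mixed = lam * x + (1 - lam) * x[idx]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mixed, y_mixed, lam

def sgd_step(w, grad, lr, step, wd=1e-4, wd_onset=500):
    """Late-onset weight decay: the decay term is zero until `wd_onset`
    steps have elapsed, then acts as standard L2 decay on the weights."""
    decay = wd * w if step >= wd_onset else 0.0
    return w - lr * (grad + decay)
```

Because mixup targets are convex combinations of one-hot labels, each mixed label row still sums to one, which is what lets the same cross-entropy loss be reused unchanged.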


Powered by BOINC
© 2026 Axiom Project