Experiment: Weight Decay Onset Sweep


Category: Machine Learning

Summary: Maps how the epoch at which weight decay is switched on affects generalization, and tests whether the effect flips sharply from harmful to helpful at a critical training epoch.

Axiom’s earlier work suggested that regularization timing can matter as much as regularization strength, with early weight decay sometimes hurting compositional learning while later weight decay helps. This experiment turns that observation into a systematic timing map, asking whether there is a genuine transition epoch where the effect changes sign.

The script trains compositional-task networks across many candidate weight-decay onset epochs and several widths, then analyzes the resulting generalization gaps, best-onset schedules, and rank trajectories. It also checks whether the onset transition aligns with the point of sharpest representation change in the network.
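As a rough illustration of what a delayed weight-decay onset means in practice, the sketch below gates the decay coefficient on the current epoch and applies it in a plain SGD update. The function names, the use of plain SGD, and the toy numbers are assumptions made for illustration; the actual script may use a different optimizer or schedule.

```python
def weight_decay_at(epoch, onset_epoch, strength):
    """Weight-decay coefficient in effect at `epoch` (0.0 before onset)."""
    if onset_epoch is None:  # assumed convention: None means never switched on
        return 0.0
    return strength if epoch >= onset_epoch else 0.0


def sgd_step(weights, grads, lr, decay):
    """One plain SGD step with the decay term added to each gradient."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


# Toy usage: one weight, zero task gradient, decay switched on at epoch 2.
w = [1.0]
history = []
for epoch in range(4):
    decay = weight_decay_at(epoch, onset_epoch=2, strength=0.5)
    w = sgd_step(w, grads=[0.0], lr=0.1, decay=decay)
    history.append(w[0])
```

In this toy run the weight stays at its initial value until the onset epoch and only shrinks afterwards, which is the behavior the sweep varies.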

That framing matters because it tests whether the timing effect is structural rather than anecdotal. A sharp transition would suggest that regularization interacts with a specific stage of representation formation, not just with total optimization time.

Method: Systematic MLP sweeps over weight-decay start epoch and network width on a compositional classification task, followed by transition and rank-correlation analysis.
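The sweep grid described in the method could be enumerated along the following lines. This is a minimal sketch: the specific onset epochs, widths, and seed handling are hypothetical placeholders, and the convention that onset 0 means always-on while `None` means no weight decay is an assumption, not something the source specifies.

```python
from itertools import product


def build_sweep(onset_epochs, widths, seeds=(0,)):
    """Enumerate one run config per (width, onset, seed) combination."""
    return [
        {"width": w, "onset_epoch": o, "seed": s}
        for w, o, s in product(widths, onset_epochs, seeds)
    ]


# Hypothetical grid: onset 0 = always-on, None = weight decay never applied.
runs = build_sweep(onset_epochs=[0, 50, 100, None], widths=[32, 64])
```

Including the always-on and never-on endpoints in the same grid keeps the baseline comparisons on identical widths and seeds.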

What is measured: Generalization gap by onset epoch, best onset timing, inferred transition epoch, width dependence, always-on versus timed weight-decay comparison, and correlation with sharpest rank-change epoch.
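One way the transition and rank analyses might be computed is sketched below. It assumes the effect of an onset is defined as the no-decay generalization gap minus the gap at that onset (so positive means helpful), that the transition epoch is the first linearly interpolated sign change of that effect, and that the sharpest rank change is the largest absolute step in a rank trajectory. These definitions and all names are assumptions for illustration; the source does not state the exact estimators.

```python
def generalization_gap(train_acc, test_acc):
    """Train accuracy minus test accuracy."""
    return train_acc - test_acc


def transition_epoch(onsets, effects):
    """First interpolated zero crossing of effect vs. onset, or None."""
    pairs = list(zip(onsets, effects))
    for (o0, e0), (o1, e1) in zip(pairs, pairs[1:]):
        if e0 == 0:
            return float(o0)
        if e0 * e1 < 0:  # sign flips between o0 and o1
            return o0 + (o1 - o0) * (-e0) / (e1 - e0)
    if pairs and pairs[-1][1] == 0:
        return float(pairs[-1][0])
    return None


def sharpest_change_epoch(epochs, ranks):
    """Epoch at the end of the largest absolute step in the rank trajectory."""
    diffs = [abs(b - a) for a, b in zip(ranks, ranks[1:])]
    i = max(range(len(diffs)), key=diffs.__getitem__)
    return epochs[i + 1]


# Made-up numbers, purely to show the call pattern.
t_star = transition_epoch([0, 10, 20, 30], [-0.04, -0.02, 0.02, 0.03])
r_star = sharpest_change_epoch([0, 5, 10, 15], [2, 3, 9, 10])
```

Comparing `t_star` with `r_star` per width is one simple way to check whether the onset transition lines up with the sharpest representation change.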