Experiment: Weight-Decay Rebound Dynamics



Category: Machine Learning

Summary: Testing whether removing weight decay mid-training causes effective rank to rebound and compositional generalization to deteriorate, explaining the inverse critical-period effect.


Axiom previously observed an inverse critical-period effect in which the timing of weight decay changed final compositional generalization. This experiment asks whether the mechanism is a rebound: once weight decay is removed, the representation rank rises again and the earlier benefit is partially undone.

The script trains networks under three regimes: no weight decay, always-on weight decay, and weight decay removed at each of several epochs. It then tracks effective rank and the compositional generalization gap every five epochs for widths 32, 64, and 128. This produces a time-resolved view of whether earlier removal leads to a larger rank rebound and worse final behavior.
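The page does not specify how effective rank is computed; a common definition, and one plausible reading of the metric tracked here, is the exponential of the Shannon entropy of the normalized singular values of a representation matrix. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def effective_rank(h):
    """Effective rank of a representation matrix h (samples x features):
    exp of the Shannon entropy of the normalized singular values.
    This is one standard definition; the experiment's exact metric
    may differ."""
    s = np.linalg.svd(h, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop zero singular values to avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))
```

Under this definition, an n-by-n identity matrix has effective rank n, and any rank-1 matrix has effective rank 1, so the value interpolates smoothly between hard-rank extremes as the singular-value spectrum flattens or concentrates.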

The scientific value is mechanistic. Instead of only checking which schedule performs best, the experiment links schedule timing to a measurable internal representation change that could explain the observed generalization pattern.

Method: NumPy MLP training with scheduled removal of weight decay, measuring effective-rank and compositional-gap trajectories across widths.
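The scheduled removal can be sketched as an SGD update whose decay term is zeroed once a removal epoch is reached. The function name, learning rate, and decay coefficient below are illustrative, not the experiment's actual configuration:

```python
import numpy as np

def sgd_step(w, grad, epoch, lr=0.1, wd=1e-3, removal_epoch=None):
    """One SGD update with weight decay that switches off once `epoch`
    reaches `removal_epoch` (None = decay stays on for the whole run).
    Hypothetical sketch of the schedule, not the experiment's code."""
    decay = wd if (removal_epoch is None or epoch < removal_epoch) else 0.0
    return w - lr * (grad + decay * w)
```

Setting `removal_epoch=0` recovers the no-weight-decay baseline and `removal_epoch=None` the always-on baseline, so the three regimes differ only in this one schedule parameter.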

What is measured: Effective-rank rebound, compositional generalization gap over time, dependence on removal epoch, and correlation between rebound magnitude and final gap.
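One way to quantify the last of these measurements, assuming per-run rank trajectories, final gaps, and known removal checkpoints, is to define the rebound magnitude as the post-removal peak in effective rank minus the rank at removal, then correlate it with the final gap. This helper is hypothetical; the experiment's actual analysis may differ:

```python
import numpy as np

def rebound_gap_correlation(rank_traj, gaps, removal_idx):
    """rank_traj: (runs, checkpoints) effective-rank trajectories.
    gaps: (runs,) final compositional generalization gaps.
    removal_idx: (runs,) checkpoint index where weight decay was removed.
    Returns per-run rebound magnitudes and their Pearson correlation
    with the final gap. Illustrative analysis sketch."""
    rebounds = np.array([
        rank_traj[i, r:].max() - rank_traj[i, r]
        for i, r in enumerate(removal_idx)
    ])
    corr = np.corrcoef(rebounds, gaps)[0, 1]
    return rebounds, corr
```

A positive correlation under this definition would support the rebound mechanism: runs whose rank rises more after removal would also show a larger final gap.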


Powered by BOINC
© 2026 Axiom Project