Category: Machine Learning
Summary: Testing whether the reported inverse critical period for weight decay persists as network depth increases.
Some earlier Axiom experiments suggested an unusual timing effect: applying weight decay later in training could retain most of the benefit of always-on weight decay, while applying it too early could hurt. This experiment asks whether that inverse critical-period pattern is specific to shallow networks or remains visible as depth increases.
Using a fixed width and matched hyperparameters, the script trains multilayer perceptrons on a compositional task, with depths from two to six hidden layers, under different weight-decay timing schedules. It then compares generalization across depths to identify where the timing effect survives, weakens, or disappears.
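A minimal sketch of what such a sweep could look like, assuming a handful of named timing schedules (the schedule names, onset/offset epochs, and decay strength below are illustrative assumptions, not the experiment's actual configuration):

```python
# Hypothetical sketch of the (depth, schedule) grid; all names and
# hyperparameter values here are assumptions for illustration.

def weight_decay_at(epoch: int, schedule: str, onset: int = 50,
                    offset: int = 50, wd: float = 1e-2) -> float:
    """Weight-decay coefficient in force at a given epoch.

    'always'     -- decay on for the whole run (baseline intervention)
    'late_only'  -- off until `onset`, then on (probes the inverse critical period)
    'early_only' -- on until `offset`, then off (probes early-phase harm)
    'never'      -- no decay (control)
    """
    if schedule == "always":
        return wd
    if schedule == "late_only":
        return wd if epoch >= onset else 0.0
    if schedule == "early_only":
        return wd if epoch < offset else 0.0
    if schedule == "never":
        return 0.0
    raise ValueError(f"unknown schedule: {schedule}")


def sweep_grid(depths=range(2, 7),
               schedules=("always", "late_only", "early_only", "never")):
    """Enumerate (depth, schedule) cells; all other hyperparameters stay matched."""
    return [(d, s) for d in depths for s in schedules]
```

Each grid cell would then be trained identically except for the decay coefficient returned per epoch, so any accuracy differences can be attributed to depth and timing alone.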
The result matters because timing-based interventions are only scientifically useful if they generalize beyond one narrow architecture. A depth sweep helps separate a robust training phenomenon from an artifact of a specific small model.
Method: Depth sweep of MLPs on a compositional task, comparing fixed weight-decay schedules under otherwise matched training hyperparameters.
What is measured: Accuracy by depth and schedule, relative strength of the inverse critical-period effect, and depth dependence of the timing response.
