Experiment: Weight-Decay Width Mechanism

Category: Machine Learning

Summary: Testing why the inverse critical-period effect for weight decay weakens as network width increases.

Axiom previously found a strong inverse critical-period effect for weight decay at narrow width, but newer results suggested that the effect becomes noisy or unreliable in wider models. This experiment asks why that width dependence appears, and it poses several mechanism-level hypotheses rather than treating the failure as a simple anomaly.

The script compares widths while examining three proposed explanations. First, wider networks may keep smaller weight norms, so late weight decay has little leverage. Second, they may show less sharp rank compression, so there is no clear event to time against. Third, they may close the generalization gap earlier, so the useful late window disappears. The predictions tie each hypothesis to a measurable trajectory: weight norm, rank, and generalization gap, respectively.
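The three diagnostics can be sketched as small NumPy helpers. This is a minimal illustration under stated assumptions, not the experiment's actual script: the function names are hypothetical, the rank measure is assumed to be an entropy-based effective rank of an activation matrix, and "sharpness" is assumed to mean the largest single-step drop in that rank along the training trajectory.

```python
import numpy as np

def global_weight_norm(weights):
    """L2 norm taken over all weight arrays of a model (assumed norm definition)."""
    return float(np.sqrt(sum(np.sum(np.asarray(w, dtype=float) ** 2) for w in weights)))

def effective_rank(features):
    """Entropy-based effective rank of a (samples x units) matrix.

    Assumption: the source does not say which rank measure is used;
    this uses exp(entropy) of the normalized singular values.
    """
    s = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-np.sum(p * np.log(p))))

def compression_sharpness(rank_trajectory):
    """Largest drop in rank between consecutive checkpoints (assumed definition)."""
    drops = -np.diff(np.asarray(rank_trajectory, dtype=float))
    return float(drops.max())

def generalization_gap(train_loss, test_loss):
    """Test loss minus train loss at one checkpoint."""
    return test_loss - train_loss
```

Under these assumptions, the first hypothesis would predict a smaller `global_weight_norm` at decay onset for wider models, the second a smaller `compression_sharpness`, and the third a `generalization_gap` that approaches zero earlier in training.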

This makes the project a test of explanations for a previously observed phenomenon. Instead of only checking whether inverse critical-period behavior persists, it looks for the representation and optimization changes that control when the timing effect matters.

Method: Cross-width training comparisons linking weight-decay timing outcomes to weight norms, rank-compression sharpness, and early generalization-gap trajectories.

What is measured: Late-weight-decay effectiveness by width, weight norms at onset time, rank-compression sharpness, early generalization gap, and correlations between those diagnostics and timing effects.
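The final measurement, correlating diagnostics with timing effects across widths, could look roughly like the sketch below. The names are hypothetical, and two things are assumed rather than taken from the source: the timing effect is defined as the late-onset metric minus the early-onset metric at each width, and the correlation is a plain Pearson coefficient.

```python
import numpy as np

def timing_effect(late_metric, early_metric):
    """Per-width advantage of late over early weight-decay onset (assumed definition)."""
    return np.asarray(late_metric, dtype=float) - np.asarray(early_metric, dtype=float)

def pearson(x, y):
    """Pearson correlation; NaN when it is undefined or uninformative."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 3 or np.std(x) == 0 or np.std(y) == 0:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])

def correlate_diagnostics(diagnostics, effect):
    """Map each diagnostic name to its correlation with the timing effect.

    `diagnostics` is a dict of name -> one value per width, in the same
    width order as `effect`.
    """
    return {name: pearson(values, effect) for name, values in diagnostics.items()}
```

With only a handful of widths, any such coefficient is a coarse indicator, so it is best read alongside the raw per-width trajectories rather than as a standalone result.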