Experiment: Weight-Decay Width Transition

Category: Machine Learning

Summary: Finding the network width where the previously observed inverse critical period for weight decay stops holding.

Earlier experiments indicated that late weight decay could remain highly effective at one narrow width, while newer results suggested the effect may fail in wider models. This experiment asks where the transition lies: at what width does the inverse critical-period behavior break down when the rest of the training setup is held fixed?

The script repeats the same compositional-learning protocol across a targeted width sweep, using a weight-decay strength known to produce the effect in smaller models. Alongside performance, it records mechanistic diagnostics such as singular-value-based rank and weight norms to see whether the timing transition coincides with a change in internal representation geometry.
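The rank and norm diagnostics could be computed per checkpoint along the following lines. This is a minimal sketch, not the experiment's actual code: the function names are hypothetical, and the specific choice of rank measures (entropy-based effective rank and stable rank) is an assumption, since the description only says "singular-value-based rank".

```python
import numpy as np

def effective_rank(weight, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values. (Assumed measure, not confirmed by the source.)"""
    s = np.linalg.svd(weight, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > 0]  # drop exact zeros so log() is defined
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

def stable_rank(weight):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    s = np.linalg.svd(weight, compute_uv=False)  # sorted in descending order
    if s[0] == 0:
        return 0.0  # all-zero matrix
    return float((s ** 2).sum() / (s[0] ** 2))

def frobenius_norm(weight):
    """Frobenius norm of a weight matrix."""
    return float(np.linalg.norm(weight))
```

Both rank measures are continuous in the weights, which makes them easier to track across checkpoints than a hard threshold on singular values.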

This frames the project as a search for a boundary rather than a yes-or-no question. Instead of debating whether the timing effect is real in general, the experiment tries to locate the range of widths over which it holds.

Method: Matched-hyperparameter width sweep comparing weight-decay timing schedules, with checkpointed rank and norm diagnostics on a compositional task.
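One way to express the timing schedules is as a function that returns the weight-decay coefficient for a given training step. The sketch below is illustrative only: the schedule names ("always", "never", "early", "late"), the `switch_fraction` parameter, and the function name are all assumptions, not the schedules the experiment actually uses.

```python
def weight_decay_at(step, total_steps, schedule, strength, switch_fraction=0.5):
    """Return the weight-decay coefficient to apply at `step`.

    Hypothetical schedules:
      "always": decay on for the whole run
      "never":  decay off for the whole run
      "early":  decay on only before the switch point
      "late":   decay on only from the switch point onward
    """
    switch_step = int(total_steps * switch_fraction)
    if schedule == "always":
        return strength
    if schedule == "never":
        return 0.0
    if schedule == "early":
        return strength if step < switch_step else 0.0
    if schedule == "late":
        return strength if step >= switch_step else 0.0
    raise ValueError(f"unknown schedule: {schedule!r}")
```

Keeping `strength` and `switch_fraction` identical across widths is what makes the sweep matched: only the width and the schedule name vary between runs.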

What is measured: Accuracy by width and timing schedule, transition width for the inverse critical-period effect, singular-value rank measures, and weight norms.
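The transition width can be read off the accuracy table once a criterion for "the effect holds" is fixed. The sketch below assumes one possible criterion: the effect holds at a width if late weight decay beats a no-decay baseline by at least `min_gain`. The function name, the criterion, and the default threshold are hypothetical, not taken from the experiment.

```python
def transition_width(acc_late, acc_baseline, min_gain=0.05):
    """Return the smallest width at which late weight decay no longer
    improves on the baseline by at least `min_gain`, or None if the
    effect holds at every width present in both inputs.

    acc_late, acc_baseline: dicts mapping width -> accuracy.
    (Assumed criterion; the source does not specify one.)
    """
    widths = sorted(set(acc_late) & set(acc_baseline))
    for width in widths:
        gain = acc_late[width] - acc_baseline[width]
        if gain < min_gain:
            return width
    return None
```

Because the sweep is discrete, the returned value is only an upper bracket: the true transition lies between the last width where the effect holds and the width returned here.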