Category: Machine Learning
Summary: Testing whether the benefit of applying weight decay late in training depends on whether examples are presented in curriculum or anti-curriculum order.
Weight decay often helps generalization, but what may matter is not only how strongly it is applied but also when it is introduced. This experiment asks whether the value of late weight decay changes when the training data are ordered from easy to hard (curriculum) rather than from hard to easy (anti-curriculum).
The script compares training runs under different sample-order schedules and weight-decay timings. Its central hypothesis is that curriculum ordering may stabilize useful features earlier, which would reduce the marginal benefit of turning on weight decay later in training.
That makes the project a timing-and-order interaction study rather than a simple regularization benchmark. The broader question is whether optimization history and data presentation jointly determine when regularization is most effective.
Method: Repeated neural-network training runs comparing curriculum order against anti-curriculum order under different weight-decay timings.
What is measured: Final test accuracy, the generalization gap (train minus test accuracy), and how the size of the late-weight-decay benefit depends on sample order.
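The comparison described above can be sketched with a minimal, self-contained example. The following is not the project's actual script; it is a toy illustration using a numpy logistic-regression model on synthetic data, where difficulty is proxied by margin size and weight decay is switched on only after a chosen fraction of training steps (`wd_start_frac` is a hypothetical parameter name introduced here for illustration).

```python
import numpy as np

def make_data(n=400, d=10, seed=0):
    """Synthetic binary classification task; smaller |margin| = harder example."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    margin = X @ w_true
    y = (margin > 0).astype(float)
    difficulty = -np.abs(margin)  # most negative = largest margin = easiest
    return X, y, difficulty

def train(X, y, order, wd_start_frac, epochs=5, lr=0.1, wd=1e-2):
    """SGD on logistic loss; weight decay applied only late in training."""
    n, d = X.shape
    w = np.zeros(d)
    total_steps, step = epochs * n, 0
    for _ in range(epochs):
        for i in order:  # sample order encodes the curriculum
            step += 1
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
            grad = (p - y[i]) * X[i]
            if step / total_steps >= wd_start_frac:  # late weight decay
                grad = grad + wd * w
            w -= lr * grad
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0).astype(float) == y).mean())

def run_experiment(wd_start_frac=0.5, seed=0):
    """Compare curriculum vs anti-curriculum order under one weight-decay timing."""
    X, y, diff = make_data(seed=seed)
    X_test, y_test, _ = make_data(seed=seed + 1)  # held-out data, same distribution
    curriculum = np.argsort(diff)      # easy -> hard
    anti = curriculum[::-1]            # hard -> easy
    results = {}
    for name, order in [("curriculum", curriculum), ("anti-curriculum", anti)]:
        w = train(X, y, order, wd_start_frac)
        results[name] = accuracy(w, X_test, y_test)
    return results
```

Sweeping `wd_start_frac` over a grid (e.g. 0.0, 0.5, 0.9) and averaging over seeds would yield the timing-by-order interaction the study targets; the real experiment would use neural networks rather than this linear model.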
