Experiment: Grokking Dynamics v7

Category: Machine Learning

Summary: Revisiting grokking with corrected weight decay to see whether the delayed phase transition reappears after earlier settings produced memorization without generalization.

Several earlier grokking runs in this sequence memorized their modular-arithmetic training data but failed to show the later jump in test accuracy, which raised the possibility that their regularization settings were simply wrong for the phenomenon. This version asks whether correcting the weight-decay setup restores the expected delayed transition.

The script trains the same style of modular-addition model over a long horizon, recording train and test metrics, weight norms, and the epochs at which memorization and grokking occur, if they occur at all. It is explicitly designed as a corrective follow-up to earlier failed variants rather than as a fresh benchmark.

That makes the experiment a diagnosis of how sensitive the grokking phenomenon is to optimization settings. The question is whether the delayed-generalization phase is robust once the weight-decay mechanism is specified properly, or whether it disappears under small optimization changes.
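The write-up does not say what was wrong with the earlier weight-decay setup, so the following only illustrates one well-known way it can be mis-specified. It is not the experiment's actual fix. With Adam-family optimizers, adding an L2 penalty to the gradient ("coupled") is not the same as shrinking the weights directly ("decoupled", as in AdamW). The coupled penalty passes through Adam's per-parameter normalization, so its strength barely depends on its coefficient. The sketch runs a hand-written scalar Adam (`adam_scalar` is a hypothetical helper, and the hyperparameters are assumed) with the task gradient set to zero to isolate the regularizer.

```python
import math


def adam_scalar(w, steps, grad_fn, lr=1e-3, beta1=0.9, beta2=0.98,
                eps=1e-8, l2=0.0, decoupled_wd=0.0):
    """Run Adam on a single scalar weight and return its final value.

    Hypothetical illustration, not the experiment's optimizer code.
    l2:           coupled penalty; l2 * w is added to the gradient, so it is
                  rescaled by Adam's per-parameter normalization.
    decoupled_wd: AdamW-style decay; the weight is shrunk by
                  lr * decoupled_wd * w outside the normalization.
    """
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w) + l2 * w              # gradient incl. coupled L2 term
        w -= lr * decoupled_wd * w           # decoupled shrink, applied first
        m = beta1 * m + (1 - beta1) * g      # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
        m_hat = m / (1 - beta1 ** t)         # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def no_task_grad(w):
    # Zero task gradient, so only the regularizer moves the weight.
    return 0.0


coupled_small = adam_scalar(1.0, 100, no_task_grad, l2=0.1)
coupled_large = adam_scalar(1.0, 100, no_task_grad, l2=10.0)
decoupled_small = adam_scalar(1.0, 100, no_task_grad, decoupled_wd=0.1)
decoupled_large = adam_scalar(1.0, 100, no_task_grad, decoupled_wd=10.0)

print(f"coupled L2:      {coupled_small:.6f} vs {coupled_large:.6f}")
print(f"decoupled decay: {decoupled_small:.6f} vs {decoupled_large:.6f}")
```

Raising the coefficient 100x leaves the coupled runs at nearly the same value, because Adam divides the penalty gradient by its own running magnitude. The decoupled runs end at roughly 0.990 and 0.366, that is, `(1 - lr * wd) ** 100`, so the coefficient controls the shrinkage directly.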

Method: Long-horizon modular-addition training with corrected weight-decay settings and explicit detection of memorization and grokking epochs.
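The entry does not give the modulus, the train/test split, or the input encoding. The following is a minimal sketch of a standard modular-addition task, assuming a prime modulus p = 97 and a 30% training fraction (both hypothetical here). It enumerates every pair (a, b) with label (a + b) mod p and splits the pairs once with a fixed seed, so the test set contains only pairs the model never sees during training.

```python
import random


def modular_addition_split(p=97, train_frac=0.3, seed=0):
    """Return disjoint (train, test) lists of (a, b, (a + b) % p) triples.

    Hypothetical helper; p and train_frac are assumed values.
    """
    # Enumerate the full p x p addition table.
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle deterministically so every run sees the same split.
    random.Random(seed).shuffle(examples)
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = modular_addition_split()
print(len(train), len(test))  # 2822 6587
```

Holding `seed` fixed keeps the split identical across versions, so differences between runs come from the optimizer settings rather than from the data.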

What is measured: Memorization epoch, grokking epoch, grokking gap, train and test loss, train and test accuracy, final and peak weight norms, and grokking-detected status.
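The entry lists these quantities but not the thresholds behind them. The following is a minimal sketch of how they could be derived from a run's logged curves. It assumes that memorization means train accuracy first reaching 0.99, that grokking means test accuracy first reaching 0.95 (both hypothetical thresholds), and that metrics are logged at arbitrary evaluation epochs. `summarize_run`, `first_crossing`, and `global_l2_norm` are illustrative names, not the script's actual functions, and the curves in the usage example are made-up toy values, not results.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GrokkingSummary:
    memorization_epoch: Optional[int]
    grokking_epoch: Optional[int]
    grokking_gap: Optional[int]
    grokking_detected: bool
    final_weight_norm: float
    peak_weight_norm: float


def global_l2_norm(params: Sequence[Sequence[float]]) -> float:
    """L2 norm over all parameters, each given here as a flat list of floats."""
    return math.sqrt(sum(x * x for p in params for x in p))


def first_crossing(epochs, values, threshold) -> Optional[int]:
    """First logged epoch whose value reaches the threshold, else None."""
    for epoch, value in zip(epochs, values):
        if value >= threshold:
            return epoch
    return None


def summarize_run(epochs, train_acc, test_acc, weight_norms,
                  mem_threshold=0.99, grok_threshold=0.95) -> GrokkingSummary:
    mem = first_crossing(epochs, train_acc, mem_threshold)
    grok = first_crossing(epochs, test_acc, grok_threshold)
    # Count it as grokking only if generalization comes strictly after memorization.
    detected = mem is not None and grok is not None and grok > mem
    return GrokkingSummary(
        memorization_epoch=mem,
        grokking_epoch=grok,
        grokking_gap=grok - mem if detected else None,
        grokking_detected=detected,
        final_weight_norm=weight_norms[-1],
        peak_weight_norm=max(weight_norms),
    )


# Toy curves: train accuracy saturates early, test accuracy jumps much later.
epochs = [0, 1000, 2000, 3000, 4000, 5000]
train_acc = [0.20, 0.70, 1.00, 1.00, 1.00, 1.00]
test_acc = [0.01, 0.02, 0.03, 0.10, 0.97, 1.00]
norms = [30.0, 42.0, 45.0, 38.0, 26.0, 24.0]
print(summarize_run(epochs, train_acc, test_acc, norms))
```

On these toy curves the summary reports memorization at epoch 2000, grokking at 4000, and a gap of 2000. A memorization-only run, like the earlier failed variants, would instead return `grokking_detected=False` with `grokking_gap=None`. Requiring the test crossing to come strictly after the train crossing keeps runs that generalize immediately from being counted as delayed generalization.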