Experiment: Grokking Dynamics v4

Category: Machine Learning

Summary: Reducing the prime (modulus) size and lowering the learning rate to test whether slower optimization restores a full grokking transition after a faster setup appeared to lock into memorization too quickly.

Earlier grokking runs suggested that too aggressive an optimization setup could drive the model into memorization without leaving room for later reorganization. This experiment reduces the modular-arithmetic problem size and slows the learning rate to test whether that gentler regime allows the delayed generalization transition to complete.

The script trains for a very long horizon, logs train and test metrics at both coarse and fine intervals, and records the separation between the first memorization point and any later grokking point. Its design is explicitly motivated by the need to repair a previous run that appeared to harden memorization too quickly.
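As a rough illustration of how the memorization point, grokking point, and their separation might be extracted from logged accuracies, here is a minimal sketch. The function name, the argument layout, and both 0.99 thresholds are assumptions made for this example; the actual script may define these events differently.

```python
from typing import Optional, Sequence, Tuple


def find_transition_epochs(
    epochs: Sequence[int],
    train_acc: Sequence[float],
    test_acc: Sequence[float],
    memorize_threshold: float = 0.99,  # assumed cutoff, not taken from the experiment
    grok_threshold: float = 0.99,      # assumed cutoff, not taken from the experiment
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (memorization_epoch, grokking_epoch, gap) from logged accuracies.

    memorization_epoch: first logged epoch where train accuracy reaches its threshold.
    grokking_epoch: first logged epoch at or after memorization where test
        accuracy reaches its threshold.
    gap: grokking_epoch - memorization_epoch, or None if either event is missing.
    """
    memorization_epoch: Optional[int] = None
    grokking_epoch: Optional[int] = None

    for epoch, tr, te in zip(epochs, train_acc, test_acc):
        if memorization_epoch is None and tr >= memorize_threshold:
            memorization_epoch = epoch
        if (
            memorization_epoch is not None
            and grokking_epoch is None
            and te >= grok_threshold
        ):
            grokking_epoch = epoch

    gap: Optional[int] = None
    if memorization_epoch is not None and grokking_epoch is not None:
        gap = grokking_epoch - memorization_epoch
    return memorization_epoch, grokking_epoch, gap
```

A run that memorizes but never generalizes yields a memorization epoch with no grokking epoch and no gap, which corresponds to the stalled regime this experiment is trying to repair.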

That makes the project a targeted intervention on training dynamics. The question is not only whether grokking can happen, but whether slowing early optimization is enough to recover it from a regime where it previously stalled.

Method: Long-horizon modular-addition training with a smaller prime and a lower learning rate, tracking memorization, delayed generalization, and weight-norm evolution.
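For readers unfamiliar with the task, the sketch below shows one common way to build a modular-addition dataset: enumerate every pair (a, b) with label (a + b) mod p, then split the pairs at random into train and test sets. The function name, the split fraction, and the seed are hypothetical, and the prime actually used in this experiment is not stated here, so p is left as a parameter.

```python
import random
from typing import List, Tuple

Example = Tuple[int, int, int]  # (a, b, (a + b) mod p)


def make_modular_addition_split(
    p: int,
    train_fraction: float,
    seed: int = 0,  # assumed seed, only for reproducibility of this sketch
) -> Tuple[List[Example], List[Example]]:
    """Enumerate all p * p addition examples mod p and split them at random."""
    examples: List[Example] = [
        (a, b, (a + b) % p) for a in range(p) for b in range(p)
    ]
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]
```

Because the full table has p * p entries, a smaller prime shrinks both the training set and the held-out set, which is one reason the prime size interacts with how quickly a model can memorize.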

What is measured: Memorization epoch, grokking epoch, grokking gap, train and test loss, train and test accuracy, final and peak weight norms, and final grokking detection status.
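The final and peak weight norms can be illustrated with a short sketch. It assumes the weight norm is the global L2 norm over all parameters, recorded once per logging step; the experiment may use a different norm or a per-layer breakdown, and both function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def global_l2_norm(params: Iterable[Iterable[float]]) -> float:
    """L2 norm over all weights, with each parameter tensor given as a flat iterable."""
    return math.sqrt(sum(w * w for tensor in params for w in tensor))


def summarize_norms(norm_history: Sequence[float]) -> Dict[str, float]:
    """Return the last recorded norm and the largest norm seen during training."""
    return {"final": norm_history[-1], "peak": max(norm_history)}
```

Comparing the two values gives a quick check on whether the norm fell from its peak by the end of training or was still at its maximum.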