Experiment: Grokking Small Prime Test

Category: Machine Learning

Summary: Testing whether modular-arithmetic grokking appears more reliably when the prime is made small enough to permit very long training runs within the runtime budget.

Grokking often takes extremely long training times, making it difficult to tell whether a negative result means the phenomenon is absent or simply unfinished. This experiment addresses that ambiguity by moving to small modular-addition problems, where the model can be trained for many more effective epochs and the delayed transition should be easier to observe if it is truly present.
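To make the task concrete, here is a minimal sketch of how a small modular-addition dataset could be built. The function name `make_modular_addition_split`, the 50% train fraction, and the seed are illustrative assumptions, not details taken from the experiment's script. The point it shows is that a prime p yields only p^2 examples, so each epoch is cheap and very long runs fit in the runtime budget.

```python
import random

def make_modular_addition_split(p, train_fraction=0.5, seed=0):
    """Enumerate all (a, b) pairs mod p and split them into train and test sets.

    The train fraction and seed are assumed values for illustration only.
    """
    # Each example is ((a, b), label) with label = (a + b) mod p.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(examples)
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]

# p = 7 gives 7 * 7 = 49 examples in total.
train, test = make_modular_addition_split(p=7)
print(len(train), len(test))
```

Because the full input space is enumerated, the test set is simply the held-out pairs, and test accuracy directly measures generalization to unseen (a, b) combinations.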

The script runs long training trajectories for small primes, records the memorization and grokking epochs when they occur, and compares the resulting gaps (the number of epochs between the two) across settings. It focuses on whether much larger epoch counts are enough, on their own, to produce the characteristic sudden rise in test accuracy.
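One way the detection described above could work is sketched below. The 0.99 accuracy thresholds, the function names, and the use of "first epoch at or above threshold" as the criterion are all assumptions; the original script may define these events differently. The accuracy curves in the example are made up to exercise the code and are not experimental results.

```python
def first_epoch_at_or_above(history, threshold):
    """Return the first epoch index whose value reaches threshold, or None."""
    for epoch, value in enumerate(history):
        if value >= threshold:
            return epoch
    return None

def detect_grokking(train_acc, test_acc, memorize_at=0.99, grok_at=0.99):
    """Find memorization and grokking epochs and the gap between them.

    Thresholds are assumed values. Either epoch may be None if the run
    ended before the corresponding accuracy was reached.
    """
    memorization_epoch = first_epoch_at_or_above(train_acc, memorize_at)
    grokking_epoch = first_epoch_at_or_above(test_acc, grok_at)
    gap = None
    if memorization_epoch is not None and grokking_epoch is not None:
        gap = grokking_epoch - memorization_epoch
    return {
        "memorization_epoch": memorization_epoch,
        "grokking_epoch": grokking_epoch,
        "grokking_gap": gap,
    }

# Synthetic curves for illustration only.
train_acc = [0.2, 0.6, 1.0, 1.0, 1.0, 1.0]
test_acc = [0.1, 0.1, 0.15, 0.2, 0.9, 1.0]
result = detect_grokking(train_acc, test_acc)
print(result)
```

Returning `None` rather than a sentinel number keeps "did not happen within the run" distinct from "happened at epoch 0", which matters when a negative result has to be told apart from an unfinished one. If metrics are logged every k epochs instead of every epoch, the indices need to be multiplied by k.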

That makes the project a scaling test of training time rather than a new architecture study. The main question is whether sufficient optimization duration alone can recover grokking in this implementation.

Method: Extended AdamW training on small-prime modular-addition tasks, with explicit detection of memorization and grokking epochs across long runs.
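For readers unfamiliar with the optimizer, the following is a dependency-free sketch of a single AdamW update with decoupled weight decay. It is not the experiment's training code, which presumably relies on a library optimizer; the helper name `adamw_step` and every hyperparameter value here are placeholder assumptions.

```python
import math

def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW update in place; t is the 1-based step count.

    params, grads, m, and v are equal-length lists of floats.
    All hyperparameter defaults are assumed, not taken from the experiment.
    """
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly,
        # rather than adding a penalty term to the gradient.
        params[i] -= lr * weight_decay * params[i]
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# With a zero gradient, only the weight-decay term changes the weight.
decay_only = [1.0]
adamw_step(decay_only, [0.0], [0.0], [0.0], t=1, lr=0.1, weight_decay=0.5)
print(decay_only)
```

The zero-gradient case illustrates why weight decay keeps acting on the weights even after the training loss has flattened, which is the regime these long runs spend most of their time in.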

What is measured: Memorization epoch, grokking epoch, grokking gap, actual epochs reached, train and test loss, train and test accuracy, final and peak weight norms, and prime-dependent grokking outcomes.
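As an illustration of the weight-norm measurements, the sketch below tracks a final and a peak norm over a run. It assumes the norm is the global L2 norm over all parameters flattened into one list; the experiment may compute it differently (for example per layer). The class name `NormTracker` and the weight snapshots are hypothetical.

```python
import math

def l2_norm(weights):
    """Global L2 norm of a flat list of weights."""
    return math.sqrt(sum(w * w for w in weights))

class NormTracker:
    """Keeps the most recent and the largest weight norm seen so far."""

    def __init__(self):
        self.final = None  # norm at the latest update
        self.peak = 0.0    # largest norm observed during the run

    def update(self, weights):
        self.final = l2_norm(weights)
        self.peak = max(self.peak, self.final)

# Made-up weight snapshots: the norm rises, then shrinks.
tracker = NormTracker()
for snapshot in ([3.0, 4.0], [6.0, 8.0], [0.6, 0.8]):
    tracker.update(snapshot)
print(tracker.final, tracker.peak)
```

Recording both values makes it possible to see whether the norm fell back from its peak by the end of a run.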