Category: Machine Learning
Summary: Measuring whether forgetting after sequential task training follows a power-law decay and how Elastic Weight Consolidation changes that curve.
Neural networks trained on tasks in sequence often forget earlier tasks when learning new ones. This experiment asks two related questions: whether that loss of retained performance follows a simple power-law form over time, and whether Elastic Weight Consolidation (EWC), a standard continual-learning technique, changes the shape of the forgetting curve rather than only its final accuracy.
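EWC's quadratic penalty can be sketched in NumPy as below. The diagonal-Fisher form is the standard one; the function names, the λ value, and the "anchor at the post-first-task weights" framing are illustrative assumptions, not the script's exact code.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    # (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2 — pulls each weight
    # toward its value after the first task, weighted by Fisher importance F_i.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_grad(theta, theta_star, fisher, lam):
    # Gradient of the penalty, added to the new task's loss gradient each update.
    return lam * fisher * (theta - theta_star)
```

Weights with large Fisher values are held near their old values, while unimportant weights remain free to move, which is what could reshape the forgetting curve rather than just shift its endpoint.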
The script trains a small two-layer multilayer perceptron (MLP) on two tasks in sequence using NumPy backpropagation, then tracks the decay of retained accuracy and fits a function of the form a · t^(−b) + c. By comparing runs with and without EWC, it measures both the severity and the time profile of catastrophic forgetting.
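The fitting step can be sketched as follows. The use of `scipy.optimize.curve_fit`, the bounds, and the synthetic noiseless data are assumptions for illustration; the script fits measured accuracies instead.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # Forgetting curve: retained accuracy a * t^(-b) + c at step t after the switch.
    return a * np.power(t, -b) + c

# Synthetic curve standing in for measured first-task accuracy (assumed values).
t = np.arange(1.0, 51.0)
acc = power_law(t, a=0.6, b=0.5, c=0.2)

# Positivity bounds keep the exponent b interpretable as a decay rate.
(a_hat, b_hat, c_hat), _ = curve_fit(
    power_law, t, acc, p0=[0.5, 0.3, 0.1], bounds=(0.0, [1.0, 2.0, 1.0])
)
```

On real, noisy accuracy traces the recovered exponent b is the quantity of interest: a smaller b under EWC would indicate slower forgetting, not just a higher endpoint.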
This matters because continual-learning methods are often compared by endpoint performance alone. This experiment instead treats forgetting as a dynamical law and asks whether there are regular scaling patterns in how memory degrades after a task switch.
Method: Sequential two-task training of a two-layer MLP with NumPy backpropagation, followed by power-law fitting of forgetting curves with and without Elastic Weight Consolidation.
What is measured: Forgetting-curve exponent, retained accuracy over time, baseline versus EWC comparison, and fit quality.
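Fit quality can be scored with the coefficient of determination; this helper is a minimal sketch of that metric, assuming the script reports something equivalent.

```python
import numpy as np

def r_squared(y, y_hat):
    # 1 - SS_res / SS_tot: 1.0 means the fitted power law explains the
    # forgetting curve perfectly; 0.0 means it does no better than the mean.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```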
