Experiment: Power-Law Forgetting v2



Category: Machine Learning

Summary: Measuring how performance on an earlier task decays during later training when tasks share representations, and how that decay changes with EWC and bottleneck width.


Catastrophic forgetting occurs when a neural network learns a new task and loses performance on an earlier one, especially when both tasks compete for overlapping internal representations. This experiment asks whether that degradation follows a power law in the amount of later training (that is, whether earlier-task accuracy decays roughly as a power of the number of subsequent optimization steps), and how much the decay can be slowed by Elastic Weight Consolidation (EWC) or by increased representational capacity.
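The power-law fit itself is straightforward to sketch. The snippet below uses purely synthetic data (an assumed decay exponent of 0.3, not a result from this experiment) and recovers the exponent by linear regression in log-log space, which is the standard way to estimate a forgetting-rate exponent:

```python
import numpy as np

# Synthetic forgetting curve: acc(t) = acc0 * t**(-alpha).
# alpha_true is an illustrative assumption, not an experimental result.
alpha_true = 0.3
t = np.arange(1, 201)            # later-training steps
acc = 0.95 * t ** (-alpha_true)  # earlier-task accuracy over time

# A power law is a straight line in log-log space:
#   log acc = log acc0 - alpha * log t
# so fit a degree-1 polynomial to the log-transformed data.
slope, intercept = np.polyfit(np.log(t), np.log(acc), 1)
alpha_hat = -slope
print(f"estimated exponent: {alpha_hat:.3f}")  # → estimated exponent: 0.300
```

In practice the measured curve is noisy and the fit is usually restricted to the region after the initial rapid drop; curvature in the log-log plot is itself evidence against a pure power law.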

The setup uses two tasks with the same input dimensions but different targets, forcing the model to reuse a shared bottleneck representation. After training on Task A, the network is trained on Task B while Task A accuracy is repeatedly measured, and the comparison is repeated for naive SGD versus EWC and for multiple bottleneck widths.
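As a minimal sketch of the naive-versus-EWC comparison, the example below substitutes plain linear regression with a shared weight vector for the bottlenecked network; the penalty strength, step counts, and diagonal Fisher approximation are illustrative assumptions, not the experiment's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 256
X = rng.normal(size=(n, d))                  # same inputs for both tasks
wA_true, wB_true = rng.normal(size=d), rng.normal(size=d)
yA, yB = X @ wA_true, X @ wB_true            # different targets

def mse(w, y):
    return np.mean((X @ w - y) ** 2)

def train(w, y, steps=500, lr=0.05, anchor=None, fisher=None, lam=0.0):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n
        if anchor is not None:               # EWC quadratic penalty gradient
            grad += 2 * lam * fisher * (w - anchor)
        w = w - lr * grad
    return w

# Phase 1: learn Task A from scratch.
w = train(np.zeros(d), yA)
# Diagonal Fisher for a linear-Gaussian model is E[x_i^2] (unit noise assumed).
fisher = np.mean(X ** 2, axis=0)

# Phase 2: learn Task B, naively vs. EWC-anchored at the Task A solution.
w_sgd = train(w.copy(), yB)
w_ewc = train(w.copy(), yB, anchor=w, fisher=fisher, lam=5.0)

print(f"Task A MSE after B: naive={mse(w_sgd, yA):.3f}  EWC={mse(w_ewc, yA):.3f}")
```

The Fisher-weighted quadratic penalty pulls parameters that matter for Task A back toward their Task A values, trading some Task B fit for retained Task A performance; the naive run converges fully to the Task B solution and forgets Task A.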

That makes the project a controlled study of interference rather than a benchmark race. It is aimed at the shape of forgetting over time, and at how overlap, capacity, and regularization interact to change that shape.

Method: Sequential two-task neural-network training with shared inputs and bottlenecked representations, comparing SGD and EWC across multiple bottleneck widths.

What is measured: Task A accuracy during Task B training, forgetting-rate exponent, effect of EWC, and dependence on bottleneck width.


Powered by BOINC
© 2026 Axiom Project