Experiment: Simplicity Bias

Category: Machine Learning

Summary: Measures when neural networks prefer simple predictive features over more complex ones that fit the training data equally well.


Neural networks are often said to exhibit simplicity bias, but it is not always clear when that preference dominates over capacity and optimization effects. This experiment constructs a synthetic task with two equally predictive channels, one simple and one more complex, and then forces them to disagree on a special test set.
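The page does not specify the two channels, so the following Python sketch instantiates them with illustrative choices: the simple channel is a linear margin on coordinate x0, and the complex channel is a set of alternating stripes on x1 (in the spirit of linear-versus-slab constructions). Both channels agree on every training point; the disagreement set flips only the simple channel.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, flip_simple=False):
    """Draw points in which two channels both encode the class label.

    Hypothetical channels (the page does not specify them):
      - simple:  the sign of x0, a linear margin
      - complex: which unit stripe x1 falls in; stripe parity gives the class
    """
    y = rng.integers(0, 2, size=n)
    s = 1 - y if flip_simple else y            # flipped on the disagreement set
    x0 = rng.uniform(0.2, 1.0, n) * np.where(s == 1, 1.0, -1.0)
    band = rng.choice(np.array([-2, 0]), size=n) + y   # odd bands <-> class 1
    x1 = band + rng.uniform(0.05, 0.95, n)
    return np.column_stack([x0, x1]), y

x_train, y_train = sample(2000)

# Both channels fit the training labels perfectly, as the design requires:
simple_pred = (x_train[:, 0] > 0).astype(int)
complex_pred = (np.floor(x_train[:, 1]).astype(int) % 2 != 0).astype(int)
assert (simple_pred == y_train).all() and (complex_pred == y_train).all()

# Disagreement set: the stripe channel still encodes y, the margin says 1 - y.
x_dis, y_dis = sample(500, flip_simple=True)
```

On the disagreement set, a model whose predictions match y is relying on the stripe feature, while one whose predictions match 1 - y has latched onto the linear margin.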

By varying width, depth, learning rate, and training duration, the experiment tracks when the network relies more on the simple rule and when that preference weakens. Because both channels fit the training set perfectly, the disagreement test provides a direct readout of which internal solution the network has chosen.

That design turns a qualitative claim about inductive bias into a measurable competition between features. The result is intended to map where simplicity bias is strongest and where richer representations begin to override it.

Method: Synthetic 2D classification sweeps over width, depth, learning rate, and training time, using a disagreement test set to identify the preferred feature channel.
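The sweep-and-readout loop can be sketched end to end with a one-hidden-layer numpy network. The channel definitions, widths, and training schedule below are illustrative assumptions, not the experiment's actual configuration; the readout is the fraction of disagreement points decided by the simple rule.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n, flip_simple=False):
    # Hypothetical channels (not from the page): linear margin on x0
    # (simple) versus alternating unit stripes on x1 (complex).
    y = rng.integers(0, 2, size=n)
    s = 1 - y if flip_simple else y
    x0 = rng.uniform(0.2, 1.0, n) * np.where(s == 1, 1.0, -1.0)
    x1 = rng.choice(np.array([-2, 0]), size=n) + y + rng.uniform(0.05, 0.95, n)
    return np.column_stack([x0, x1]), y

def train_mlp(x, y, width, lr=0.1, steps=2000):
    """One-hidden-layer ReLU net, full-batch gradient descent on logistic loss."""
    w1 = rng.normal(0, 1 / np.sqrt(2), (2, width))
    b1 = np.zeros(width)
    w2 = rng.normal(0, 1 / np.sqrt(width), width)
    b2 = 0.0
    for _ in range(steps):
        h = np.maximum(x @ w1 + b1, 0)          # ReLU hidden layer
        p = 1 / (1 + np.exp(-(h @ w2 + b2)))    # sigmoid output probability
        g = (p - y) / len(y)                    # dLoss/dlogit for logistic loss
        gw2 = h.T @ g
        gb2 = g.sum()
        gh = np.outer(g, w2) * (h > 0)          # backprop through the ReLU
        gw1 = x.T @ gh
        gb1 = gh.sum(0)
        w1 -= lr * gw1; b1 -= lr * gb1
        w2 -= lr * gw2; b2 -= lr * gb2
    return lambda z: (np.maximum(z @ w1 + b1, 0) @ w2 + b2 > 0).astype(int)

x_tr, y_tr = sample(2000)
x_dis, y_dis = sample(500, flip_simple=True)    # stripe channel still says y_dis

for width in [4, 64]:
    predict = train_mlp(x_tr, y_tr, width)
    train_acc = (predict(x_tr) == y_tr).mean()
    # Fraction of disagreement points decided by the simple (margin) rule:
    simple_pref = (predict(x_dis) == 1 - y_dis).mean()
    print(f"width={width:3d}  train_acc={train_acc:.3f}  simple_pref={simple_pref:.2f}")
```

Extending the outer loop to depth, learning rate, and step count gives the full sweep the page describes; `simple_pref` near 1 indicates the network chose the simple rule, near 0 the complex one.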

What is measured: Preference for simple versus complex features, train and test accuracy, width and depth dependence, and conditions where simplicity bias strengthens or breaks down.


Powered by BOINC
© 2026 Axiom Project