Experiment: Neural Scaling Laws

Neural Scaling Laws

Category: Machine Learning

Summary: Measuring how test loss on a structured regression problem scales with neural-network parameter count.


Large-model performance often improves smoothly with size, but the form of that improvement is still an empirical question for any concrete task and training setup. This experiment asks whether a family of multilayer perceptrons learning a structured sinusoidal regression problem follows a clean power-law relation between test loss and parameter count.
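The page does not specify the exact target function, so as a hypothetical illustration, a "structured sinusoidal" regression set might be generated as a superposition of a few fixed-frequency sinusoids plus small noise (the frequencies, amplitudes, and noise level below are our assumptions, not the experiment's actual task):

```python
import numpy as np

def make_sinusoidal_dataset(n_samples, rng=None, noise=0.01):
    """Generate a 1-D regression task: a sum of sinusoids plus small noise.

    The specific frequencies and amplitudes here are illustrative only;
    the experiment's true target function is not published on this page.
    """
    rng = np.random.default_rng(rng)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, 1))
    # Structured target: superposition of a few fixed-frequency sinusoids.
    y = (np.sin(2 * np.pi * x)
         + 0.5 * np.sin(6 * np.pi * x)
         + 0.25 * np.sin(14 * np.pi * x))
    y += noise * rng.standard_normal(y.shape)
    return x, y
```

A task like this is convenient for scaling studies because its difficulty is graded: the low-frequency component is easy to fit, while the higher-frequency terms demand more capacity.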

The script trains models spanning several orders of magnitude in size and then fits the empirical relation between loss and number of parameters. Rather than focusing on a single best architecture, it treats scale itself as the variable of interest and estimates the exponent in a candidate law of the form L = a N^(-alpha).
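A standard way to estimate the exponent is ordinary least squares in log-log space, since L = a N^(-alpha) implies log L = log a - alpha log N. A minimal sketch of such a fit, including an R-squared goodness-of-fit measure (the function name is ours, not the script's):

```python
import numpy as np

def fit_power_law(n_params, losses):
    """Fit L = a * N**(-alpha) by linear regression on (log N, log L).

    Returns (a, alpha, r_squared), where r_squared measures how well a
    straight line explains the data in log-log space.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # Slope of the log-log line is -alpha; intercept is log(a).
    slope, intercept = np.polyfit(log_n, log_l, 1)
    alpha = -slope
    a = np.exp(intercept)
    # Coefficient of determination of the straight-line fit.
    pred = intercept + slope * log_n
    ss_res = np.sum((log_l - pred) ** 2)
    ss_tot = np.sum((log_l - log_l.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return a, alpha, r_squared
```

Because the regression is done on logged values, each order of magnitude in N contributes equally to the fit, which is usually what one wants when model sizes span several decades.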

This connects small controlled models to the broader scaling-law literature. Even on a simple task, the fitted exponent and its stability indicate whether additional capacity buys predictable gains or quickly runs into diminishing returns.

Method: Regression sweep over MLP sizes with post-fit estimation of a power-law relation between test loss and parameter count.

What is measured: Test loss across model sizes, fitted scaling exponent alpha, prefactor estimates, and goodness of the power-law fit.
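For the parameter-count axis, N is conventionally the total number of trainable weights and biases. For a fully connected MLP this is easy to count from the layer widths (a sketch; the example widths are illustrative, not the experiment's actual architectures):

```python
def mlp_param_count(layer_widths):
    """Total weights + biases of a fully connected MLP.

    layer_widths lists every layer size, input and output included,
    e.g. [1, 64, 64, 1] for a 2-hidden-layer MLP on a 1-D task.
    Each layer contributes w_in * w_out weights plus w_out biases.
    """
    return sum(w_in * w_out + w_out
               for w_in, w_out in zip(layer_widths[:-1], layer_widths[1:]))
```

Sweeping widths geometrically (e.g. doubling the hidden width at each step) makes N span orders of magnitude with few runs, which is what the power-law fit needs.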


Powered by BOINC
© 2026 Axiom Project