Category: Machine Learning
Summary: Testing whether neural scaling-law relationships appear even in very small models and datasets.
Large modern models often follow simple power-law relationships between loss, parameter count, and data size. This experiment asks whether those regularities are already visible at much smaller scales, where networks have only modest width and depth and the datasets are synthetic and limited.
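A power law is easiest to see on log-log axes, where it becomes a straight line whose slope is the scaling exponent. The sketch below is a minimal illustration with made-up constants (not results from this experiment): it generates synthetic losses from an assumed form L(N) = a * N^(-alpha) and recovers the exponent with a linear fit in log space.

```python
import numpy as np

# Hypothetical constants for illustration only; assume test loss follows
# L(N) = a * N**(-alpha) for parameter count N.
a_true, alpha_true = 5.0, 0.4
N = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
loss = a_true * N ** (-alpha_true)

# On log-log axes a power law is a line: log L = log a - alpha * log N,
# so a degree-1 polynomial fit recovers the exponent as the negated slope.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_hat, a_hat = -slope, np.exp(intercept)
```

In practice the fitted losses would come from trained networks and include noise, and the fit often needs an irreducible-loss offset, but the log-log regression above is the core of the procedure.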
The script sweeps hidden width, training-set fraction, and network depth for small multilayer perceptrons, then fits power-law forms to the resulting test losses. Because the study spans both parameter count and dataset size, it can also probe whether a compute-optimal frontier appears in a regime far below frontier-model scale.
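The compute-optimal probe can be sketched numerically: given a loss surface that is separable in parameters N and data D, and a compute budget C proportional to N * D, one grids over allocations at each budget and records the loss-minimizing N. The surface and constants below are a made-up Chinchilla-style assumption, not measurements from this experiment.

```python
import numpy as np

# Hypothetical loss surface L(N, D) = A*N**-alpha + B*D**-beta
# (all constants invented for illustration).
A, B, alpha, beta = 400.0, 400.0, 0.5, 0.5

def loss(N, D):
    return A * N ** (-alpha) + B * D ** (-beta)

budgets = np.logspace(6, 12, 7)   # compute C ~ N * D, arbitrary units
N_opt = []
for C in budgets:
    N = np.logspace(1, 10, 2000)  # candidate parameter counts
    D = C / N                     # dataset size fixed by the budget
    N_opt.append(N[np.argmin(loss(N, D))])

# Along the frontier, this surface predicts N* ~ C**(beta/(alpha+beta)),
# so the log-log slope of N_opt vs. budget should be ~0.5 here.
frontier_slope, _ = np.polyfit(np.log(budgets), np.log(N_opt), 1)
```

Whether small MLPs trace a frontier with a stable exponent like this is exactly what the sweep is positioned to test.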
The scientific value lies in determining how early scaling-law behavior emerges. If the same patterns appear in toy systems, that would suggest that at least part of the phenomenon reflects generic properties of optimization and function approximation rather than anything specific to internet-scale training.
Method: Systematic MLP sweeps over width, depth, and dataset size with power-law fits to final test loss.
What is measured: Test loss, test accuracy, fitted scaling exponents, width and dataset-size dependence, depth effects, and evidence for a compute-optimal frontier.
