Experiment: Micro Scaling Laws


Category: Machine Learning

Summary: Testing whether power-law scaling between loss, model size, and dataset size already appears in small neural networks.


Neural scaling laws are usually discussed for very large models, but it is unclear how much of that behavior requires extreme scale. This experiment asks whether tiny multilayer perceptrons trained on simple classification tasks already show the same kind of log-linear structure in loss versus parameters and data.

The script sweeps width, depth, and dataset fraction, then fits a power-law form to the final test losses. Because the same setup varies both capacity and data, it can also look for a small-scale analog of the compute-optimal frontier that appears in large-model studies.
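The project's actual sweep script is not reproduced here. As an illustrative stand-in, the sketch below sweeps hidden-layer width for a small MLP on scikit-learn's digits task and estimates the exponent of a power-law fit L ≈ a · N^(−b) by log–log regression on final test losses. The dataset, hyperparameters, and fitting choices are assumptions, not the experiment's code.

```python
# Illustrative sketch only: sweep MLP width on a toy task and fit a
# power-law exponent to test loss vs. parameter count. Dataset and
# hyperparameters are assumptions, not the project's actual script.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X / 16.0, y,
                                          test_size=0.3, random_state=0)

params, losses = [], []
for width in (8, 16, 32, 64):
    clf = MLPClassifier(hidden_layer_sizes=(width,),
                        max_iter=400, random_state=0)
    clf.fit(X_tr, y_tr)
    # Total trainable parameters: weights plus biases of every layer.
    n = sum(w.size for w in clf.coefs_) + sum(b.size for b in clf.intercepts_)
    params.append(n)
    losses.append(log_loss(y_te, clf.predict_proba(X_te), labels=clf.classes_))

# Fit L ~ a * N^(-b): a straight line in log-log space with slope -b.
slope, intercept = np.polyfit(np.log(params), np.log(losses), 1)
print(f"fitted scaling exponent b = {-slope:.3f}")
```

The same loop extends to depth and dataset fraction by varying `hidden_layer_sizes` and subsampling `X_tr` before fitting.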

The resulting map helps separate universal regularities from scale-specific ones. If the same patterns appear in miniature systems, that would support the view that scaling laws arise from fundamentals of optimization and approximation rather than from industrial scale alone.

Method: MLP sweeps over width, depth, and dataset size with power-law fitting of final test-loss curves.

What is measured: Test loss, test accuracy, scaling exponents, parameter-count effects, dataset-size effects, depth dependence, and evidence for small-scale compute-optimal behavior.
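Checking for compute-optimal behavior requires a joint fit of loss against both parameter count N and dataset size D. As a hedged sketch of the fitting step only, the snippet below fits a Chinchilla-style form L(N, D) = E + A·N^(−α) + B·D^(−β) with SciPy; the (N, D, loss) grid is synthetic, generated from that same assumed form with noise purely to exercise the procedure, and every number in it is an assumption rather than data from the experiment.

```python
# Sketch of the joint power-law fit only. The (N, D, loss) grid is
# synthetic, generated from an assumed Chinchilla-style form just to
# exercise curve_fit; it is not data from this experiment.
import numpy as np
from scipy.optimize import curve_fit

def loss_form(nd, E, A, alpha, B, beta):
    n, d = nd
    return E + A * n**(-alpha) + B * d**(-beta)

rng = np.random.default_rng(0)
N = np.tile([1e3, 3e3, 1e4, 3e4], 4)        # parameter counts (assumed)
D = np.repeat([5e2, 1.5e3, 5e3, 1.5e4], 4)  # dataset sizes (assumed)
true_params = (0.1, 50.0, 0.5, 20.0, 0.4)   # assumed "ground truth"
L = loss_form((N, D), *true_params) * rng.normal(1.0, 0.01, N.size)

popt, _ = curve_fit(loss_form, (N, D), L,
                    p0=(0.1, 10.0, 0.3, 10.0, 0.3),
                    bounds=(0.0, np.inf))
E, A, alpha, B, beta = popt
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
# With fitted exponents, the compute-optimal frontier follows from
# minimizing L(N, D) subject to a fixed compute budget C proportional
# to N * D, as in large-model studies.
```

Applied to the sweep's real losses in place of the synthetic grid, the recovered α and β are the scaling exponents listed above.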


Powered by BOINC
© 2026 Axiom Project