Category: Machine Learning
Summary: Mapping whether neural-network generalization changes sharply around a critical batch size that separates noisy SGD behavior from more deterministic training.
Large and small minibatches can produce noticeably different learning behavior, but it is unclear whether that difference emerges gradually or resembles a sharp transition. This experiment asks whether there is a critical batch size below which stochastic gradient noise acts as an implicit regularizer and above which training becomes more deterministic and generalization worsens.
The script sweeps batch size across a wide logarithmic range while holding the overall training setup fixed, then compares test accuracy, generalization gap, loss smoothness, and gradient noise. Multiple seeds provide error bars so the crossover can be treated as a finite-size-style phase diagram rather than a single run.
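The sweep-and-aggregate structure described above can be sketched as follows. This is a minimal scaffold, not the script itself: `train_once` is a hypothetical placeholder that a real script would replace with a full MLP training run returning the actual test accuracy and generalization gap.

```python
import numpy as np

# Hypothetical stand-in for one full training run at a given batch size
# and seed; a real script would train the MLP here and return the
# measured (test_accuracy, generalization_gap) instead.
def train_once(batch_size, seed):
    rng = np.random.default_rng(seed)
    test_acc = 0.90 - 0.02 * np.log10(batch_size) + 0.01 * rng.standard_normal()
    gen_gap = 0.02 * np.log10(batch_size) + 0.01 * rng.standard_normal()
    return test_acc, gen_gap

def sweep(batch_sizes, seeds):
    """Run every (batch_size, seed) pair and aggregate mean and std,
    so the crossover can be plotted with error bars."""
    results = {}
    for b in batch_sizes:
        runs = np.array([train_once(b, s) for s in seeds])
        results[b] = {
            "test_acc": (runs[:, 0].mean(), runs[:, 0].std()),
            "gen_gap": (runs[:, 1].mean(), runs[:, 1].std()),
        }
    return results

# Logarithmic range of batch sizes, several seeds per point.
batch_sizes = [2 ** k for k in range(2, 11)]  # 4 .. 1024
results = sweep(batch_sizes, seeds=range(5))
```

Holding epochs and the rest of the setup fixed while only the batch size varies is what lets the per-batch-size means be compared directly across the sweep.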
That framing matters because the practical question is not only which batch size performs best, but whether there is a qualitative regime change in how training works. The experiment targets that transition directly using observables analogous to order parameters and fluctuations.
Method: Repeated MLP training sweeps over minibatch size with fixed epochs and multiple seeds, measuring accuracy, generalization, smoothness, and gradient noise.
What is measured: Test accuracy, generalization gap, loss smoothness, gradient noise, and estimated critical batch-size crossover.
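One of the listed observables, gradient noise, could be estimated for example as the trace of the per-example gradient covariance relative to the squared norm of the mean gradient. This is a simplified sketch under that assumption, not necessarily the script's exact estimator.

```python
import numpy as np

def gradient_noise_scale(per_example_grads):
    """Simplified gradient-noise estimate: average squared deviation of
    per-example gradients from the mean gradient, divided by the squared
    norm of the mean gradient.

    per_example_grads: array of shape (n_examples, n_params).
    """
    mean_grad = per_example_grads.mean(axis=0)
    # Trace of the per-example gradient covariance (biased estimate).
    noise = ((per_example_grads - mean_grad) ** 2).sum(axis=1).mean()
    # Squared norm of the mean gradient; guard against division by zero.
    signal = (mean_grad ** 2).sum()
    return noise / max(signal, 1e-12)
```

Tracked across the batch-size sweep, a noise-to-signal ratio like this is the kind of fluctuation-style observable that would be expected to change character around the critical batch size.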
