Category: Machine Learning
Summary: Comparing the sharpness of the minima found by stochastic gradient descent when training at different learning rates.
A common idea in optimization theory is that larger learning rates may steer training toward flatter minima that generalize better. This experiment asks whether that claim shows up clearly in a controlled multilayer-perceptron setting.
The script trains the same architecture at several learning rates, then probes the local loss surface by perturbing the trained weights in random directions and estimating curvature-related quantities such as sharpness and Hessian-trace proxies. That turns the abstract notion of flatness into direct measurements around the final solution.
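The random-direction probe described above can be sketched as follows. This is a minimal illustration on a hypothetical toy quadratic loss (not the project's actual network): step a fixed distance from the trained weights along random unit directions and average the resulting loss increase as a sharpness proxy. The function names and the `eps`/`n_dirs` parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Hypothetical stand-in for the trained network's loss surface:
    # a quadratic with Hessian diag(1, 10), so curvature is known exactly.
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def perturbation_sharpness(loss_fn, theta, eps=1e-2, n_dirs=200, rng=rng):
    """Average loss increase after stepping distance `eps` in random unit
    directions from `theta` -- a simple proxy for local sharpness."""
    base = loss_fn(theta)
    increases = []
    for _ in range(n_dirs):
        d = rng.standard_normal(theta.shape)
        d /= np.linalg.norm(d)  # project onto the unit sphere
        increases.append(loss_fn(theta + eps * d) - base)
    return float(np.mean(increases))

theta_star = np.zeros(2)  # exact minimum of the toy quadratic
s = perturbation_sharpness(loss, theta_star)
print(s)  # small positive number; flatter minima give smaller values
```

For a quadratic, the expected increase is (eps^2 / 2) * E[d^T H d], so sharper minima (larger Hessian eigenvalues) produce systematically larger perturbation losses, which is exactly the quantity compared across learning rates.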
The project matters because flat-versus-sharp explanations of generalization are often invoked only qualitatively. Here the experiment tests whether the geometry of the found minimum actually changes with learning-rate scale in the predicted direction, i.e. whether larger learning rates yield measurably flatter minima.
Method: Matched neural-network training across learning rates followed by random-direction perturbation tests and finite-difference curvature estimates.
What is measured: Test accuracy, perturbation-induced loss increase, sharpness metrics, approximate Hessian trace, and dependence of minimum flatness on learning rate.
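One common way to obtain the Hessian-trace proxy mentioned above is a Hutchinson estimator with finite-difference Hessian-vector products. The sketch below assumes a toy quadratic whose gradient is known in closed form; the source does not specify this exact estimator, so treat it as one plausible instantiation of "approximate Hessian trace via finite differences."

```python
import numpy as np

rng = np.random.default_rng(1)

def grad(theta):
    # Gradient of the hypothetical toy loss 0.5*(x^2 + 10*y^2);
    # its true Hessian trace is 1 + 10 = 11.
    return np.array([theta[0], 10.0 * theta[1]])

def hessian_trace_estimate(grad_fn, theta, eps=1e-3, n_probes=100, rng=rng):
    """Hutchinson estimator: trace(H) ~= E[v^T H v] over Rademacher probes v,
    with H v approximated by a central finite difference of the gradient."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher probe
        hvp = (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)
        total += v @ hvp
    return total / n_probes

theta_star = np.zeros(2)
tr = hessian_trace_estimate(grad, theta_star)
print(tr)  # close to the exact trace, 11.0
```

The appeal of this estimator is that it never forms the Hessian: each probe costs two gradient evaluations, so it scales to networks with millions of parameters.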
