Category: Machine Learning
Summary: Comparing the sharpness of the minima found by stochastic gradient descent when training at different learning rates.
A common idea in optimization theory is that larger learning rates may steer training toward flatter minima that generalize better. This experiment asks whether that claim shows up clearly in a controlled multilayer-perceptron setting.
The script trains the same architecture at several learning rates, then probes the local loss surface by perturbing the trained weights in random directions and estimating curvature-related quantities such as sharpness and Hessian-trace proxies. That turns the abstract notion of flatness into direct measurements around the final solution.
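The random-direction probe described above can be sketched as follows. This is a minimal illustration on a hypothetical toy quadratic loss (not the project's actual network): step a fixed distance from the trained weights along random unit directions and average the resulting loss increase as a sharpness proxy. The function names and the `eps`/`n_dirs` parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Hypothetical stand-in for the trained network's loss surface:
    # a quadratic with Hessian diag(1, 10), so curvature is known exactly.
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)

def perturbation_sharpness(loss_fn, theta, eps=1e-2, n_dirs=200, rng=rng):
    """Average loss increase after stepping distance `eps` in random unit
    directions from `theta` -- a simple proxy for local sharpness."""
    base = loss_fn(theta)
    increases = []
    for _ in range(n_dirs):
        d = rng.standard_normal(theta.shape)
        d /= np.linalg.norm(d)  # project onto the unit sphere
        increases.append(loss_fn(theta + eps * d) - base)
    return float(np.mean(increases))

theta_star = np.zeros(2)  # exact minimum of the toy quadratic
s = perturbation_sharpness(loss, theta_star)
print(s)  # small positive number; flatter minima give smaller values
```

For a quadratic, the expected increase is (eps^2 / 2) * E[d^T H d], so sharper minima (larger Hessian eigenvalues) produce systematically larger perturbation losses, which is exactly the quantity compared across learning rates.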
The project matters because flat-versus-sharp explanations of generalization are often invoked only qualitatively. Here the experiment tests whether the geometry of the found minimum actually changes with learning-rate scale in the predicted direction, i.e. whether larger learning rates yield measurably flatter minima.
Method: Matched neural-network training across learning rates followed by random-direction perturbation tests and finite-difference curvature estimates.
What is measured: Test accuracy, perturbation-induced loss increase, sharpness metrics, approximate Hessian trace, and dependence of minimum flatness on learning rate.
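One common way to obtain the Hessian-trace proxy mentioned above is a Hutchinson estimator with finite-difference Hessian-vector products. The sketch below assumes a toy quadratic whose gradient is known in closed form; the source does not specify this exact estimator, so treat it as one plausible instantiation of "approximate Hessian trace via finite differences."

```python
import numpy as np

rng = np.random.default_rng(1)

def grad(theta):
    # Gradient of the hypothetical toy loss 0.5*(x^2 + 10*y^2);
    # its true Hessian trace is 1 + 10 = 11.
    return np.array([theta[0], 10.0 * theta[1]])

def hessian_trace_estimate(grad_fn, theta, eps=1e-3, n_probes=100, rng=rng):
    """Hutchinson estimator: trace(H) ~= E[v^T H v] over Rademacher probes v,
    with H v approximated by a central finite difference of the gradient."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher probe
        hvp = (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)
        total += v @ hvp
    return total / n_probes

theta_star = np.zeros(2)
tr = hessian_trace_estimate(grad, theta_star)
print(tr)  # close to the exact trace, 11.0
```

The appeal of this estimator is that it never forms the Hessian: each probe costs two gradient evaluations, so it scales to networks with millions of parameters.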
