Category: Machine Learning
Summary: Comparing how different initialization schemes shape convergence speed, final accuracy, gradient flow, and activation statistics during training.
Initialization controls the starting geometry of a network and can strongly influence optimization, yet many schemes are compared only through final accuracy. This experiment asks how a broad set of standard and pathological initialization rules affects the full training dynamics of the same model.
The script compares Xavier, He, LeCun, orthogonal, sparse, several uniform scales, all-zeros, and identity-like initializations. For each one it tracks convergence speed, endpoint performance, gradient norms by layer, activation statistics, and how the weight distribution evolves through training.
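A minimal NumPy sketch of several of the scaled schemes, assuming the usual fan-in/fan-out conventions for a dense layer of shape (fan_in, fan_out); the function names are illustrative, not the script's actual API:

```python
import numpy as np

def xavier(fan_in, fan_out, rng):
    # Glorot/Xavier: variance 2 / (fan_in + fan_out)
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def he(fan_in, fan_out, rng):
    # He/Kaiming: variance 2 / fan_in, matched to ReLU activations
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def lecun(fan_in, fan_out, rng):
    # LeCun: variance 1 / fan_in
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def orthogonal(fan_in, fan_out, rng):
    # QR decomposition of a Gaussian matrix yields orthonormal columns;
    # the sign correction makes the distribution uniform over orthogonal maps
    q, r = np.linalg.qr(rng.normal(size=(fan_in, fan_out)))
    return q * np.where(np.diag(r) >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
w = orthogonal(256, 128, rng)
print(np.allclose(w.T @ w, np.eye(128), atol=1e-8))  # -> True: columns orthonormal
```

The Gaussian draws could equally be uniform draws with matched variance; the variance targets are what distinguish the schemes.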
That makes the output a landscape of optimization behaviors rather than a single leaderboard. The resulting comparisons are meant to show which initialization choices preserve healthy gradient flow and which induce bottlenecks or pathological dynamics.
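One such pathology can be shown in a few lines. In this hypothetical two-layer example (not the script's model), only the first layer is zeroed to isolate the symmetry effect; with every layer zeroed, the gradient to the hidden weights vanishes entirely:

```python
import numpy as np

# With W1 = 0, every hidden unit computes the same function, so backprop
# gives the columns of the W1 gradient identical direction: symmetry is
# never broken and the hidden units stay proportional after the update.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w1 = np.zeros((4, 16))
w2 = rng.normal(size=(16, 1))

h = np.tanh(x @ w1)              # all zeros: every hidden unit identical
g_out = np.ones((8, 1))          # dummy upstream gradient
g_h = (g_out @ w2.T) * (1 - h ** 2)
g_w1 = x.T @ g_h                 # column j equals w2[j] * (x.T @ g_out)

# Columns of g_w1 differ only by the scalar w2[j], so they are proportional
print(np.allclose(g_w1[:, 0] * w2[1, 0], g_w1[:, 1] * w2[0, 0]))  # -> True
```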
Method: Matched training runs across many initialization schemes, with trajectory-level diagnostics for optimization, gradients, activations, and weight evolution.
What is measured: Convergence speed, final train and test accuracy, per-layer gradient norms, activation means and standard deviations, and weight-distribution evolution.
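The per-layer diagnostics can be sketched as follows, assuming illustrative sizes (a 20-layer, width-256 tanh MLP) rather than the script's actual model: push one batch through the stack, record activation statistics on the forward pass, then backprop a dummy loss and record gradient norms by layer.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width, batch = 20, 256, 64
x = rng.normal(size=(batch, width))

stats = {}
for name, scale in [("tiny (0.01)", 0.01), ("xavier", np.sqrt(1.0 / width))]:
    ws = [rng.normal(0.0, scale, size=(width, width)) for _ in range(depth)]
    acts, h = [], x
    for w in ws:
        h = np.tanh(h @ w)               # layer output
        acts.append(h)
    inputs = [x] + acts[:-1]             # each layer's input
    g = acts[-1]                         # grad of dummy loss 0.5 * sum(h**2)
    grad_norms = [0.0] * depth
    for i in range(depth - 1, -1, -1):
        gpre = g * (1.0 - acts[i] ** 2)  # tanh'(pre) = 1 - tanh(pre)**2
        grad_norms[i] = np.linalg.norm(inputs[i].T @ gpre)
        g = gpre @ ws[i].T               # propagate gradient to lower layer
    stats[name] = (acts[-1].std(), grad_norms[0])
    print(f"{name:12s} last-layer act std {stats[name][0]:.2e}  "
          f"first-layer grad norm {stats[name][1]:.2e}")
```

The under-scaled run shows activations collapsing toward zero and a vanishing first-layer gradient, while the Xavier-scaled run keeps both in a healthy range; this is the kind of contrast the trajectory-level diagnostics surface.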
