Category: Machine Learning
Summary: Measuring how nearly identical neurons break symmetry and differentiate during training as a function of the size of the initial perturbation.
If neurons start from exactly identical weights, gradient descent keeps them identical because they receive the same updates. This experiment asks how small a perturbation is enough to let hidden units differentiate, specialize, and form a richer representation during training.
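The claim that exactly identical neurons stay identical can be checked directly. Below is a minimal sketch, assuming a tiny one-hidden-layer network with a squared loss and hand-written backprop (the source's actual model is a two-hidden-layer MLP); it shows that three hidden neurons initialized to the same weight vector receive identical gradients and never separate.

```python
import numpy as np

# Minimal sketch (assumed toy setup, not the source's actual script):
# three identical hidden neurons trained with plain gradient descent.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))            # batch of inputs
y = rng.normal(size=(8, 1))            # targets

W1 = np.tile(rng.normal(size=(1, 4)), (3, 1))  # 3 bitwise-identical neurons
w2 = np.ones((3, 1)) / 3                       # identical readout weights

for _ in range(100):
    h = np.tanh(x @ W1.T)              # hidden activations
    err = h @ w2 - y                   # prediction error
    grad_h = err @ w2.T * (1 - h**2)   # backprop through tanh
    W1 -= 0.1 * grad_h.T @ x / len(x)
    w2 -= 0.1 * h.T @ err / len(x)

# All rows of W1 are still identical: symmetry was never broken.
print(np.allclose(W1[0], W1[1]) and np.allclose(W1[1], W1[2]))  # True
```

Because every operation is symmetric across the identical rows, the updates are bitwise identical, so no amount of training separates the neurons; only an initial perturbation can.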
The script initializes a two-hidden-layer MLP whose hidden neurons start from nearly identical weights, perturbed by controlled noise at scales ranging from zero to moderate. During training it records pairwise neuron similarity, norm variability, effective rank, and the number of effectively distinct neurons, so the symmetry-breaking process can be followed over time.
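The "nearly identical plus controlled noise" initialization can be sketched as below; the layer sizes and the specific noise scales are assumptions for illustration, not values from the source.

```python
import numpy as np

def near_symmetric_init(n_out, n_in, eps, rng):
    """One shared base neuron tiled n_out times, plus noise of scale eps."""
    base = rng.normal(size=(1, n_in)) / np.sqrt(n_in)
    return np.tile(base, (n_out, 1)) + eps * rng.normal(size=(n_out, n_in))

rng = np.random.default_rng(0)
for eps in [0.0, 1e-6, 1e-3, 1e-1]:   # zero up to moderate perturbations
    W = near_symmetric_init(16, 8, eps, rng)
    spread = np.abs(W - W.mean(axis=0)).max()
    print(f"eps={eps:g}  max deviation from mean neuron: {spread:.2e}")
```

Sweeping `eps` this way gives the independent variable of the experiment: how far each neuron starts from the shared base vector.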
This provides a direct computational test of a basic theoretical point about neural-network training. Instead of assuming symmetry breaking happens automatically, the run measures how its speed and extent depend on the scale of the initial asymmetry.
Method: Train nearly symmetric MLPs across perturbation scales and record neuron similarity, norm spread, effective rank, and distinct-neuron counts through time.
What is measured: Pairwise cosine similarity of neuron weights, standard deviation of neuron norms, effective rank of weight matrices, and number of unique neurons.
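The four diagnostics above can be computed from a single weight matrix. A sketch follows, with assumed conventions: neurons are rows, effective rank is the exponential of the entropy of the normalized singular values, and "unique neurons" are counted greedily with a cosine-similarity threshold (the source may define these differently).

```python
import numpy as np

def neuron_metrics(W, cos_thresh=0.999):
    """Symmetry-breaking diagnostics for W (neurons as rows). Conventions assumed."""
    # Pairwise cosine similarity of neuron weight vectors.
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = Wn @ Wn.T
    iu = np.triu_indices(len(W), k=1)
    mean_cos = cos[iu].mean()

    # Spread of neuron norms.
    norm_std = np.linalg.norm(W, axis=1).std()

    # Effective rank: exp of the entropy of normalized singular values.
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    eff_rank = np.exp(-(p * np.log(p + 1e-12)).sum())

    # Count neurons distinct up to the cosine threshold (greedy dedup).
    kept = []
    for i in range(len(W)):
        if all(abs(cos[i, j]) < cos_thresh for j in kept):
            kept.append(i)
    return mean_cos, norm_std, eff_rank, len(kept)
```

For a matrix of identical rows these return mean cosine ~1, zero norm spread, effective rank ~1, and a single distinct neuron, so all four metrics move away from those degenerate values as symmetry breaks.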
