Category: Machine Learning
Summary: Testing whether wider networks develop less selective, more redundant neurons, helping explain why width can hurt compositional generalization.
Earlier Axiom results suggested that widening a network can worsen compositional generalization. This experiment asks whether the reason is reduced neuron specialization: instead of learning focused feature detectors, wider networks may spread responses across more features and produce more redundant hidden units.
The script measures several representation diagnostics, including feature selectivity, group alignment, neuron redundancy, and effective dimensionality. Those observables are designed to distinguish specialized compositional features from broad, overlapping responses that may be easier to fit in wide models.
This framing turns the question from a pure accuracy comparison into a mechanistic study of internal representations. If specialization drops with width, the result would connect architectural scale directly to the geometry of learned features.
Method: Train networks of different widths and compare hidden-unit selectivity, group alignment, redundancy, and participation-ratio-style dimensionality.
What is measured: Feature Selectivity Index, Group Alignment Score, neuron redundancy, effective dimensionality, and width dependence of those representation measures.
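The measures above can be sketched concretely. The exact definitions used by the script are not given, so the following is a minimal sketch using standard formulations: per-neuron selectivity from class-conditional mean activations, redundancy as mean absolute pairwise correlation between hidden units, and effective dimensionality as the participation ratio of the activation covariance spectrum. The activation matrix `H` (samples × units) and labels `y` are assumed inputs.

```python
import numpy as np

def selectivity_index(H, y):
    """Per-neuron selectivity (one common definition):
    (mu_max - mu_rest) / (mu_max + mu_rest), where mu_max is the largest
    class-conditional mean |activation| and mu_rest averages the others."""
    classes = np.unique(y)
    means = np.stack([np.abs(H[y == c]).mean(axis=0) for c in classes])
    mu_max = means.max(axis=0)
    mu_rest = (means.sum(axis=0) - mu_max) / (len(classes) - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)

def redundancy(H):
    """Mean absolute off-diagonal correlation between hidden units."""
    C = np.corrcoef(H, rowvar=False)
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    return np.abs(off_diag).mean()

def participation_ratio(H):
    """Effective dimensionality: PR = (sum lambda_i)^2 / sum lambda_i^2
    over eigenvalues of the activation covariance matrix."""
    lam = np.linalg.eigvalsh(np.cov(H, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()
```

Under the hypothesis tested here, wider networks would show lower mean selectivity, higher redundancy, and a participation ratio that grows more slowly than width (i.e., a smaller PR/width fraction).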
