Category: Machine Learning
Summary: Testing whether methods that decorrelate features rescue compositional generalization more reliably than regularizers that mainly control magnitude.
Earlier Axiom findings suggested that nuclear-norm regularization can rescue compositional generalization even when it does not simply preserve higher effective rank. This experiment asks whether the real mechanism is implicit feature decorrelation rather than rank maintenance or generic magnitude control.
The design compares six approaches: no regularization, nuclear-norm penalty, ordinary weight decay, spectral norm clipping, activation L2, and dropout. For each method, at widths 32, 64, and 128, it measures the compositional generalization gap together with effective rank, spectral structure, feature selectivity, cross-group correlation, principal-component alignment, and disentanglement scores.
That makes the run a mechanism test rather than just a regularization bake-off. By contrasting decorrelation-oriented methods with magnitude-only penalties, it aims to identify what kind of representational change actually supports compositional generalization.
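As a concrete illustration of the kind of penalty being contrasted: the nuclear norm of a weight matrix is the sum of its singular values, and a standard subgradient is U @ Vt from the SVD. The sketch below (a minimal NumPy illustration, not the experiment's actual training code; the function name and step sizes are assumptions) shows how such a penalty would enter a gradient update.

```python
import numpy as np

def nuclear_norm_and_subgrad(W):
    """Nuclear norm ||W||_* = sum of singular values of W,
    and its standard subgradient U @ Vt from the thin SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return s.sum(), U @ Vt

# Hypothetical weight matrix; lam and lr are illustrative values.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
penalty, g = nuclear_norm_and_subgrad(W)
lam, lr = 0.1, 0.1

# A step on the penalty term alone shrinks every singular value
# (by lr * lam, for singular values larger than the step),
# which is why the penalty can reshape spectral structure rather
# than merely scaling magnitudes the way weight decay does.
W_new = W - lr * lam * g
```

Unlike weight decay, which multiplies all of W by a constant and leaves the singular-value ratios untouched, this step subtracts a constant from each singular value, changing the spectrum's shape. That difference is exactly what the spectral diagnostics in this run are designed to detect.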
Method: Controlled NumPy training sweeps across regularization methods and widths, paired with representation and spectral diagnostics.
What is measured: Compositional generalization gap, effective rank, weight-matrix spectral properties, feature selectivity, cross-feature-group correlation, factor-PC alignment, and DCI disentanglement score.
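Of the listed diagnostics, effective rank is the most load-bearing given the earlier findings. One common definition (Roy & Vetterli's entropy-based effective rank; the function below is an illustrative sketch, not necessarily the exact variant used in the run) is the exponential of the entropy of the normalized singular-value distribution:

```python
import numpy as np

def effective_rank(features):
    """Entropy-based effective rank: exp of the Shannon entropy of
    the singular values normalized to a probability distribution.
    Equals k when exactly k singular values are equal and the rest
    are zero; insensitive to overall feature scale."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

# A feature matrix spanning two equal-energy orthogonal directions
# has effective rank 2, regardless of its ambient width.
X = np.zeros((100, 8))
X[:, 0] = 1.0
X[:50, 1] = 1.0
X[50:, 1] = -1.0
print(effective_rank(X))  # → 2.0 (up to floating point)
```

Because this measure is scale-invariant, it separates genuine spectral reshaping from the uniform shrinkage produced by magnitude-only penalties, which is the distinction the experiment turns on.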
