Category: Machine Learning
Summary: Testing whether inserting a narrow bottleneck rescues out-of-distribution compositional generalization in wide neural networks.
Earlier Axiom experiments suggested that simply making a network wider can hurt compositional generalization, partly because wide hidden layers develop redundant representations whose effective rank falls well below their width. This experiment asks whether a hard architectural bottleneck can counter that effect by forcing information through a much smaller intermediate representation.
The experiment trains feedforward networks on a compositional classification task while varying both overall width and bottleneck width. By comparing in-distribution accuracy, out-of-distribution accuracy, the generalization gap, and representation rank across layers, it tests whether compression built into training works better than post-hoc pruning or soft regularization.
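The representation-rank diagnostic is not specified in detail here; a common choice (an assumption on our part, not a confirmed detail of the experiment) is the entropy-based effective rank of the hidden-activation matrix, which is near 1 for highly redundant representations and approaches the layer width when variance is spread evenly across directions. A minimal NumPy sketch:

```python
import numpy as np

def effective_rank(H, eps=1e-12):
    """Entropy-based effective rank of an activation matrix.

    H: (n_samples, width) array of hidden activations for one layer.
    Returns exp(Shannon entropy) of the normalized singular-value
    spectrum of the mean-centered activations.
    """
    s = np.linalg.svd(H - H.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps)       # normalize spectrum to a distribution
    p = p[p > eps]                # drop numerically-zero directions
    return float(np.exp(-(p * np.log(p)).sum()))
```

A redundant (rank-1) layer scores near 1, while isotropic activations score near the layer width, which is the contrast the experiment's rank diagnostics rely on.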
The scientific interest is mechanistic rather than purely performance-based. If a bottleneck restores generalization, that would support the claim that representational redundancy, not just parameter count, is a key reason wider models fail on this task.
Method: NumPy-based training of feedforward networks with controlled widths and bottleneck sizes on a compositional generalization task, with representation-rank diagnostics across layers.
What is measured: In-distribution accuracy, out-of-distribution accuracy, generalization gap, bottleneck-width effects, and effective rank of hidden representations.
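To make the architecture under study concrete, the following is a minimal NumPy sketch of a wide network with a hard bottleneck layer. The layer widths, initialization, and ReLU nonlinearity are illustrative assumptions, not confirmed details of the experiment; the forward pass also returns per-layer activations so the rank diagnostics can be computed.

```python
import numpy as np

def init_bottleneck_mlp(in_dim, width, bottleneck, n_classes, seed=0):
    # Wide -> narrow bottleneck -> wide -> output. He-style init;
    # all widths here are hypothetical placeholders.
    rng = np.random.default_rng(seed)
    dims = [in_dim, width, bottleneck, width, n_classes]
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, X):
    # Returns logits and the list of hidden activations per layer,
    # so effective rank can be measured at each depth.
    hidden = []
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU
            hidden.append(h)
    return h, hidden
```

The design point is that the bottleneck is architectural: no regularizer is needed, because the second hidden layer simply cannot carry more than `bottleneck` dimensions of information forward.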
