Category: Machine Learning
Summary: Testing whether inserting a narrow bottleneck rescues out-of-distribution compositional generalization in wide neural networks.
Earlier Axiom experiments suggested that simply making a network wider can hurt compositional generalization, partly because wide hidden layers develop redundant representations whose effective rank falls well below their width. This experiment asks whether a hard architectural bottleneck can counter that effect by forcing information through a much smaller intermediate representation.
The experiment trains feedforward networks on a compositional classification task while varying both overall width and bottleneck width. By comparing in-distribution accuracy, out-of-distribution accuracy, the generalization gap, and representation rank across layers, it tests whether compression built into training works better than post-hoc pruning or soft regularization.
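The representation-rank diagnostic is not specified in detail here; a common choice (an assumption on our part, not a confirmed detail of the experiment) is the entropy-based effective rank of the hidden-activation matrix, which is near 1 for highly redundant representations and approaches the layer width when variance is spread evenly across directions. A minimal NumPy sketch:

```python
import numpy as np

def effective_rank(H, eps=1e-12):
    """Entropy-based effective rank of an activation matrix.

    H: (n_samples, width) array of hidden activations for one layer.
    Returns exp(Shannon entropy) of the normalized singular-value
    spectrum of the mean-centered activations.
    """
    s = np.linalg.svd(H - H.mean(axis=0), compute_uv=False)
    p = s / (s.sum() + eps)       # normalize spectrum to a distribution
    p = p[p > eps]                # drop numerically-zero directions
    return float(np.exp(-(p * np.log(p)).sum()))
```

A redundant (rank-1) layer scores near 1, while isotropic activations score near the layer width, which is the contrast the experiment's rank diagnostics rely on.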
The scientific interest is mechanistic rather than purely performance-based. If a bottleneck restores generalization, that would support the claim that representational redundancy, not just parameter count, is a key reason wider models fail on this task.
Method: NumPy-based training of feedforward networks with controlled widths and bottleneck sizes on a compositional generalization task, with representation-rank diagnostics across layers.
What is measured: In-distribution accuracy, out-of-distribution accuracy, generalization gap, bottleneck-width effects, and effective rank of hidden representations.
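To make the architecture under study concrete, the following is a minimal NumPy sketch of a wide network with a hard bottleneck layer. The layer widths, initialization, and ReLU nonlinearity are illustrative assumptions, not confirmed details of the experiment; the forward pass also returns per-layer activations so the rank diagnostics can be computed.

```python
import numpy as np

def init_bottleneck_mlp(in_dim, width, bottleneck, n_classes, seed=0):
    # Wide -> narrow bottleneck -> wide -> output. He-style init;
    # all widths here are hypothetical placeholders.
    rng = np.random.default_rng(seed)
    dims = [in_dim, width, bottleneck, width, n_classes]
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, X):
    # Returns logits and the list of hidden activations per layer,
    # so effective rank can be measured at each depth.
    hidden = []
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU
            hidden.append(h)
    return h, hidden
```

The design point is that the bottleneck is architectural: no regularizer is needed, because the second hidden layer simply cannot carry more than `bottleneck` dimensions of information forward.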
