Category: Machine Learning
Summary: Testing whether bottleneck layers rescue compositional generalization by preventing representational collapse and reducing feature redundancy.
Earlier Axiom work found that inserting a bottleneck layer can strongly rescue compositional generalization in wider networks. This experiment asks why, focusing on the hypothesis that the bottleneck works by preventing representational collapse and forcing information through a narrower, more structured channel.
The analysis compares models with and without bottlenecks while varying width and recording effective rank, feature redundancy, representation compression ratio, compositional generalization gap, and gradient-flow diagnostics. Those measurements are designed to show whether the bottleneck changes not just performance but the geometry and information density of the learned representation.
This design turns a successful architecture trick into a mechanistic test. If the rescue tracks higher effective rank and lower redundancy, the experiment would link the bottleneck effect directly to representation structure rather than only parameter count.
Method: Width sweeps comparing networks with and without bottleneck layers, paired with representation-rank, redundancy, compression, and gradient-flow diagnostics.
What is measured: Effective rank by layer, feature redundancy, representation compression ratio, compositional generalization gap, and gradient norms across layers.
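The representation diagnostics above can be sketched in a few lines. The snippet below is a minimal illustration, not the experiment's actual analysis code: it computes effective rank as the exponential of the spectral entropy of the singular-value distribution, and feature redundancy as the mean absolute off-diagonal correlation between features. The activation matrices (`H_collapsed`, `H_rich`) are synthetic stand-ins for layer activations, constructed so that one is near-low-rank (mimicking representational collapse) and the other is full-rank.

```python
import numpy as np

def effective_rank(H):
    """Exponential of the entropy of the normalized singular-value
    distribution of the centered activation matrix H (samples x features)."""
    s = np.linalg.svd(H - H.mean(axis=0), compute_uv=False)
    p = s / s.sum()                 # normalized singular values
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def feature_redundancy(H):
    """Mean absolute off-diagonal correlation between feature columns."""
    C = np.corrcoef(H, rowvar=False)
    off_diag = np.abs(C[~np.eye(C.shape[0], dtype=bool)])
    return float(off_diag.mean())

rng = np.random.default_rng(0)
# Collapsed features: 64 columns that are noisy mixtures of only 4 directions.
base = rng.normal(size=(512, 4))
H_collapsed = base @ rng.normal(size=(4, 64)) + 0.01 * rng.normal(size=(512, 64))
# Rich features: 64 independent random columns.
H_rich = rng.normal(size=(512, 64))

print("collapsed:", effective_rank(H_collapsed), feature_redundancy(H_collapsed))
print("rich:     ", effective_rank(H_rich), feature_redundancy(H_rich))
```

On these synthetic matrices the collapsed representation shows a much lower effective rank and higher redundancy than the full-rank one, which is the signature the experiment looks for when comparing bottlenecked and unbottlenecked networks. A representation compression ratio could then be defined, for example, as effective rank divided by layer width.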
