Experiment: Combined Bottleneck and Orthogonality for Compositional Generalization



Category: Machine Learning

Summary: Testing whether a hard internal bottleneck and orthogonality regularization improve compositional generalization more in combination than either intervention does alone.


Two different ideas may help neural networks generalize compositionally: forcing information through a narrow internal bottleneck, and regularizing hidden weights so their directions stay spread out rather than collapsing. This experiment asks whether those mechanisms are complementary, producing a stronger effect together than either one does by itself.
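The second mechanism, keeping hidden directions spread out, is commonly implemented as a soft orthogonality penalty on a weight matrix. The sketch below assumes that variant (penalizing the Frobenius distance between the Gram matrix of the hidden weights and the identity); the function name and matrix shapes are illustrative, not taken from the experiment.

```python
import numpy as np

def orthogonality_penalty(W):
    """Soft orthogonality penalty: squared Frobenius norm of the
    difference between the Gram matrix of W's rows and the identity.
    Zero exactly when the rows of W are orthonormal."""
    gram = W @ W.T
    eye = np.eye(gram.shape[0])
    return float(np.sum((gram - eye) ** 2))

# Rows drawn from an orthonormal basis incur (numerically) zero penalty;
# a collapsed matrix whose rows all point the same way is penalized heavily.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(32, 8)))   # Q has orthonormal columns
print(orthogonality_penalty(Q.T))               # ~0.0
print(orthogonality_penalty(np.ones((8, 32))))  # large: all rows identical
```

Added to the task loss with a small coefficient, this term discourages hidden directions from collapsing without forcing any particular solution.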

The architecture expands to a wide hidden layer, compresses into a bottleneck, and then expands again before classification. Across several widths, bottleneck sizes, and orthogonality settings, the experiment compares four regimes: no intervention, bottleneck only, orthogonality only, and both combined. It tracks both standard accuracy metrics and effective-rank diagnostics at each hidden layer.
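The expand, compress, expand shape described above can be sketched as a plain feed-forward pass. All layer sizes and parameter names here are illustrative assumptions, not the experiment's actual settings.

```python
import numpy as np

def init_params(rng, d_in=16, width=128, bottleneck=8, n_classes=10):
    """Random parameters for an expand -> bottleneck -> expand MLP.
    Sizes are placeholders standing in for the swept values."""
    def layer(m, n):
        return rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    return {
        "W_wide": layer(d_in, width),          # expand to a wide hidden layer
        "W_bneck": layer(width, bottleneck),   # compress through the bottleneck
        "W_expand": layer(bottleneck, width),  # expand again
        "W_out": layer(width, n_classes),      # classifier head
    }

def forward(x, p):
    h1 = np.maximum(0.0, x @ p["W_wide"])    # wide ReLU layer
    z = np.maximum(0.0, h1 @ p["W_bneck"])   # narrow bottleneck code
    h2 = np.maximum(0.0, z @ p["W_expand"])  # re-expanded features
    return h2 @ p["W_out"]                   # class logits

rng = np.random.default_rng(0)
params = init_params(rng)
logits = forward(rng.normal(size=(4, 16)), params)
print(logits.shape)  # (4, 10)
```

The four regimes then correspond to toggling two switches on this skeleton: setting the bottleneck width equal to the wide width (no bottleneck) and setting the orthogonality coefficient to zero (no regularization).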

The scientific interest is in mechanism rather than benchmark chasing. If the combined intervention wins clearly, it would support the idea that compositional failures come from both excess representational freedom and poor use of the dimensions that remain active.

Method: Width-and-bottleneck MLP sweep with an orthogonality penalty applied to hidden layers, evaluated on a held-out compositional classification task.

What is measured: In-distribution accuracy, out-of-distribution accuracy, compositional gap, effective rank by layer, and interaction effects between bottlenecks and orthogonality.
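The two less standard diagnostics can be computed as follows. The compositional gap is taken here as in-distribution minus out-of-distribution accuracy, and `effective_rank` uses the entropy-of-singular-values definition; which exact variants the experiment tracks is an assumption.

```python
import numpy as np

def compositional_gap(id_acc, ood_acc):
    """In-distribution accuracy minus out-of-distribution accuracy;
    zero means no compositional penalty."""
    return id_acc - ood_acc

def effective_rank(H):
    """Effective rank of an activation matrix H (samples x units):
    exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros before taking logs
    return float(np.exp(-np.sum(p * np.log(p))))

print(round(compositional_gap(0.92, 0.61), 2))   # 0.31
print(round(effective_rank(np.eye(5)), 6))       # ~5.0: five equal directions
print(round(effective_rank(np.ones((5, 5))), 6)) # ~1.0: fully collapsed
```

Comparing effective rank at the wide layers and at the bottleneck, across the four regimes, is what lets the experiment attribute any accuracy change to a representational mechanism rather than to raw capacity.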


Powered by BOINC
© 2026 Axiom Project