Category: Machine Learning
Summary: Comparing shallow-wide and narrow-deep architectures under a fixed parameter budget to find where expressiveness best balances trainability.
Model capacity can be distributed across depth or width, but the two allocations have different consequences for representational power and optimization difficulty. This experiment asks which architecture shape performs best when the total parameter budget is held roughly constant and only the depth-versus-width allocation changes.
The study trains a family of networks ranging from shallow-wide to narrow-deep while keeping total parameters near the same target. By tracking accuracy, loss, and gradient flow over training, it compares the benefits of added depth against the risk of poorer optimization in very deep feedforward models without residual connections.
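Holding total parameters roughly constant while sweeping depth amounts to solving for the hidden width that hits the budget at each depth. Below is a minimal sketch of that bookkeeping for uniform-width MLPs; the helper names (`param_count`, `width_for_budget`), the uniform-width assumption, and the example shapes are illustrative, not taken from the study.

```python
# Sketch: for a given depth, find the hidden width w that keeps a
# uniform-width MLP's parameter count at or just under a fixed budget.
# Parameters (weights + biases) of an MLP with `depth` hidden layers:
#   n_in*w + w  +  (depth-1)*(w*w + w)  +  w*n_out + n_out
import math

def param_count(n_in: int, n_out: int, depth: int, width: int) -> int:
    """Total parameters of a uniform-width MLP with `depth` hidden layers."""
    first = n_in * width + width                    # input -> hidden
    hidden = (depth - 1) * (width * width + width)  # hidden -> hidden
    last = width * n_out + n_out                    # hidden -> output
    return first + hidden + last

def width_for_budget(n_in: int, n_out: int, depth: int, budget: int) -> int:
    """Solve a*w^2 + b*w + c = budget for w, then shrink until under budget."""
    a = depth - 1
    b = n_in + n_out + depth
    c = n_out - budget
    if a == 0:  # single hidden layer: parameter count is linear in w
        w = (budget - n_out) // b
    else:
        w = int((-b + math.sqrt(b * b - 4 * a * c)) / (2 * a))
    while w > 1 and param_count(n_in, n_out, depth, w) > budget:
        w -= 1  # guard against rounding pushing us over the budget
    return max(1, w)

# Example sweep: MNIST-like shapes, ~1M-parameter budget, depths 1..8.
for d in [1, 2, 4, 8]:
    w = width_for_budget(784, 10, d, 1_000_000)
    print(d, w, param_count(784, 10, d, w))
```

In a sweep like this each configuration lands slightly under the target rather than exactly on it, which is why "near the same target" rather than an exact parameter match is the natural framing.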
This is a practical architecture question with theoretical overtones: it probes where the extra compositional expressiveness of depth outweighs its optimization penalties, and where adding layers mainly makes training harder without a commensurate representational payoff.
Method: Fixed-parameter-budget architecture sweep from shallow-wide to narrow-deep MLPs, with repeated training and gradient-flow diagnostics.
What is measured: Accuracy, loss trajectories, gradient-flow statistics, and best-performing depth-width allocation under a fixed parameter budget.
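One way to realize the gradient-flow diagnostic is to record per-layer gradient norms from a backward pass. The sketch below does this for a plain tanh MLP with no residual connections, via manual backprop on random data; in very deep narrow stacks the early-layer norms typically shrink, which is the optimization penalty being tracked. All names, the tanh/MSE choice, and the 1/sqrt(fan-in) initialization are illustrative assumptions, not the study's exact setup.

```python
# Sketch: per-layer gradient norms ||dL/dW_l|| for a bias-free tanh MLP
# (linear output, MSE loss), computed by manual backprop.
import numpy as np

def layer_grad_norms(widths, x, y, rng):
    """Return the gradient norm of each weight matrix in a tanh MLP."""
    Ws = [rng.standard_normal((a, b)) / np.sqrt(a)  # 1/sqrt(fan-in) init
          for a, b in zip(widths[:-1], widths[1:])]
    # Forward pass, caching activations (last layer is linear).
    hs = [x]
    for i, W in enumerate(Ws):
        z = hs[-1] @ W
        hs.append(z if i == len(Ws) - 1 else np.tanh(z))
    # Backward pass: delta holds dL/dz for the current layer.
    delta = (hs[-1] - y) / len(x)        # MSE gradient, mean over batch
    norms = [0.0] * len(Ws)
    for i in range(len(Ws) - 1, -1, -1):
        norms[i] = float(np.linalg.norm(hs[i].T @ delta))
        if i > 0:                        # propagate through tanh: 1 - h^2
            delta = (delta @ Ws[i].T) * (1.0 - hs[i] ** 2)
    return norms

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 16))
y = rng.standard_normal((32, 1))
# Narrow-deep configuration: 12 hidden layers of width 16.
norms = layer_grad_norms([16] + [16] * 12 + [1], x, y, rng)
print([f"{n:.2e}" for n in norms])
```

Logging these norms per layer over training epochs gives the gradient-flow trajectories that can then be compared across the depth-width sweep, alongside accuracy and loss.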
