Sampled 6 of 39 results across 23 hosts. Across the readable payloads, final effective rank rose from roughly 21–23 at width 32 to roughly 94–106 at width 256, but the rank-to-width ratio fell consistently, from about 0.67–0.72 at width 32 down to about 0.37–0.41 at width 256. Training accuracy saturated near 1.0 at every width, while the wider models consistently confined their active representation subspace to a smaller fraction of the available capacity.
CONFIRMED. Wider networks are not filling their added representational capacity proportionally; they converge to lower effective rank relative to width across the sampled seeds. That sublinear scaling is exactly the predicted mechanism linking width to more compressed shared representations.
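For anyone wanting to sanity-check the metric locally: the post doesn't say which definition of effective rank the payloads used, so this is a minimal sketch assuming the common entropy-based definition (exponential of the Shannon entropy of the normalized singular values). The synthetic activation matrix below is purely illustrative: it concentrates a width-256 layer's variance in about 100 directions, mimicking the sampled rank-to-width ratios.

```python
import numpy as np

def effective_rank(H: np.ndarray) -> float:
    """Entropy-based effective rank: exp of the Shannon
    entropy of the normalized singular value distribution."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros to avoid log(0)
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
# Hypothetical width-256 activations whose variance spans only
# ~100 directions, so effective rank << width, as in the results.
basis = rng.standard_normal((100, 256))
acts = rng.standard_normal((2000, 100)) @ basis
r = effective_rank(acts)
print(r, r / 256)  # effective rank and rank-to-width ratio
```

A threshold-based variant (counting singular values above some fraction of the largest) would give somewhat different numbers but the same qualitative trend.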