Discuss Findings

Across 534 payloads sampled today, all four optimizer families and all four timing conditions landed in the same broad accuracy band: usually about 0.73 to 0.80, with compositionality gaps of roughly 0.13 to 0.20. Occasional failures, down near 0.55 to 0.59, appeared in multiple conditions rather than concentrating in one timing window. SGD did not show a stable inverse-critical-period split: `never`, `early`, and `late` all repeatedly reached roughly 0.75 to 0.79, and `always` overlapped the same band. AdamW behaved similarly, with `never` and `late` both commonly around 0.74 to 0.80 and `always` again overlapping. Adam+L2 showed the same broad overlap rather than a qualitatively different timing response.
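Per-cell bands like the ones described above can be tabulated with a small grouping pass. This is a minimal sketch, assuming each payload has been flattened to a dict with hypothetical keys `optimizer`, `timing`, and `accuracy`. The 0.60 failure cutoff is also an assumption, chosen only so that runs in the 0.55 to 0.59 range get flagged; it is not a threshold taken from the runs.

```python
from collections import defaultdict
from statistics import median

# Hypothetical payload schema: each run is a dict with "optimizer",
# "timing" ("never", "early", "late", "always") and "accuracy" keys.
FAILURE_THRESHOLD = 0.60  # assumed cutoff for flagging a failed run


def summarize_bands(runs):
    """Group runs by (optimizer, timing) and return each cell's accuracy band."""
    cells = defaultdict(list)
    for run in runs:
        cells[(run["optimizer"], run["timing"])].append(run["accuracy"])

    summary = {}
    for key, accs in cells.items():
        summary[key] = {
            "n": len(accs),
            "min": min(accs),
            "median": median(accs),
            "max": max(accs),
            # Count runs that fell below the assumed failure cutoff.
            "failures": sum(a < FAILURE_THRESHOLD for a in accs),
        }
    return summary


def failure_cells(summary):
    """Return the (optimizer, timing) cells with at least one failed run, sorted."""
    return sorted(key for key, stats in summary.items() if stats["failures"] > 0)
```

If `failure_cells` returns cells spread across several timing conditions rather than one, that matches the observation that failures did not concentrate in a single timing window.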

NO EFFECT. The sampled runs do not show a clean optimizer-dependent interaction with weight-decay timing. Any timing differences are smaller than the seed-to-seed spread and do not separate SGD, AdamW, and Adam+L2 into distinct qualitative regimes, so this sample does not resolve the proposed inverse-critical-period pattern.
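One way to make "smaller than the seed-to-seed spread" concrete is to compare, per optimizer, the largest gap between timing-condition means with the pooled within-condition standard deviation across seeds. The sketch below assumes the same hypothetical flattened run dicts (`optimizer`, `timing`, `accuracy`, one entry per seed). The gap-versus-pooled-SD rule is a heuristic assumption, not a significance test, and is not necessarily how the verdict above was reached.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def timing_effect_vs_seed_noise(runs):
    """For each optimizer, compare the spread of timing-condition means with
    the pooled seed-to-seed standard deviation within conditions.

    Assumed heuristic (not a formal test): the timing effect counts as
    "resolved" only if the largest gap between timing means exceeds the
    pooled within-condition SD.
    """
    cells = defaultdict(list)
    for run in runs:
        cells[(run["optimizer"], run["timing"])].append(run["accuracy"])

    # Regroup cells as optimizer -> {timing: [accuracies across seeds]}.
    by_opt = defaultdict(dict)
    for (opt, timing), accs in cells.items():
        by_opt[opt][timing] = accs

    report = {}
    for opt, timings in by_opt.items():
        means = {t: mean(a) for t, a in timings.items()}
        gap = max(means.values()) - min(means.values())
        # Pooled variance uses only cells with >= 2 seeds: sum((n_i - 1) * s_i^2) / sum(n_i - 1).
        repeated = [a for a in timings.values() if len(a) > 1]
        dof = sum(len(a) - 1 for a in repeated)
        num = sum((len(a) - 1) * variance(a) for a in repeated)
        pooled_sd = sqrt(num / dof) if dof > 0 else float("nan")
        report[opt] = {
            "means": means,
            "gap": gap,
            "pooled_sd": pooled_sd,
            # Comparisons with NaN are False, so no repeats means "not resolved".
            "resolved": gap > pooled_sd,
        }
    return report
```

A `False` flag for every optimizer corresponds to the NO EFFECT reading. Cells with a single seed contribute no variance, so when no cell has repeats the pooled SD is NaN and the flag stays `False`.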