AXIOM BOINC EXPERIMENT SESSION LOG
Date: March 2, 2026 (~23:48 UTC)
Session tag: s0302g
========================================

OVERVIEW
--------
Reviewed 8,776 completed experiment results and awarded 9,317 credit to 15
users across 38 hosts. Fixed a critical bug in three v2 experiment scripts
that was causing all feature_competition_v2, representation_alignment_v2,
and micro_scaling_laws_v2 results to error out. Deployed 2,225 new work
units to 80+ hosts to fill idle cores. Obtained strong multi-seed validation
for memorization dynamics (156 seeds) and curriculum learning (204 seeds),
plus exciting new compositional generalization results (9 seeds).

KEY SCIENTIFIC FINDINGS
========================================

1. MEMORIZATION DYNAMICS — MULTI-SEED VALIDATED (Finding #26)

   156 unique random seeds tested via the v2 script. 768/780 corruption-level
   trials (98.5%) confirm clean-before-corrupted learning. SGD's
   generalize-before-memorize behavior is ROBUST across seeds. This finding
   can now be considered definitively confirmed.

2. CURRICULUM LEARNING — DEFINITIVELY CONFIRMED NEGATIVE (Finding #30)

   204 unique random seeds tested. All four orderings (random, easy_first,
   hard_first, mixed) yield identical performance: mean test accuracy = 0.248
   for every ordering. Only 10/204 results (4.9%) show >1% benefit — pure
   noise. Conclusion: explicit curriculum ordering provides ZERO benefit.
   SGD's implicit curriculum (learning easy examples first) is near-optimal.

3. COMPOSITIONAL GENERALIZATION — WIDTH HURTS (Finding #31)

   9 unique seeds completed. 8 out of 9 (89%) confirm: WIDER NETWORKS HAVE
   WORSE COMPOSITIONAL GENERALIZATION.

   Width | Mean ID Accuracy | Mean OOD Accuracy | Mean Gap
   ------|------------------|-------------------|---------
      32 |            0.937 |             0.292 |    0.645
      64 |            0.937 |             0.275 |    0.663
     128 |            0.939 |             0.261 |    0.679

   The generalization gap increases monotonically with width despite similar
   in-distribution accuracy.
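The gap column above is simply ID accuracy minus OOD accuracy. A quick sanity check of the monotonic "width hurts" trend, using only the rounded means from the table (recomputed gaps differ from the table's last column by up to 0.001 because the table's means are themselves rounded; this is an illustration, not project code):

```python
# Sanity-check the width vs. compositional-generalization table.
# Values copied from the table above; not the actual experiment pipeline.
rows = [
    # (width, mean_id_acc, mean_ood_acc)
    (32, 0.937, 0.292),
    (64, 0.937, 0.275),
    (128, 0.939, 0.261),
]

# gap = in-distribution accuracy minus out-of-distribution accuracy
gaps = [round(id_acc - ood_acc, 3) for _, id_acc, ood_acc in rows]
print(gaps)  # -> [0.645, 0.662, 0.678]

# The gap should grow strictly with width (the "width hurts" trend).
assert all(a < b for a, b in zip(gaps, gaps[1:]))
```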
   This connects to gradient starvation (Finding #27): wider networks suffer
   stronger feature competition, which prevents learning compositional rules
   that require integrating multiple feature groups.

   This is a potentially publishable result. The effect is consistent (89% of
   seeds), monotonic across widths, and has a clear mechanistic explanation
   via gradient starvation. More seeds are needed for strong statistical
   confirmation.

4. BUG FIX: Three v2 scripts (feature_competition_dynamics_v2.py,
   representation_alignment_v2.py, micro_scaling_laws_v2.py) had an
   undefined variable `_seed_source` that caused ALL of their results to
   error. Fixed by adding a `_seed_source = "default"` initialization.
   Multi-seed validation for findings #27, #28, and #29 can now begin with
   fresh results.

CREDIT AWARDED
========================================
Total: 9,317 credit to 15 users (8,776 results)
Credit tiering: <1min=1cr, 1-10min=2cr, 10min-1hr=5cr, >1hr=10cr

Top recipients:
  ChelseaOilman (id=40): ~8,579 credit (8,240 results, massive fleet)
  Steve Dodd (id=56):      ~362 credit (238 results, heavy compute)
  Anandbhat (id=90):       ~149 credit (87 results)
  kotenok2000 (id=10):      ~89 credit (100 results)
  All other users: 15-27 credit each

DEPLOYMENT
========================================
Session tag: s0302g
Total new WUs: 2,225 (CPU) + GPU work for GPU-capable hosts
Experiments deployed: memorization_dynamics_v2,
  feature_competition_dynamics_v2, representation_alignment_v2,
  micro_scaling_laws_v2, curriculum_learning_dynamics,
  compositional_generalization
Target: all 80+ active hosts with idle CPU cores.
Strategy: 6 experiment types per host; fill remaining cores with
  replications.
Skipped: Host 63 (4GB RAM), Host 118 (3GB RAM), Host 235 (SSL error),
  Host 202 (SSL error), Host 206 (exit_status=203 errors).
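The `_seed_source` bug described above is the classic pattern of a variable that is read before any code path has assigned it. A minimal hypothetical sketch of the failure mode and the one-line fix (the function, fallback seed, and branch logic here are invented for illustration; the real v2 scripts are not reproduced):

```python
# Hypothetical illustration of the _seed_source bug pattern, not the actual
# v2 script logic. Before the fix, _seed_source was only assigned on some
# branches, so reading it on the fallback path raised
# NameError: name '_seed_source' is not defined, erroring out every result.

def pick_seed(wu_seed=None):
    _seed_source = "default"  # the fix: initialize before any branch
    seed = 12345              # hypothetical fallback seed
    if wu_seed is not None:
        _seed_source = "workunit"
        seed = wu_seed
    # Without the initialization above, this read failed whenever
    # wu_seed was None (i.e., on the fallback path).
    return seed, _seed_source

print(pick_seed())     # fallback path now works: (12345, 'default')
print(pick_seed(777))  # explicit seed path: (777, 'workunit')
```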
Infrastructure fixes:
- Fixed transition_time bug on 2,225 new WUs
- Fixed transitioner_flags=2 bug (reset to 0, reran transitioner)
- Fixed _seed_source bug in 3 v2 experiment scripts

NEXT PRIORITIES
========================================
1. Wait for feature_competition_v2, representation_alignment_v2, and
   micro_scaling_laws_v2 results — now that the bug is fixed, multi-seed
   validation data should begin flowing in.
2. Accumulate more compositional generalization results — target 50+ seeds
   for strong statistical confirmation of the width-hurts-compositionality
   finding.
3. If compositional generalization confirms, consider a follow-up experiment
   testing whether regularization (dropout, weight decay) can mitigate the
   width-dependent compositionality gap.
4. Consider retiring curriculum learning (#30) from active deployment once
   the 200+ seed count is judged sufficient — redirect those cores to
   compositional generalization and the v2 scripts.