AXIOM BOINC EXPERIMENT REVIEW — Session Log
Date: 2026-03-02 ~17:30 UTC (session s0303b)
=============================================

SESSION OVERVIEW
================
- Reviewed and credited 485 new results (5,011 total credit)
- Aborted 120 broken cellular_automata_v2 tasks (92.2% failure rate)
- Deployed 4,281 new workunits to 67 active hosts
- Designed and deployed new experiment: regularized_compositionality.py
- All hosts now fully loaded (0 idle cores)

RESULTS REVIEWED
================
485 successful results across these experiment types:
- featcompv2: 101 results (feature competition dynamics)
- memdynv2: 80 results (memorization dynamics — retired, but credited)
- compgen: 78 results (compositional generalization)
- curriculum: 72 results (curriculum learning — retired, but credited)
- microscalev2: 67 results (micro scaling laws)
- repalignv2: 74 results (representation alignment)
- GPU results: 10 (memdynv2_gpu, featcompv2_gpu, compgen_gpu)
- Legacy: 1 batch_size_critical_phenomena, 1 information_bottleneck
- prunecomp: 1 result (first pruning compositionality data!)

CREDIT AWARDED (5,011 total, well within 50k cap)
==================================================
By user:
- ChelseaOilman: +3,216 (hosts: Charlie-1/2, Delta-1/2, Echo-3, Foxtrot-1, Golf-1/2, Hotel-3, Dell-9520, Dell-XPS-15-9560)
- Anandbhat: +882 (host: DESKTOP-EMAFVVL)
- Steve Dodd: +550 (host: DadOld-PC)
- Armin Gips: +118 (host: Andre-WEBK)
- marmot: +96 (host: XYLENA)
- kotenok2000: +55 (host: DESKTOP-P57624Q)
- Coleslaw: +54 (host: Rosie)
- WTBroughton: +36 (host: achernar)
- Vato: +4 (host: iand-r7-5800h)

Credit tiers used: <30s=4cr, 30-120s=7cr, 120-600s=12cr, 600-2000s=18cr, >2000s=25cr, GPU+3.

CLEANUP ACTIONS
===============
1. Aborted 120 in-progress cellular_automata_v2 tasks (92.2% failure rate).
   Marked 138 workunits as errored to prevent regeneration.
2. No stuck tasks found (no hosts meeting 12h-running + 6h-silent threshold).
3. No tasks exceeded 48h ceiling.
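The tier schedule above can be expressed as a small helper. This is an illustrative sketch, not the project's actual crediting code: the function name `credit_for_result` and the boundary handling (lower bounds inclusive) are my own assumptions.

```python
def credit_for_result(runtime_s: float, gpu: bool = False) -> int:
    """Map a result's runtime to credit per the session's tier schedule.

    Tiers: <30s=4, 30-120s=7, 120-600s=12, 600-2000s=18, >2000s=25; GPU +3.
    NOTE: hypothetical helper; the treatment of exact boundary values is an
    assumption, since the log's tier notation leaves them ambiguous.
    """
    if runtime_s < 30:
        credit = 4
    elif runtime_s < 120:
        credit = 7
    elif runtime_s < 600:
        credit = 12
    elif runtime_s < 2000:
        credit = 18
    else:
        credit = 25
    return credit + (3 if gpu else 0)
```

For example, a 90-second CPU result earns 7 credits, and the same runtime on a GPU host earns 10.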
KEY SCIENTIFIC FINDINGS
=======================

1. PRUNING COMPOSITIONALITY — FIRST RESULT IS NEGATIVE (Finding #34)
   The first pruning_compositionality result (host 330, Delta-1) shows that
   magnitude pruning does NOT recover compositional generalization. For all
   widths tested (64, 128, 256), OOD accuracy stays near 0% across all prune
   rates (0%, 25%, 50%, 75%). The compositional gap barely changes:
   - W64: gap 0.942 (unpruned) → 0.862 (75% pruned)
   - W128: gap 0.947 → 0.898
   - W256: similar pattern
   The effective rank of W1 is essentially unchanged by pruning.
   IMPLICATION: The width-compositionality problem is NOT about neuron
   redundancy. It's about the nature of the representations learned during
   training.

2. COMPOSITIONAL GENERALIZATION — continues to accumulate seeds (Finding #31)
   78 new compgen results this session. The width effect remains monotonic
   and consistent: wider → worse OOD accuracy. Representative gaps from this
   batch: W32 ~0.67, W64 ~0.71, W128 ~0.73 (means).

3. FEATURE COMPETITION DYNAMICS — 101 new results (Finding #27)
   The gradient starvation pattern continues to replicate across new seeds.

4. REPRESENTATION ALIGNMENT — 74 new results (Finding #28)
   The CKA convergence pattern continues: wider → higher CKA.

5. NEW EXPERIMENT DESIGNED: Regularized Compositionality (Finding #35)
   Since post-hoc pruning doesn't help, we test TRAINING-TIME interventions:
   - Dropout [0.0, 0.2, 0.4] — forces distributed representations
   - Weight decay [0.01, 0.05] — encourages simpler solutions
   - Combined [dropout 0.3 + wd 0.01]
   × Widths [32, 64, 128, 256] × depth=2
   Hypothesis: If width → redundancy → poor compositionality, dropout should
   help wider networks MORE (a width × dropout interaction).
   Script: regularized_compositionality.py. Deployed to all 67 active hosts.

DEPLOYMENT (4,281 WUs to 67 hosts)
===================================
Experiments deployed (in priority order):
1. regularized_compositionality.py (NEW) — tests dropout/wd × width interaction
2. compositional_generalization.py — push toward 100+ seeds
3. feature_rank_dynamics.py — cross-validation
4. neuron_specialization.py — awaiting first results
5. pruning_compositionality.py — confirm negative result
6. representation_alignment_v2.py — cross-validation
7. feature_competition_dynamics_v2.py — cross-validation
8. micro_scaling_laws_v2.py — on hosts with ≥16GB RAM

GPU experiments also deployed (compgen_gpu, featrank_gpu, neuronspec_gpu,
featcompv2_gpu) on all GPU-equipped hosts.

Major host allocations:
- epyc7v12 (296): 240 WUs — full saturation
- DESKTOP-N5RAJSE (287): 192 WUs — full saturation
- 7950x (194): 128 WUs — full saturation
- DadOld-PC/Dad-Workstation/Dads-PC (85/87/123): 80 WUs each
- SPEKTRUM (141): 72 WUs
- JM7 (269): 64 WUs
- 32-core fleet (20+ hosts): 32 WUs each

RESEARCH DIRECTION
==================
We now have a coherent research narrative building:
- #31: Wider networks → worse compositional generalization
- #32: Wider networks → lower effective rank/width ratio (explains #31?)
- #28: Wider networks → higher CKA similarity (representations converge)
- #34: Post-training pruning does NOT fix compositionality (NEW — negative)
- #35: Does training-time regularization fix compositionality? (NEW — testing)

The negative pruning result is scientifically significant: it rules out the
simple "redundancy" explanation. The problem is not that wider networks have
too many neurons — it's that the TRAINING DYNAMICS produce qualitatively
different representations. Dropout, which intervenes during training, is the
natural next test. If dropout × width shows an interaction (helping wider
nets more), that would support the "training dynamics" explanation. If
dropout also fails, we need to look at fundamentally different training
algorithms, not just regularization.

NEXT SESSION PRIORITIES
=======================
1. Review regularized_compositionality results — this is the key new finding
2. Review neuron_specialization results (still waiting on first data)
3. Continue accumulating compgen/featrank seeds toward 100
4. If regcomp shows positive results, design targeted follow-up
5. If regcomp is also negative, consider curriculum-based or
   architecture-based interventions
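The Finding #35 design can be sketched as a parameter grid. This is a minimal illustration of the sweep described above, not the actual contents of regularized_compositionality.py; the function `build_grid`, the config keys, and the treatment of dropout 0.0 as the baseline condition are all my own assumptions.

```python
from itertools import product

# Conditions from the Finding #35 design (labels are illustrative):
# dropout sweep (0.0 serves as baseline), weight-decay sweep, and one
# combined condition. Each tuple is (label, dropout, weight_decay).
CONDITIONS = (
    [("dropout", d, 0.0) for d in (0.0, 0.2, 0.4)]
    + [("weight_decay", 0.0, wd) for wd in (0.01, 0.05)]
    + [("combined", 0.3, 0.01)]
)
WIDTHS = [32, 64, 128, 256]


def build_grid(depth: int = 2):
    """Enumerate one workunit config per (condition, width) pair."""
    return [
        {
            "condition": label,
            "dropout": dropout,
            "weight_decay": wd,
            "width": width,
            "depth": depth,
        }
        for (label, dropout, wd), width in product(CONDITIONS, WIDTHS)
    ]


grid = build_grid()  # 6 conditions x 4 widths = 24 configs per seed
```

Crossing every condition with every width is what lets the analysis test the hypothesized width × dropout interaction rather than just the main effect of dropout.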