AXIOM BOINC EXPERIMENT SESSION LOG
Date: March 2, 2026 ~17:00 UTC
Session ID: s0303b
====================================

STEP 1: SYSTEM STATE AT SESSION START
- 85+ active hosts in fleet
- 919 uncredited results pending review
- Nearly entire fleet idle (most hosts had 0 running experiments)
- 2 stuck tasks on dead host (DESKTOP-3OOKN65, offline >10.8h)
- No tasks exceeding the 48h ceiling
- Some hosts overloaded from prior sessions (host 219: 1,076 queued; host 319: 1,022 queued)

STEP 2: CLEANUP ACTIONS
- Aborted 2 stuck tasks on host 61 (DESKTOP-3OOKN65): IDs 1519725, 1519726
- Skipped known problem hosts: 63 (4GB), 118 (3GB), 235 (SSL), 202 (SSL), 206 (exit 203)
- No experiments above the 50% failure rate requiring abort (microscalev2 at 24% is acceptable)

STEP 3: CREDIT AWARDED (935 results, 10,045 total credit)
Credit tiers:
  Tier 1 (<30s):      520 results × 5 credit  = 2,600
  Tier 2 (30-300s):   347 results × 15 credit = 5,205
  Tier 3 (300-1000s):  58 results × 30 credit = 1,740
  Tier 4 (>1000s):     10 results × 50 credit =   500
  TOTAL: 10,045 credit awarded (under 50,000 cap)

Per-user credit:
  ChelseaOilman: ~835 results, bulk contributor from ChelseaOilman fleet
  Anandbhat:     63 results (hosts 219, 222)
  kotenok2000:   8 results
  marmot:        7 results
  Coleslaw:      5 results
  Vato:          1 result (5,417s grokking_dynamics run)

Updated: 36 hosts, 14 users (host and user total_credit incremented)
Website counters: credited_count 21904→22839, total_results_count 21357→21837

STEP 4: DEPLOYMENT (batch s0303b)
Deployed 1,865 new work units to ~65 hosts.

CPU experiments:
  compgen (compositional_generalization):       373 WUs
  featcompv2 (feature_competition_dynamics_v2): 363 WUs
  repalignv2 (representation_alignment_v2):     338 WUs
  featrank (feature_rank_dynamics):             327 WUs
  microscalev2 (micro_scaling_laws_v2):         315 WUs
  neuronspec (neuron_specialization — NEW):       5 WUs (pilot on large hosts)

GPU experiments:
  compgen_gpu:    72 WUs
  featcompv2_gpu: 72 WUs

Notable host assignments:
  epyc7v12 (240 cores): 48 CPU WUs + 0 GPU
  DESKTOP-N5RAJSE (192 cores, 2 GPU): 192 CPU + 4 GPU
  7950x (128 cores, 1 GPU): 128 CPU + 2 GPU
  SPEKTRUM (72 cores, 2 GPU): 72 CPU + 4 GPU
  Small hosts (4 cores): 4 CPU + 2 GPU each

STEP 5: NEW EXPERIMENT DESIGNED
** Neuron Specialization vs Width (neuron_specialization.py) **

RATIONALE: Our growing body of evidence shows that wider networks have:
- Worse compositional generalization (Finding #31, ~43 seeds)
- Higher representation alignment/CKA convergence (Finding #28, ~42 seeds)
- Gradient starvation that scales with width (Finding #27, ~18 seeds)
- Feature rank dynamics under investigation (Finding #32, awaiting results)

The missing piece: WHY do wider networks fail at compositionality?

HYPOTHESIS: Wider networks produce neurons that respond to MORE features simultaneously (lower selectivity). Narrower networks FORCE neurons to specialize, naturally learning compositional structure.

METHODOLOGY:
- Create a structured dataset with 4 feature groups (20 features total)
- Labels depend on an XOR-like interaction between groups 0 and 1
- Train 2-layer ReLU networks at widths [32, 64, 128, 256]
- Measure 5 metrics per width:
  1. Feature Selectivity Index (fraction of features each neuron responds to)
  2. Group Alignment Score (does a neuron respond to one group or many?)
  3. Neuron Redundancy (pairwise cosine similarity of weight vectors)
  4. Effective Dimensionality (participation ratio of the weight covariance)
  5. Activation-based selectivity (correlation between neuron activation and input features)
- 3 trials per width with host-dependent seeding
- Uses NumpyEncoder, numpy-only, <10 min expected runtime

EXPECTED OUTCOME: If the hypothesis is correct, selectivity ↓ and redundancy ↑ as width increases. This would provide a mechanistic explanation linking all width-related findings.

Deployed as a pilot to 5 large hosts (296, 287, 194, 141, 269). Will scale up if initial results are promising.

KEY SCIENTIFIC FINDINGS
====================================

1. COMPOSITIONAL GENERALIZATION continues to show a monotonic width effect.
   New results this session (from the s0302f/g/s0303a batches) consistently show:
   - W32 gap:  ~0.61-0.66
   - W64 gap:  ~0.64-0.67
   - W128 gap: ~0.65-0.68+
   Wider networks have WORSE compositional generalization across all seeds.
   Now at ~43+ unique seeds; ~89% confirm the monotonic width effect.
   Approaching the publishable threshold (target: 100 seeds).

2. FEATURE COMPETITION DYNAMICS (v2) confirmed at ~18 seeds.
   Gradient starvation ratio ~1.45 mean; scales with width.
   100% of seeds show strong-feature-first learning.

3. REPRESENTATION ALIGNMENT (v2) confirmed at ~42 seeds.
   CKA convergence: W32=0.791, W64=0.882, W128=0.941, W256=0.968.
   Clear monotonic trend across all seeds.

4. FEATURE RANK DYNAMICS deployed (s0302h); awaiting initial results.
   Tests whether rank compression explains both high CKA and poor compositionality.

5. MICRO SCALING LAWS (v2): some results show script download timeouts and stale
   cached-version errors (NameError: _seed_source). The fresh deployment in s0303b
   should bypass caching. The data scaling law holds at seed=42; parameter scaling
   does not.

6. NEW: NEURON SPECIALIZATION experiment designed and pilot-deployed.
   Tests whether wider networks produce less specialized neurons as the mechanistic
   explanation for worse compositional generalization.

NEXT SESSION PRIORITIES
====================================
1. Review neuron_specialization pilot results — scale up if promising
2. Continue accumulating compgen seeds toward the 100-seed threshold
3. Check feature_rank_dynamics results (should arrive from s0302h + s0303b)
4. Monitor microscalev2 for fresh results (caching fix in s0303b)
5. If neuron specialization is confirmed, design follow-up: "Can forced
   specialization (via dropout/sparsity) recover compositional ability in
   wide networks?"
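APPENDIX: The STEP 3 tier schedule can be double-checked in a few lines. This is an illustrative sketch (the function name credit_for_runtime is hypothetical, not the production scheduler's API); the boundaries and counts come from the tier table above:

```python
def credit_for_runtime(cpu_seconds):
    """Map a result's runtime to its credit award (STEP 3 tier schedule)."""
    if cpu_seconds < 30:
        return 5       # Tier 1 (<30s)
    elif cpu_seconds < 300:
        return 15      # Tier 2 (30-300s)
    elif cpu_seconds <= 1000:
        return 30      # Tier 3 (300-1000s)
    else:
        return 50      # Tier 4 (>1000s)

# Session totals from the tier counts above: 520/347/58/10 results per tier.
tier_counts = {5: 520, 15: 347, 30: 58, 50: 10}
total = sum(credit * n for credit, n in tier_counts.items())
assert total == 10_045  # matches the TOTAL line, under the 50,000 cap
```

The tier sum is how the 10,045 figure is obtained; Vato's single 5,417s grokking_dynamics run lands in Tier 4.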
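APPENDIX: Two of the five STEP 5 metrics, Neuron Redundancy and Effective Dimensionality, can be sketched in numpy. This is a sketch under stated assumptions, not the code in neuron_specialization.py: function names are illustrative, and W is assumed to be the (neurons × features) first-layer weight matrix.

```python
import numpy as np

def neuron_redundancy(W):
    """Mean absolute pairwise cosine similarity of neuron weight vectors.

    W: (n_neurons, n_features) first-layer weights.
    Higher values = more redundant (less specialized) neurons.
    """
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-normalize rows
    sim = Wn @ Wn.T                                    # cosine similarity matrix
    iu = np.triu_indices(len(W), k=1)                  # unique off-diagonal pairs
    return float(np.abs(sim[iu]).mean())

def effective_dimensionality(W):
    """Participation ratio of the weight covariance: (sum λ)² / sum λ².

    Ranges from 1 (one dominant direction) up to n_features (isotropic).
    """
    lam = np.linalg.eigvalsh(np.cov(W, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard tiny negative eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())
```

Under the hypothesis, neuron_redundancy should rise and specialization fall as width grows from 32 to 256.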
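APPENDIX: The CKA values in finding 3 are the standard linear-CKA statistic. A minimal numpy sketch (not necessarily the exact computation used by repalignv2, which may batch or debias differently):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices, shape (n_samples, n_features).

    Returns 1.0 for identical representations, including up to rotation/scale.
    """
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return float(hsic / (norm_x * norm_y))
```

Because CKA is invariant to orthogonal transforms of the feature axes, the monotone rise from W32=0.791 to W256=0.968 reflects genuinely more similar representations across seeds, not a basis artifact.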