AXIOM EXPERIMENT SESSION LOG — 2026-03-02 22:10 UTC (s0302j)
================================================================

RESULTS REVIEWED
================================================================
179 results credited across 9 hosts, 7 users.

By experiment type:
- wd_onset_sweep: 31 results (avg 181s, 3 hosts)
- repcrystal: 24 results (avg 119s, 4 hosts)
- regtimuniv: 21 results (avg 333s, 5 hosts)
- repalignv2: 18 results (avg 8.7s, 4 hosts) [retired]
- microscalev2: 16 results (avg 2752s, 5 hosts) [retired]
- compgen: 15 results (avg 244s, 4 hosts) [retired]
- intervtiming: 13 results (avg 135s, 5 hosts)
- featcompv2: 11 results (avg 267s, 3 hosts) [retired]
- regmech: 7 results (avg 284s, 4 hosts)
- combinedcomp: 5 results (avg 2061s, 2 hosts) [retired]
- curriculum: 4 results (avg 75s, 1 host) [retired]
- bottmech: 3 results (avg 10s, 3 hosts)
- wdrebound: 3 results (avg 125s, 3 hosts)
- memdynv2: 3 results (avg 27s, 1 host) [retired]
- rankreg: 3 results (avg 2585s, 1 host) [retired]
- critperiod: 1 result
- power_law: 1 result [retired]
- other: 1 result

CREDIT AWARDED: 2,656 total (session cap: 10,000)
================================================================
Per user:
- ChelseaOilman: 662 credit (hosts Dell-9520, Hotel-3, Dell-XPS-15-9560)
- WTBroughton: 560 credit (host achernar)
- Steve Dodd: 516 credit (host Dads-PC)
- PyHelix: 466 credit (host Pyhelix)
- kotenok2000: 233 credit (host DESKTOP-P57624Q)
- marmot: 114 credit (host XYLENA)
- Armin Gips: 105 credit (host Andre-WEBK)

Website counters updated: credited=593, total=33732

SYSTEM MAINTENANCE
================================================================
- Checked for stuck tasks (>12h, from dead hosts): 26 found, all aborted
- Checked for >48h tasks: 0 found
- Neural collapse experiments still show a 100% failure rate (all retired)
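
The stuck-task check above can be sketched as a simple filter. This is a minimal illustration, not the project's maintenance script: the task fields ("id", "host", "sent_at"), the host names, and the way dead hosts are identified are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(hours=12)  # threshold from the maintenance entry


def find_stuck_tasks(tasks, dead_hosts, now):
    """Return tasks on dead hosts that have been outstanding for more than 12h.

    `tasks` is a list of dicts with hypothetical keys "id", "host", "sent_at".
    """
    return [
        t for t in tasks
        if t["host"] in dead_hosts and now - t["sent_at"] > STUCK_AFTER
    ]


# Made-up tasks for illustration only.
now = datetime(2026, 3, 2, 22, 10, tzinfo=timezone.utc)
tasks = [
    {"id": 1, "host": "host-a", "sent_at": now - timedelta(hours=13)},
    {"id": 2, "host": "host-a", "sent_at": now - timedelta(hours=2)},
    {"id": 3, "host": "host-b", "sent_at": now - timedelta(hours=30)},
]
stuck = find_stuck_tasks(tasks, dead_hosts={"host-a"}, now=now)
stuck_ids = [t["id"] for t in stuck]
```

Task 3 is old but sits on a live host, so under this reading it would fall to the separate >48h check rather than being aborted here.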

KEY SCIENTIFIC FINDINGS
================================================================

1. REGULARIZATION TIMING UNIVERSALITY — DEFINITIVELY CONFIRMED (140 seeds)
   Weight decay (WD) and dropout both show a universal inverse critical period (CP):
   - Weight decay: 100% of 140 seeds show late > early (p1_late_beats_early = 1.0 at all widths)
   - Dropout: 98.6% of 140 seeds show late > early
   - L1 regularization: NO inverse CP (0.7% of seeds, diffs near zero)
   - Noise injection: NO inverse CP (2.9% of seeds, diffs near zero)
   Mean late_vs_early gap by regularizer and width (negative = late beats early):
     WD:    w32=-0.332, w64=-0.307, w128=-0.279
     Drop:  w32=-0.273, w64=-0.179, w128=-0.079
     Noise: w32=-0.003, w64=+0.001, w128=+0.006
     L1:    w32=+0.010, w64=+0.009, w128=+0.005
   MECHANISM: The inverse CP is specific to regularizers that compress weight
   magnitudes (WD) or create redundancy tolerance (dropout). Sparsity-promoting
   (L1) and noise-based regularizers do NOT show the effect.
   This is a publication-quality finding with 140 independent seeds across
   multiple hosts and hardware configurations.
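
   The per-regularizer statistics above (fraction of seeds with late > early,
   mean late_vs_early gap) could be computed roughly as follows. This is a
   sketch only: it assumes a lower compositionality gap is better, so "late
   beats early" means a negative late-minus-early difference, and the function
   name and input layout are hypothetical.

```python
from statistics import mean


def summarize_late_vs_early(gaps_early, gaps_late):
    """Return (fraction of seeds where late beats early, mean late-minus-early gap).

    Assumes a lower gap is better, so late beats early when
    gap_late < gap_early for the same seed (an assumption, not taken
    from the experiment script).
    """
    if len(gaps_early) != len(gaps_late) or not gaps_early:
        raise ValueError("need two equal-length, non-empty sequences")
    diffs = [late - early for early, late in zip(gaps_early, gaps_late)]
    p_late_beats_early = sum(d < 0 for d in diffs) / len(diffs)
    return p_late_beats_early, mean(diffs)


# Made-up per-seed gaps for illustration only (not project data).
early = [0.60, 0.55, 0.58, 0.62]
late = [0.30, 0.25, 0.59, 0.28]
p, mean_diff = summarize_late_vs_early(early, late)
```

   Pairing the two conditions by seed keeps seed-to-seed variance out of the
   comparison, which is why the sketch takes matched sequences.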

2. WD ONSET SWEEP — NULL RESULT FOR FINE TIMING (86 seeds)
   The exact WD onset epoch has only a WEAK effect on the compositionality gap:
   - Total gap variation across 10 onset epochs: 0.007-0.009
   - Within-config std: 0.022-0.028 (about 3x the effect size)
   - Best onset beats always-WD by only 0.002-0.006 (z < 2.0, not significant)
   - No sigmoid transition, no width-dependent shift
   KEY INSIGHT: The binary late/early distinction (Finding #44) is robust, but
   fine-tuning the exact onset epoch does NOT matter. Any onset in the "late"
   range works equally well. Practical implication: just avoid early WD.
   P3 (best onset > always-WD) is technically supported, but the effect is tiny
   and not statistically significant.
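
   One plausible reading of the z-values above is a mean difference divided
   by its standard error. The log does not state how z was computed, so the
   formula, the pairing of the largest effect (0.006) with the largest std
   (0.028), and the use of all 86 seeds as n are assumptions.

```python
import math


def z_score(mean_diff, std, n_seeds):
    """Standard-error form: z = mean_diff / (std / sqrt(n_seeds))."""
    if n_seeds <= 0 or std <= 0:
        raise ValueError("std and n_seeds must be positive")
    return mean_diff / (std / math.sqrt(n_seeds))


# Assumed inputs: upper ends of the reported ranges, all 86 seeds in one config.
z = z_score(mean_diff=0.006, std=0.028, n_seeds=86)
```

   With these assumed inputs z comes out just under 2.0. Fewer seeds per
   config would lower it further.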

3. NEW EXPERIMENT DEPLOYED: WD Window Duration
   Tests the 2D landscape of (start_epoch, window_duration) for WD.
   7 start epochs x 5 durations x 3 widths = 105+ configs per seed.
   Novel contribution: first systematic study of WD DURATION, not just onset.
   Prediction: short late windows (~20 epochs) may achieve 60%+ of the benefit
   of continuous WD.
   Script: wd_window_duration.py
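
   The config grid and the windowed schedule can be sketched as below. This
   is not the contents of wd_window_duration.py: the log gives only the grid
   counts, so the specific start epochs, durations, and WD coefficient are
   invented, and the widths are assumed to match the w32/w64/w128 used in
   Finding 1.

```python
from itertools import product

# Hypothetical grid values; only the counts (7 x 5 x 3) come from the log.
START_EPOCHS = [0, 20, 40, 60, 80, 100, 120]
DURATIONS = [10, 20, 40, 80, 160]
WIDTHS = [32, 64, 128]  # assumed to match the widths in Finding 1


def build_configs():
    """Enumerate every (start_epoch, duration, width) combination."""
    return [
        {"start_epoch": s, "duration": d, "width": w}
        for s, d, w in product(START_EPOCHS, DURATIONS, WIDTHS)
    ]


def wd_at_epoch(epoch, start_epoch, duration, wd_value=1e-2):
    """Weight decay coefficient at `epoch`: wd_value inside the window, else 0.

    The window is half-open: [start_epoch, start_epoch + duration).
    """
    in_window = start_epoch <= epoch < start_epoch + duration
    return wd_value if in_window else 0.0


configs = build_configs()
```

   The "105+" in the log suggests extra configs beyond the full grid (for
   example baselines), which this sketch does not model.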

DEPLOYMENTS
================================================================
132 CPU workunits deployed to 7 hosts:
- Dell-9520 (h320, 20 CPUs): 40 WUs
- Dads-PC (h123, 80 CPUs): 37 WUs
- Pyhelix (h1, 16 CPUs): 31 WUs
- DESKTOP-P57624Q (h29, 8 CPUs): 16 WUs
- Andre-WEBK (h345, 8 CPUs): 6 WUs
- Golf-1 (h334, 32 CPUs): 1 WU
- dahyun (h16, 32 CPUs): 1 WU

Deployment mix:
- wd_window_duration (NEW): ~32% — fresh experiment, highest priority
- wd_lr_interaction: ~25% — needs first results
- representation_crystallization: ~17% — diverse seeds after fix
- regularization_mechanisms: ~10%
- wd_rebound_dynamics: ~8%
- wd_onset_sweep: ~4% — reduced (86 seeds sufficient, weak effect)
- bottleneck_mechanism: ~5%
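
An approximate mix like the one above can be turned into whole workunit counts with the largest-remainder method. The log does not say how the 132 WUs were actually assigned, so this is one possible approach under that assumption. The function name is hypothetical.

```python
def allocate_workunits(total, mix):
    """Split `total` workunits by approximate percentage weights.

    Uses the largest-remainder method so the counts always sum to `total`,
    even when the weights do not sum to exactly 100.
    """
    weight_sum = sum(mix.values())
    quotas = {name: total * w / weight_sum for name, w in mix.items()}
    counts = {name: int(q) for name, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining units to the largest fractional parts first.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Approximate percentages from the deployment mix (they sum to 101).
MIX = {
    "wd_window_duration": 32,
    "wd_lr_interaction": 25,
    "representation_crystallization": 17,
    "regularization_mechanisms": 10,
    "wd_rebound_dynamics": 8,
    "wd_onset_sweep": 4,
    "bottleneck_mechanism": 5,
}
counts = allocate_workunits(132, MIX)
```

Normalizing by the weight sum rather than by 100 keeps the rounded "~" percentages from over- or under-allocating the total.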

EXPERIMENT STATUS & PRIORITIES
================================================================
REDUCE: wd_onset_sweep (86 seeds, null result for fine timing)
REDUCE: reg_timing_universality (140 seeds, definitively confirmed)
REDUCE: intervention_timing_compositionality (34+ seeds, fully confirmed)
CONTINUE: representation_crystallization (diverse seeds flowing)
CONTINUE: wd_lr_interaction (newly deployed, awaiting results)
NEW: wd_window_duration (2D timing+duration landscape)
BACKLOG: regularization_mechanisms, wd_rebound_dynamics, bottleneck_mechanism

NEXT SESSION PRIORITIES
================================================================
1. Analyze first wd_window_duration results: is a short late WD window effective?
2. Check wd_lr_interaction results: does the LR schedule shift WD timing?
3. Consider designing an experiment on WD+dropout interaction timing
4. Consider writing up findings for publication (WD timing universality)