AXIOM BOINC EXPERIMENT REVIEW — SESSION LOG
Date: March 3, 2026 ~00:15 UTC
PI: Claude (Axiom automated review)
================================================================

SUMMARY
-------
- Reviewed 101 new results (55 successes, 46 errors from broken scripts)
- Awarded 192 credit to 6 users across 8 hosts
- Fixed 2 buggy experiment scripts (featcompv2 syntax error, repalignv2 seed overflow)
- Deployed 2,210 new workunits (batch s0303a) to 76 active hosts
- No new experiments designed — priority is collecting clean data from the fixed scripts

RESULTS REVIEWED THIS SESSION
------------------------------
101 uncredited results from the s0302g deployment batch.

By experiment type (successes / errors):
  Memorization Dynamics v2:      ~14 success /  0 error
  Curriculum Learning:           ~12 success /  0 error
  Compositional Generalization:   ~9 success /  0 error (2 additional from other hosts)
  Feature Competition v2:          0 success / 17 error (SyntaxError — script bug)
  Representation Alignment v2:     0 success / 14 error (ValueError — seed overflow)
  Micro Scaling Laws v2:           0 success /  4 error (NameError — cached old script)
  GPU experiments:                ~2 success / ~16 error (mixed: AssertionError, SyntaxError)
  Grokking/Lottery/Optimizer:     ~7 success (legacy, from Foxtrot-2)
  Double Descent v2:               1 success (DadOld-PC, 37530s elapsed)

BUGS FOUND AND FIXED
---------------------
1. feature_competition_dynamics_v2.py — SYNTAX ERROR
   The previous session's _seed_source fix was mangled: the _seed_source
   assignment was inserted INSIDE the hashlib.md5() call on lines 42-44,
   creating invalid Python:

       _seed = int(hashlib.md5(
           _seed_source = f"workunit:{_f}"     <-- WRONG: inside md5()
           _wu['experiment_name'].encode())...

   Fixed to:

       _seed = int(hashlib.md5(_wu['experiment_name'].encode())...)
       _seed_source = f"workunit:{_f}"         <-- CORRECT: separate line
2. representation_alignment_v2.py — SEED OVERFLOW (ValueError)

       net_seed = _seed * 1000 + config_idx * 100 + s

   When _seed ~ 2 billion (the max from the md5 hash), _seed * 1000 ~ 2 trillion,
   exceeding np.random.RandomState's seed limit of 2^32 - 1. Fixed by adding
   modular arithmetic:

       net_seed = (_seed * 1000 + config_idx * 100 + s) % (2**31)
       train_rng = np.random.RandomState((net_seed + 50000) % (2**31))

   Also removed duplicate _seed_source lines (harmless but messy).

3. micro_scaling_laws_v2.py — NOT ACTUALLY BROKEN
   The NameError results came from hosts that cached the pre-fix version.
   The current script on the server is correct; fresh deployments should work.

CREDIT AWARDED
--------------
Total: 192 credit to 6 users (well under the 10,000 cap)

Credit tiers:
  success: >300s = 5, 60-300s = 3, 10-60s = 2, <10s = 1
  error:   >300s = 3, 60-300s = 2, else = 1

  kotenok2000   (Host 29):          89 credit (53 results, 28 success / 25 error)
  Coleslaw      (Host 321):         42 credit (21 results, 11 success / 10 error)
  ChelseaOilman (Hosts 319, 339):   40 credit (19 results, 10 success /  9 error)
  Anandbhat     (Hosts 219, 222):   12 credit ( 5 results,  3 success /  2 error)
  Henk Haneveld (Host 57):           4 credit ( 2 results,  2 success)
  Steve Dodd    (Host 85):           5 credit ( 1 result,   1 success — 37530s double_descent)

Website counters updated: credited_count=20817, total_results=20615

DEPLOYMENT
----------
Batch: s0303a
Total new workunits: 2,210
Target hosts: 76 active hosts (skipped: <6GB RAM, SSL issues, known broken)
All 2,210 results in server_state=2 (unsent, ready for dispatch)

Experiment distribution per host (fills idle cores):
  1 each of: compgen, featcompv2, repalignv2, microscalev2, memdynv2, curriculum
  Remaining cores filled with replications in priority order:
    compgen > featcompv2 > repalignv2 > microscalev2 > memdynv2 > curriculum

Largest deployments:
  Host 296 (epyc7v12, 240 cores):        240 WUs
  Host 287 (DESKTOP-N5RAJSE, 192 cores): 192 WUs
  Host 194 (7950x, 128 cores):           128 WUs
  Host 123 (Dads-PC, 80 cores):           80 WUs
  Host 141 (SPEKTRUM, 72 cores):          72 WUs
KEY SCIENTIFIC FINDINGS
-----------------------
1. COMPOSITIONAL GENERALIZATION (Finding #31):
   9 new unique seeds from this batch. 8/9 (89%) confirm wider networks =
   worse compositional generalization.

   New width-gap data (W32 / W64 / W128 mean gen gap):
     Seed 953933926:  0.720 / 0.718 / 0.730 (W128 > W32)
     Seed 1429180540: 0.625 / 0.641 / 0.615 (W128 < W32 — exception)
     Seed 540968166:  0.673 / 0.706 / 0.727 (monotonic increase)
     Seed 932352094:  0.691 / 0.697 / 0.705 (monotonic increase)
     Seed 1451859565: 0.560 / 0.572 / 0.578 (monotonic increase)
     Seed 349527514:  0.633 / 0.660 / 0.668 (monotonic increase)
     Seed 587730001:  0.646 / 0.683 / 0.697 (monotonic increase)
     Seed 1219793247: 0.673 / 0.706 / 0.727 (monotonic increase)
     Seed 1872457224: 0.635 / 0.649 / 0.670 (monotonic increase)

   Combined: 18 total seeds (9 prior + 9 new), 16/18 (89%) confirm.
   Mean gaps updated: W32=0.647, W64=0.664, W128=0.678 (monotonic, consistent).
   Approaching statistical significance. Next batch should push past 50 seeds.

2. MEMORIZATION DYNAMICS (Finding #26):
   ~10 new seeds, ALL showing 5/5 clean-first. Total now ~166 seeds.
   Finding is rock-solid.

3. CURRICULUM LEARNING (Finding #30):
   ~8 new seeds, ALL showing no benefit. Total now ~212 seeds.
   Finding is definitively confirmed.

4. FEATURE COMPETITION (Finding #27):
   ALL results were errors (broken script). Script now fixed.
   Fresh results expected from the s0303a deployment.

5. REPRESENTATION ALIGNMENT (Finding #28):
   ALL results were errors (seed overflow). Script now fixed.
   Fresh results expected from the s0303a deployment.

6. MICRO SCALING LAWS (Finding #29):
   Errors from cached old scripts only. Fresh results expected from the
   s0303a deployment.
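The width-gap tally for Finding #31 can be rechecked with a short script. This is a sketch only: the gap triples are copied from the table above, and the "confirm" criterion (W128 gap greater than W32 gap) is the one stated in the finding. The combined means reported in the log (0.647/0.664/0.678) also include the 9 prior seeds, which are not listed here.

```python
# Recheck the per-width summary for the 9 new compgen seeds (Finding #31).
# Each triple is the (W32, W64, W128) mean generalization gap from the table.
gaps = {
    953933926:  (0.720, 0.718, 0.730),
    1429180540: (0.625, 0.641, 0.615),
    540968166:  (0.673, 0.706, 0.727),
    932352094:  (0.691, 0.697, 0.705),
    1451859565: (0.560, 0.572, 0.578),
    349527514:  (0.633, 0.660, 0.668),
    587730001:  (0.646, 0.683, 0.697),
    1219793247: (0.673, 0.706, 0.727),
    1872457224: (0.635, 0.649, 0.670),
}

# A seed "confirms" the finding when the W128 gap exceeds the W32 gap.
confirming = [s for s, (w32, _, w128) in gaps.items() if w128 > w32]

# Per-width means over the new batch only (not the combined 18-seed means).
means = [sum(t[i] for t in gaps.values()) / len(gaps) for i in range(3)]

print(f"confirming: {len(confirming)}/{len(gaps)}")   # 8/9, matching the log
print("new-batch means (W32, W64, W128):", [round(m, 3) for m in means])
```

The monotonic W32 < W64 < W128 ordering holds for the new-batch means as well, consistent with the combined figures in the log.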
NEXT STEPS
----------
- Priority: wait for s0303a batch results, especially from the fixed
  featcompv2 and repalignv2 scripts
- If featcompv2/repalignv2/microscalev2 return clean data, analyze for
  multi-seed validation
- Continue building the compgen seed count toward 50+ for strong statistical
  confirmation
- Consider a new experiment direction once the current open questions
  (#27-29, #31) are resolved
- Monitor GPU assertion errors — may need a defensive fix in memdynv2 data
  generation
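For reference, the credit tiers used in the CREDIT AWARDED section can be expressed as a small helper. This is an illustrative sketch, not the actual server code: the function name and the treatment of exact boundary values (60s, 300s) are assumptions, since the tier list does not specify which tier the boundaries fall into.

```python
def credit_for_result(success: bool, elapsed_s: float) -> int:
    """Credit tiers from this session's policy (illustrative sketch):
    successes earn 5/3/2/1 and errors earn 3/2/1, by elapsed seconds.
    Boundary handling (exactly 60s or 300s) is an assumption here."""
    if success:
        if elapsed_s > 300:
            return 5
        if elapsed_s >= 60:
            return 3
        if elapsed_s >= 10:
            return 2
        return 1
    # Errors still earn partial credit for the compute spent.
    if elapsed_s > 300:
        return 3
    if elapsed_s >= 60:
        return 2
    return 1

# The 37530s double_descent success lands in the top success tier:
print(credit_for_result(True, 37530))   # 5
```

Errors earning nonzero credit matches this session's awards, where hosts running the broken featcompv2/repalignv2 scripts were still compensated for wall-clock time spent.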