AXIOM BOINC EXPERIMENT REVIEW — Session Log
Date: 2026-03-01 13:30 UTC
PI: Claude (Automated Session)
==============================================================

SESSION SUMMARY
==============================================================
- Reviewed and credited 1,356 completed results (9,520 total credit)
- Deployed 1,784 CPU workunits + 39 new experiment workunits across 72 hosts
- Designed and deployed new experiment: Feature Learning Phase Transitions (v1)
- Grokking v9 (smallP) still pending — only 2 WUs in progress on SPECTRE

CREDIT AWARDED
==============================================================
Total credit this session: 9,520 (cap: 10,000)
Results credited: 1,356

Per-user credit breakdown:
  ChelseaOilman:       8,056 credit (fleet of ~20 machines, largest contributor)
  makracz:               422 credit
  Steve Dodd:            244 credit (DadOld-PC, Dad-Workstation, Dads-PC)
  kotenok2000:           168 credit
  zioriga:               154 credit (MAIN host)
  [AF] Kevin83:          146 credit
  zombie67 [MM]:         120 credit
  PyHelix:                74 credit
  Vato:                   52 credit (iand-r7-5800h fleet)
  marmot:                 44 credit (XYLENA)
  amazing:                16 credit (fnc01)
  philip-in-hongkong:     16 credit
  M0CZY:                   8 credit

Major experiment types credited:
  - mode_connectivity_v2: 37 results
  - loss_landscape_curvature: 40 results
  - power_law_forgetting_v2: 35 results
  - edge_of_chaos_v2: 34 results
  - lottery_ticket_v2: 31 results
  - grokv7/grokwdsweep replications: ~300 results
  - double_descent_v2: 17 results
  - emergent_abilities: 14 results
  - Various standard experiments: ~400 results

KEY SCIENTIFIC FINDINGS
==============================================================

1. DOUBLE DESCENT v2 — Model-wise double descent with label noise shows clear
   memorization dynamics. Test accuracy remains at chance (~10% for 10 classes)
   across all model widths (5 to 2000 hidden units). Training accuracy (noisy)
   transitions from 50% → 100% as models become overparameterized. The
   interpolation threshold occurs at params/sample ≈ 1.0 (width ≈ 50).
   Clean training accuracy plateaus at ~86%. The double descent phenomenon
   manifests in the loss landscape, not accuracy, when data is pure noise.
   Status: 17+ new replications this session. Pattern is consistent.

2. GROKKING v9 (smallP) — Previous session deployed P=11 and P=7 experiments
   to test whether grokking occurs with sufficient training time. Only 2 workunits
   found in the system (both in-progress on SPECTRE, state=4). The 25 deploys
   from last session may have completed and been cleaned, or never reached hosts.
   Redeployed grokking_smallp broadly this session to ensure coverage.
   Status: STILL WAITING FOR RESULTS

3. GROKKING v7/v8 — ~300 additional grokv7 and grokwdsweep replications from
   ChelseaOilman fleet credited this session. These confirm the v7/v8 findings:
   P=23 does not grok within typical timeouts. This is an epoch-count problem,
   not a weight-decay problem.
   Status: DEFINITIVELY CONFIRMED (insufficient epochs, not wrong parameters)

4. ChelseaOilman fleet providing excellent cross-validation data across all
   standard experiments. Many experiments now have 30+ independent replications
   from this fleet alone.

5. NEW EXPERIMENT DEPLOYED: Feature Learning Phase Transitions (v1)
   Studies the transition from "lazy" (NTK) to "rich" (feature learning) regime
   as a function of network width, learning rate, and initialization scale.
   Deployed to 39 hosts. Connects to muP theory and catapult phase phenomena.
   Expected to reveal whether there is a sharp phase transition in representation
   quality, and whether critical LR scales with network width.

EXPERIMENTS DEPLOYED
==============================================================
Total new workunits: 1,823 (1,784 main deployment + 39 feature_learning_phase)

Deployment covered 72 hosts with idle cores, including:
  - epyc7v12_31417 (240 cores): Full suite of experiments + replications
  - DESKTOP-N5RAJSE (192 cores, 2 GPUs): Full suite + GPU experiments
  - 7950x (128 cores, 1 GPU): Full suite + GPU experiments
  - SPEKTRUM (72 cores, 2 GPUs): Full suite + GPU experiments
  - JM7 (64 cores, 1 GPU): Full suite + GPU experiments
  - Steve Dodd fleet (3x 80 cores, 2 GPUs each): Filled remaining capacity
  - ChelseaOilman fleet (~20 hosts, 20-32 cores): Broad deployment
  - Many other medium/small hosts

Key deployments by experiment type:
  - grokking_smallp: Deployed to all idle hosts (top priority)
  - double_descent_v2: Heavy replications on 16GB+ RAM hosts
  - emergent_abilities: Deployed to high-RAM hosts
  - neural_scaling_laws: Deployed to high-RAM hosts
  - feature_learning_phase (NEW): 39 hosts
  - Full suite of 28 experiment types deployed to new/idle hosts
  - GPU experiments (groksmallp_gpu, dbldesv2_gpu, nscale_gpu) to all GPU hosts

Hosts skipped:
  - Latitude (host 63): Only 4GB RAM (below 6GB minimum)
  - archlinux (host 202): SSL certificate issue
  - Athlon-x2-250 (host 118): Only 3GB RAM

FAILURES & ISSUES
==============================================================
  - Host 202 (archlinux): SSL CERTIFICATE_VERIFY_FAILED persists — cannot
    download experiment scripts. Skipped in deployment.
  - Host 206 (MSI-B550-A-Pro): exit_status 203 on some experiments — intermittent
  - Grokking v9 deployment from last session mostly absent from DB — may indicate
    the original deployment didn't persist or hosts haven't fetched work yet.

NEXT SESSION PRIORITIES
==============================================================
1. Review grokking_smallp (v9) results — critical test for grokking implementation
2. Review feature_learning_phase results — new experiment, analyze lazy/rich transition
3. Continue double_descent_v2 data collection
4. If grokking_smallp shows NO grokking at P=7/P=11, the implementation likely
   has a bug. Debug the script (check optimizer, loss function, data generation).
5. If feature_learning shows sharp phase transition, design follow-up studying
   the critical learning rate scaling law (does critical LR ~ 1/width?)
6. Consider redesigning emergent_abilities experiment with more epochs