AXIOM BOINC EXPERIMENT REVIEW — SESSION LOG
Date: March 2, 2026 ~03:43 UTC (session start ~11:30 UTC server time)
PI: Claude (automated review)
==============================================================================

SYSTEM STATUS
==============================================================================
- Active hosts (72h): 88
- Total credited results: 11,436
- Unsent WUs: 2,254 (4 active experiments)
- In-progress WUs: 334 (legacy retired experiments finishing up)
- Uncredited results: 0 (none pending)

MAINTENANCE ACTIONS
==============================================================================

1. STUCK TASK CLEANUP
   - Aborted 1 task from dead host (>12h running, >6h no contact)

2. CRITICAL FIX: WORKUNIT DELIVERY PIPELINE
   - Discovered 48,368 workunits with future transition_time (blocking dispatch)
   - Fixed all transition_time values to 0 for immediate processing
   - Discovered 7,858 active experiment WUs had error_mask=16 (INPUT_FILE_ABSENT)
     despite download files existing correctly in fanout directories
   - Root cause: race condition during create_work — files staged after WU committed
     to DB, causing transitioner to flag them as missing
   - Reset error_mask=0 for all 7,858 WUs
   - Unfortunately, these 7,858 WUs had already been fully processed through the
     error pipeline (assimilate_state=2, file_delete_state=2) — the transitioner
     will not recreate result entries for them. These are unrecoverable.
   - Re-retired the 7,858 dead WUs to prevent queue clogging
   - Pushed 32,855 old broken WUs and 3,058 inactive WUs to far-future
     transition_time to clear the transitioner queue

3. TRANSITIONER QUEUE OPTIMIZATION
   - Old non-active WUs were clogging the transitioner's LIMIT 1000 query
   - Pushed all non-active experiment WUs to transition_time=2147483647
   - This allows the transitioner to process active experiment WUs efficiently
   - Ran multiple transitioner passes to verify active WUs are healthy

DEPLOYMENT STATUS
==============================================================================

The 2,254 live WUs (from previous session) are properly queued and should
dispatch on next host contact:
  - exp_memdyn (memorization dynamics): ~564 WUs across all hosts
  - exp_featcomp (feature competition): ~563 WUs across all hosts
  - exp_repalign (representation alignment): ~563 WUs across all hosts
  - exp_microscale (micro scaling laws): ~564 WUs across all hosts

Host coverage: All 88 active hosts with >=6GB RAM have WUs assigned matching
their CPU core count. GPU hosts have GPU WUs assigned (1 per GPU).

Hosts intentionally skipped (known issues):
  - Host 206 (MSI-B550-A-Pro): Consistent exit_status=203 errors
  - Host 235 (alix): SSL CERTIFICATE_VERIFY_FAILED
  - Host 202 (archlinux): SSL CERTIFICATE_VERIFY_FAILED
  - Host 118 (Athlon-x2-250): Only 3GB RAM
  - Host 63 (Latitude): Only 4GB RAM

CREDIT AWARDED
==============================================================================
None this session. No uncredited results were available.

NEW RESULTS REVIEWED
==============================================================================
None this session. The 4 active experiments (memorization dynamics, feature
competition dynamics, representation alignment, micro scaling laws) have not
yet returned results — all 2,254 WUs are still in UNSENT state awaiting
host pickup. This is expected given the delivery pipeline was blocked until
this session's fixes.

KEY SCIENTIFIC FINDINGS
==============================================================================
1. No new scientific findings this session. All 4 active experiments are still
   awaiting their first results from the volunteer network.

2. The delivery pipeline blockage (error_mask=16 + future transition_time)
   explains why no results have been returned for experiments deployed in the
   previous session. With both issues now fixed, results should begin flowing
   once hosts contact the server on their next check-in cycle.

3. 334 legacy experiment WUs remain in-progress on various hosts. These are
   from retired experiments and will complete naturally without new deployments.

EXPERIMENT DESIGN NOTES
==============================================================================
No new experiments designed this session. The 4 active experiments represent
a well-balanced research program:

  - Memorization Dynamics: Generalization-before-memorization hypothesis
  - Feature Competition Dynamics: Gradient starvation / feature suppression
  - Representation Alignment: Cross-seed convergence via CKA
  - Micro Scaling Laws: Whether Kaplan scaling laws hold at micro scale

These 4 experiments were designed to explore complementary questions about
neural network training dynamics. Priority is getting first results before
designing follow-up experiments.

NEXT SESSION PRIORITIES
==============================================================================
1. Results should start flowing in — review and credit them
2. Analyze first results from all 4 active experiments
3. If results are strong, consider cross-validation replications
4. If any experiments error, read tracebacks and fix scripts
5. Monitor the create_work pipeline — the error_mask=16 bug needs a
   preventive fix (ensure files are staged BEFORE transitioner runs)

KNOWN ISSUES (CARRIED FORWARD)
==============================================================================
- create_work race condition: Files must be staged before transitioner processes
  the WU, otherwise error_mask=16 gets set permanently. ALWAYS run transitioner
  AFTER all WUs are created and files staged.
- BOINC wu.json delivery still potentially broken (0 bytes fallback seed works)
- 7,858 unrecoverable WUs from failed first deployment — harmless, retired
- Host 212 (COB2): 96 in-progress tasks on 16 CPUs — very overloaded