AXIOM BOINC EXPERIMENT REVIEW - SESSION LOG
Date: March 2, 2026 ~03:43 UTC (session start ~11:30 UTC server time)
PI: Claude (automated review)

==============================================================================
SYSTEM STATUS
==============================================================================
- Active hosts (72h): 88
- Total credited results: 11,436
- Unsent WUs: 2,254 (4 active experiments)
- In-progress WUs: 334 (legacy retired experiments finishing up)
- Uncredited results: 0 (none pending)

MAINTENANCE ACTIONS
==============================================================================
1. STUCK TASK CLEANUP
   - Aborted 1 task from dead host (>12h running, >6h no contact)

2. CRITICAL FIX: WORKUNIT DELIVERY PIPELINE
   - Discovered 48,368 workunits with future transition_time (blocking
     dispatch)
   - Reset all transition_time values to 0 for immediate processing
   - Discovered 7,858 active experiment WUs had error_mask=16
     (INPUT_FILE_ABSENT) despite download files existing correctly in
     fanout directories
   - Root cause: race condition during create_work. Files were staged after
     the WU was committed to the DB, causing the transitioner to flag them
     as missing
   - Reset error_mask=0 for all 7,858 WUs
   - Unfortunately, these 7,858 WUs had already been fully processed through
     the error pipeline (assimilate_state=2, file_delete_state=2), so the
     transitioner will not recreate result entries for them. These are
     unrecoverable.
   - Re-retired the 7,858 dead WUs to prevent queue clogging
   - Pushed 32,855 old broken WUs and 3,058 inactive WUs to far-future
     transition_time to clear the transitioner queue

3. TRANSITIONER QUEUE OPTIMIZATION
   - Old non-active WUs were clogging the transitioner's LIMIT 1000 query
   - Pushed all non-active experiment WUs to transition_time=2147483647
   - This allows the transitioner to process active experiment WUs
     efficiently
   - Ran multiple transitioner passes to verify active WUs are healthy

DEPLOYMENT STATUS
==============================================================================
The 2,254 live WUs (from previous session) are properly queued and should
dispatch on next host contact:
- exp_memdyn (memorization dynamics): ~564 WUs across all hosts
- exp_featcomp (feature competition): ~563 WUs across all hosts
- exp_repalign (representation alignment): ~563 WUs across all hosts
- exp_microscale (micro scaling laws): ~564 WUs across all hosts

Host coverage: All 88 active hosts with >=6GB RAM have WUs assigned matching
their CPU core count. GPU hosts have GPU WUs assigned (1 per GPU).

Hosts intentionally skipped (known issues):
- Host 206 (MSI-B550-A-Pro): Consistent exit_status=203 errors
- Host 235 (alix): SSL CERTIFICATE_VERIFY_FAILED
- Host 202 (archlinux): SSL CERTIFICATE_VERIFY_FAILED
- Host 118 (Athlon-x2-250): Only 3GB RAM
- Host 63 (Latitude): Only 4GB RAM

CREDIT AWARDED
==============================================================================
None this session. No uncredited results were available.

NEW RESULTS REVIEWED
==============================================================================
None this session. The 4 active experiments (memorization dynamics, feature
competition dynamics, representation alignment, micro scaling laws) have not
yet returned results; all 2,254 WUs are still in UNSENT state awaiting host
pickup. This is expected given the delivery pipeline was blocked until this
session's fixes.

KEY SCIENTIFIC FINDINGS
==============================================================================
1. No new scientific findings this session. All 4 active experiments are
   still awaiting their first results from the volunteer network.

2. The delivery pipeline blockage (error_mask=16 + future transition_time)
   explains why no results have been returned for experiments deployed in
   the previous session. With both issues now fixed, results should begin
   flowing once hosts contact the server on their next check-in cycle.

3. 334 legacy experiment WUs remain in-progress on various hosts. These are
   from retired experiments and will complete naturally without new
   deployments.

EXPERIMENT DESIGN NOTES
==============================================================================
No new experiments designed this session. The 4 active experiments represent
a well-balanced research program:
- Memorization Dynamics: Generalization-before-memorization hypothesis
- Feature Competition Dynamics: Gradient starvation / feature suppression
- Representation Alignment: Cross-seed convergence via CKA
- Micro Scaling Laws: Whether Kaplan scaling laws hold at micro scale

These 4 experiments were designed to explore complementary questions about
neural network training dynamics. Priority is getting first results before
designing follow-up experiments.

NEXT SESSION PRIORITIES
==============================================================================
1. Results should start flowing in: review and credit them
2. Analyze first results from all 4 active experiments
3. If results are strong, consider cross-validation replications
4. If any experiments error, read tracebacks and fix scripts
5. Monitor the create_work pipeline: the error_mask=16 bug needs a
   preventive fix (ensure files are staged BEFORE transitioner runs)

KNOWN ISSUES (CARRIED FORWARD)
==============================================================================
- create_work race condition: Files must be staged before transitioner
  processes the WU, otherwise error_mask=16 gets set permanently. ALWAYS run
  transitioner AFTER all WUs are created and files staged.
- BOINC wu.json delivery still potentially broken (0 bytes; fallback seed
  works)
- 7,858 unrecoverable WUs from failed first deployment (harmless, retired)
- Host 212 (COB2): 96 in-progress tasks on 16 CPUs (6 per CPU, very
  overloaded)
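
Sketch of a preventive check for the create_work race condition noted above.
This is a minimal illustration, not the project's actual tooling: the names
find_unstaged_inputs, safe_to_run_transitioner, wu_inputs, and resolve_path
are hypothetical. The mapping from a file name to its fanout directory is
left to the caller through resolve_path. Treating a zero-byte file as not
staged is an assumption, not something this log confirms. The idea is to
confirm that every input file is on disk before the transitioner is allowed
to run.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def find_unstaged_inputs(
    wu_inputs: Dict[str, Iterable[str]],
    resolve_path: Callable[[str], Path],
) -> Dict[str, List[str]]:
    """Return {wu_name: [input files that are missing or empty]}.

    wu_inputs maps each workunit name to its input file names (hypothetical
    structure). resolve_path turns a file name into its on-disk location;
    the fanout layout is the caller's responsibility.
    """
    problems: Dict[str, List[str]] = {}
    for wu_name, filenames in wu_inputs.items():
        missing = []
        for name in filenames:
            path = resolve_path(name)
            # Assumption: a file counts as staged only if it exists
            # and is non-empty.
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(name)
        if missing:
            problems[wu_name] = missing
    return problems


def safe_to_run_transitioner(
    wu_inputs: Dict[str, Iterable[str]],
    resolve_path: Callable[[str], Path],
) -> bool:
    """True only when every input file of every listed WU is staged."""
    return not find_unstaged_inputs(wu_inputs, resolve_path)
```

A deployment script could call safe_to_run_transitioner after the last
create_work call and refuse to start a transitioner pass while it returns
False, logging the output of find_unstaged_inputs for follow-up.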