Axiom BOINC Session Log — s0303e (~07:12 UTC, March 3, 2026)
=============================================================

DEPLOYMENT SUMMARY
==================
CPU Deployment: 186 workunits to 9 hosts
  Host 123 (Dads-PC, 80 CPUs, 128GB): 49 WUs
  Host 320 (Dell-9520, 20 CPUs, 32GB): 40 WUs
  Host 87 (Dad-Workstation, 80 CPUs, 128GB): 24 WUs
  Host 137 (Note11Ste, 12 CPUs, 15GB): 19 WUs
  Host 29 (DESKTOP-P57624Q, 8 CPUs, 16GB): 16 WUs
  Host 1 (Pyhelix, 16 CPUs, 32GB): 14 WUs
  Host 345 (Andre-WEBK, 8 CPUs, 8GB): 8 WUs
  Host 334 (Golf-1, 32 CPUs, 30GB): 8 WUs
  Host 16 (dahyun, 32 CPUs, 32GB): 8 WUs
  Skipped: Host 63 (4GB), Host 118 (3GB) — insufficient RAM

CPU experiment distribution (weighted):
  svd_rank_intervention (35%), percolation_scaling (20%),
  wd_lr_interaction (20%), wd_window_duration (10%),
  representation_crystallization (5%), regularization_mechanisms (5%),
  wd_rebound_dynamics (3%), bottleneck_mechanism (2%)

GPU Deployment: 208 workunits to 78 hosts
  26 hosts with 2 GPUs: 4 WUs each (104 total)
  52 hosts with 1 GPU: 2 WUs each (104 total)
  GPU scripts deployed: wd_timing_scale_gpu.py, grad_subspace_wd_gpu.py (NEW)
  Skipped: Host 116 (7GB), Host 118 (3GB) — insufficient RAM

GPU CHECKPOINT:
  - 78 GPU hosts identified with idle GPU capacity
  - 208 GPU workunits deployed (2 experiments x 104 GPUs)
  - Scripts used: wd_timing_scale_gpu.py (finding #54), grad_subspace_wd_gpu.py (NEW, finding #55)

TOTAL: 394 workunits (186 CPU + 208 GPU)

NEW EXPERIMENT: Gradient Subspace Dimension and WD Timing (#55)
===============================================================
Script: grad_subspace_wd_gpu.py (GPU-accelerated)

Novelty check (MANDATORY 4A):

4A-1. Literature search queries:
  - "gradient subspace dimensionality during neural network training"
  - "gradient subspace dimension weight decay timing"
  - "Hessian eigenspectrum weight decay timing"
  - "loss landscape curvature regularization timing"

4A-2. What was found:
  - Gur-Ari et al. (2018), "Gradient Descent Happens in a Tiny Subspace": gradients collapse into the top-k Hessian eigenvectors. Well-established finding.
  - Ghorbani et al. (2019): large isolated Hessian eigenvalues appear rapidly; batch norm suppresses them.
  - Xie & Sato (2021), "Understanding and Scheduling Weight Decay": WD accelerates convergence in the near-zero Hessian eigenvalue subspace, but does NOT connect this to gradient subspace dimension.
  - Galanti et al. (2022), "SGD and WD Secretly Minimize Rank": SGD+WD induces a low-rank bias on weight matrices. Different from gradient subspace rank.
  - Cohen et al. (2021), "Edge of Stability": progressive sharpening, then oscillation. NOT connected to WD timing.
  - AdaDecay (2019): per-parameter adaptive WD based on gradient norms, but not on subspace dimension.

4A-3. Novel angle: NO prior work connects gradient subspace dimensionality to weight decay timing. The pieces exist in isolation: (a) gradient subspace collapse is well measured, and (b) WD timing matters (our findings #44-45), but (c) nobody has linked them. This experiment provides the causal bridge, testing whether gradient subspace collapse PREDATES the optimal WD onset, which would explain WHY late WD works.

Hypothesis: WD becomes effective precisely when the gradient subspace has collapsed into a low-rank manifold. Early WD disrupts high-dimensional exploration; late WD regularizes the (large) complement of the (small) learning subspace for "free."

Method: width-512 MLP, 4 WD conditions; measure effective gradient rank every 10 epochs via the participation ratio of the output-layer gradient covariance eigenvalues.

KEY SCIENTIFIC FINDINGS
=======================
1. WD TIMING SCALE DEPENDENCE (Finding #54): 2 successful GPU results (RTX 4070 Ti). PRELIMINARY: inverse CP does NOT consistently hold at all network widths. Late WD effectiveness: 67% at width 64, 5% at 256, 25% at 512, 75% at 1024, 0% at 2048. The second seed shows a different pattern: 600% at 64, 0% at 256, -27% at 512, 63% at 1024, 85% at 2048.
Highly noisy with only 2 seeds. MANY GPU hosts are failing with CUDA_ERROR_NO_BINARY_FOR_GPU (architecture mismatch); only host 330 (RTX 4070 Ti) succeeded. Need more diverse seeds from hosts with compatible CUDA architectures.

2. GRADIENT SUBSPACE WD TIMING (Finding #55): NEW experiment deployed this session. 208 GPU workunits sent to 78 hosts; first results expected within ~30 minutes. This is the first experiment to directly test the gradient subspace collapse → WD timing connection identified as a literature gap.

3. CPU experiments continue collecting diverse-seed data for:
  - SVD rank intervention (#51): causal mechanism test, needs diverse seeds
  - Percolation phase transitions (#52): cross-disciplinary statistical physics
  - WD x LR interaction (#49): confirmed interaction, needs diverse seeds
  - WD window duration (#50): seed bug fixed, collecting diverse results

KNOWN ISSUES
============
- GPU CUDA arch mismatch: most GPU hosts fail with CUDA_ERROR_NO_BINARY_FOR_GPU. Only ~22 hosts have successfully completed GPU experiments. The bundled CuPy NVRTC cannot JIT-compile kernels for newer GPU architectures (Blackwell, sm_120+). This requires a binary rebuild with a newer CUDA toolkit — outside the scope of this deployment script.
- Large backlog of 8000+ unsent WUs from old/retired experiments. Low priority — the BOINC scheduler deprioritizes them vs. newer targeted work.

NEXT SESSION PRIORITIES
=======================
1. Analyze the first grad_subspace_wd_gpu results — check whether gradient subspace collapse timing correlates with WD effectiveness timing.
2. Continue collecting wd_timing_scale results from compatible GPU hosts.
3. If the gradient subspace experiment shows signal, design a follow-up with multiple hidden widths to test scale dependence of the collapse → WD timing relationship.
4. Consider retiring old unsent WUs to clean up the work queue.
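APPENDIX: For reference, the effective-rank measure that experiment #55 relies on (participation ratio of gradient covariance eigenvalues) can be sketched as below. This is a minimal NumPy illustration, not the deployed grad_subspace_wd_gpu.py code; the function name and array shapes are illustrative. The GPU script would use the CuPy equivalents of the same NumPy calls.

```python
import numpy as np

def participation_ratio(grads: np.ndarray) -> float:
    """Effective rank of a batch of gradient samples.

    PR = (sum_i lambda_i)**2 / sum_i lambda_i**2, where lambda_i are the
    eigenvalues of the gradient covariance matrix. PR is 1 when all
    variance lies in one direction and equals dim for isotropic variance.

    grads: (n_samples, dim) array, one flattened gradient per row.
    """
    centered = grads - grads.mean(axis=0)
    cov = centered.T @ centered / grads.shape[0]
    eigvals = np.linalg.eigvalsh(cov)        # real eigenvalues, ascending
    eigvals = np.clip(eigvals, 0.0, None)    # clamp tiny negative fp noise
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

# Sanity checks on synthetic "gradients":
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 32))  # rank-1
isotropic = rng.normal(size=(500, 32))                           # full-rank
print(participation_ratio(low_rank))   # ~1.0
print(participation_ratio(isotropic))  # near dim=32
```

A collapsing PR curve over training epochs, compared against the WD-onset sweep, is the signal the hypothesis above predicts: PR should drop to a small value before the epoch at which late WD starts outperforming early WD.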