============================================================
AXIOM EXPERIMENT RESULTS — February 28, 2026 7:20 PM
============================================================

PREVIOUSLY RECORDED RESULT IDs (do not re-record these):
1509027, 1509028, 1509029, 1509030, 1509031, 1509034, 1509035,
1509036, 1509037, 1509039, 1509040, 1509041, 1509042

CREDITED RESULT IDs (do not re-credit these):
1509034 (10cr ChelseaOilman), 1509035 (15cr philip-in-hongkong),
1509036 (15cr philip-in-hongkong), 1509037 (75cr Coleslaw),
1509039 (50cr makracz), 1509040 (15cr makracz),
1509041 (30cr makracz), 1509042 (10cr makracz)

SUMMARY
-------
Total completed: 12 (11 successful, 1 failed)
Total pending: 25 (5 sent to clients, 20 awaiting host check-in)
Total failed: 1 (v6.02 GPU bcrypt corruption)
Credit awarded this session: 220

RESULTS (ranked by scientific interest)
---------------------------------------

1. INFORMATION BOTTLENECK — Host: Widmo (32 CPUs, 123GB Linux)
   User: makracz
   Runtime: 85s
   Credit: 50 (Excellent — confirmed Tishby hypothesis)
   Findings:
   - Tishby's information bottleneck hypothesis SUPPORTED
   - All 4 hidden layers showed compression phase after initial fitting
   - Compression ratios: Layer 1=1.12x, Layer 2=1.66x, Layer 3=1.65x, Layer 4=2.76x
   - Layer 4 (deepest) showed strongest compression (peak I(X;T)=1.18 -> final=0.43)
   - 99.8% test accuracy, 100% train accuracy
   - Clean replication — publishable-quality data
   Quality: Excellent
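The compression ratios above depend on how I(X;T) is estimated. A minimal sketch of the standard binning estimator is below; the function name, bin count, and input format are illustrative, not taken from the deployed script.

```python
import numpy as np

def binned_mutual_information(x_ids, t_activations, n_bins=30):
    """Estimate I(X;T) by discretizing hidden activations into bins.

    x_ids: integer id per input sample (each distinct input is its own symbol)
    t_activations: (n_samples, n_units) hidden-layer activations
    (Illustrative sketch; bin count and layout are assumptions.)
    """
    # Discretize each unit's activation range into n_bins equal-width bins,
    # then treat the tuple of bin indices as the discrete state of T.
    lo, hi = t_activations.min(), t_activations.max()
    bins = np.floor((t_activations - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    t_ids = np.unique(bins, axis=0, return_inverse=True)[1]

    def entropy(ids):
        _, counts = np.unique(ids, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Encode each (x, t) pair as a single id for the joint entropy.
    joint = x_ids * (t_ids.max() + 1) + t_ids
    # I(X;T) = H(X) + H(T) - H(X,T), in bits
    return entropy(x_ids) + entropy(t_ids) - entropy(joint)
```

Tracking this quantity per layer per epoch is what produces the "peak then decline" compression curves the result reports.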
2. RESERVOIR COMPUTING — Host: Rosie (20 CPUs, 112GB Windows)
   User: Coleslaw
   Runtime: 299s (full 5 minutes)
   Credit: 75 (Excellent — clean scaling curves on chaotic prediction)
   Findings:
   - Echo state network on Lorenz attractor prediction
   - Clear scaling: MSE dropped from 0.378 (100 neurons, radius 0.5) to ~0.031 (500 neurons, radius 0.95)
   - Optimal spectral radius: 0.95-0.99 across all reservoir sizes
   - Independently confirms edge-of-chaos theory for reservoir computing
   - Higher radius (0.99) better than 0.95 for small reservoirs; converges for large
   - Grid: 4 reservoir sizes x 4 spectral radii = 16 configurations tested
   Quality: Excellent

3. MODE CONNECTIVITY — Host: SPEKTRUM (72 CPUs, 191GB Windows)
   User: makracz
   Runtime: ~30s
   Credit: 30 (Good — unexpected finding, but models underfit)
   Findings:
   - Two models trained from different seeds: both only ~11% accuracy (10-class problem)
   - Models are nearly orthogonal in weight space (cosine similarity = -0.011)
   - 41 units apart in parameter distance
   - UNEXPECTED: linear interpolation midpoint (alpha=0.5) has LOWER loss (2.63) than either endpoint (4.54, 4.33)
   - Suggests a wide flat valley in the loss landscape even for underfit models
   - Models didn't converge well — need more epochs or better hyperparameters
   Quality: Good (interesting landscape finding despite underfitting)

4. CELLULAR AUTOMATA x2 — Host: philip-23-q145hk (4 CPUs, 8GB Linux)
   User: philip-in-hongkong
   Runtime: ~60s each
   Credit: 15 each (30 total) (Fair — negative result, GA plateaued)
   Findings:
   - GA-evolved rules reached fitness 0.455 on density classification
   - Trivial all-1 baseline: 0.516 — the evolved rule did NOT beat it
   - Fitness history: [0.085, 0.085, 0.395, 0.425, 0.44, 0.455, 0.455, 0.455]
   - Plateaued after generation 6 of 8
   - Best rule had 60.9% ones fraction
   - Negative result: density classification is genuinely hard for 1D CA with small populations
   - Needs more generations, a larger population, or tournament selection
   Quality: Fair
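The reservoir-computing grid sweeps exactly two knobs: reservoir size and spectral radius. A minimal echo-state-network sketch showing how the recurrent matrix is rescaled to a target spectral radius and how a ridge readout is fit (all names and the one-step-ahead task are illustrative; the deployed script's internals are not reproduced here):

```python
import numpy as np

def make_reservoir(n_neurons, spectral_radius, rng):
    """Random reservoir matrix rescaled so its largest |eigenvalue|
    equals spectral_radius (the knob swept in the 4x4 grid)."""
    W = rng.standard_normal((n_neurons, n_neurons))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

def run_esn(u, n_neurons=100, spectral_radius=0.95, seed=0):
    """Drive the reservoir with scalar input sequence u and ridge-fit a
    linear readout to predict u[t+1] from the reservoir state.
    Returns the in-sample one-step-ahead MSE."""
    rng = np.random.default_rng(seed)
    W = make_reservoir(n_neurons, spectral_radius, rng)
    w_in = rng.standard_normal(n_neurons)
    x = np.zeros(n_neurons)
    states = []
    for u_t in u[:-1]:
        x = np.tanh(W @ x + w_in * u_t)   # leakless tanh state update
        states.append(x.copy())
    X = np.array(states)                  # (T-1, n_neurons) state history
    y = u[1:]                             # one-step-ahead targets
    # Ridge readout: W_out = (X^T X + lam I)^-1 X^T y
    lam = 1e-6
    w_out = np.linalg.solve(X.T @ X + lam * np.eye(n_neurons), X.T @ y)
    return float(np.mean((X @ w_out - y) ** 2))
```

Repeating `run_esn` over a grid of `(n_neurons, spectral_radius)` pairs is the shape of the 16-configuration sweep; the actual experiment predicts a Lorenz trajectory rather than the toy signal here.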
5. DOUBLE DESCENT — Host: Widmo (32 CPUs, 123GB Linux)
   User: makracz
   Runtime: 28s
   Credit: 15 (Fair — didn't show expected phenomenon)
   Findings:
   - 10 widths from 5 to 5000 parameters
   - Train accuracy hits 100% quickly across all widths
   - Test accuracy terrible everywhere (3-9%)
   - No visible double descent curve
   - Missing ingredient: label noise (added in v2 redesign)
   - The interpolation-threshold peak requires noisy labels to appear
   Quality: Fair (clean execution, wrong experimental design)

6. POWER LAW FORGETTING — Host: SPEKTRUM (72 CPUs, 191GB Windows)
   User: makracz
   Runtime: 33s
   Credit: 10 (Poor — uninformative result)
   Findings:
   - Zero forgetting detected in both naive SGD and EWC conditions
   - Task A accuracy remained 100% after learning Task B
   - EWC advantage: 0.0 (no difference)
   - Fisher information norms: all zeros
   - Root cause: Tasks A and B too separable — network solved both without interference
   - Redesigned in v2: overlapping input representations + bottleneck architecture
   Quality: Poor (valid execution, uninformative design)
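The missing ingredient in the double-descent run is easy to sketch: the interpolation-threshold peak appears only when a fraction of training labels is corrupted, because only then does exact interpolation force the model to memorize noise. A possible implementation of the 15% noise injection used in the v2 redesign (the helper name is illustrative, not the deployed script's):

```python
import numpy as np

def corrupt_labels(y, n_classes, noise_frac=0.15, seed=0):
    """Replace a noise_frac fraction of integer labels with uniformly
    random *different* classes (15% as in double_descent_v2)."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    n_noisy = int(round(noise_frac * len(y)))
    idx = rng.choice(len(y), size=n_noisy, replace=False)
    # Shift by a random non-zero offset so the new label always differs
    # from the original.
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=n_noisy)) % n_classes
    return y
```

Train accuracy should still reach 100% past the interpolation threshold, but test error now peaks near the threshold before descending again, which is the curve the v1 design could not produce.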
7. EDGE OF CHAOS — Host: Dell-9520 (20 CPUs, 32GB Windows)
   User: ChelseaOilman
   Runtime: short
   Credit: 10 (Poor — only 2 data points)
   Findings:
   - Only tested spectral radii 0.1 and 0.2
   - Both in ordered regime (negative Lyapunov exponents)
   - Radius 0.1: Lyapunov=-2.31, memory=3.08
   - Radius 0.2: Lyapunov=-1.62, memory=4.53
   - Never reached the edge of chaos (around radius ~1.0)
   - Likely hit time limit before exploring full range
   - Redesigned in v2: 30 radii from 0.1 to 3.0
   Quality: Poor (incomplete — didn't reach the interesting regime)

FAILED
------
- double_descent_gpu_host1 (v6.02): bcrypt.pyd decompression error — corrupted PyInstaller bundle
  Fixed in v6.03 by excluding the bcrypt/paramiko/cryptography modules

CREDIT LEDGER
-------------
makracz (userid=80): 105 credit (4 experiments: 50+30+15+10)
Coleslaw (userid=122): 75 credit (1 experiment: reservoir computing)
philip-in-hongkong (userid=108): 30 credit (2 experiments: 15+15)
ChelseaOilman (userid=40): 10 credit (1 experiment: edge of chaos)
PyHelix (userid=1): 0 credit (self — test runs excluded)
TOTAL AWARDED: 220 credit

REDESIGNED EXPERIMENTS DEPLOYED (v2)
------------------------------------
- edge_of_chaos_v2.py -> host 320 (Dell-9520, 20 CPUs) — 30 radii instead of 2
- power_law_forgetting_v2.py -> host 141 (SPEKTRUM, 72 CPUs) — overlapping tasks + bottleneck + EWC
- double_descent_v2.py -> host 194 (7950x, 128 CPUs) — 15% label noise for interpolation peak

NEW EXPERIMENTS DEPLOYED
------------------------
- learning_rate_phase_transitions.py -> host 287 (192 CPUs) + host 323 (32 CPUs)
- neural_network_pruning_lottery.py -> host 296 (240 CPUs)
- critical_learning_periods.py -> host 85 (80 CPUs) + host 87 (80 CPUs)
- weight_initialization_landscape.py -> host 269 (64 CPUs) + host 123 (80 CPUs)
- gradient_descent_loss_landscapes.py -> host 209 (32 CPUs) + host 143 (32 CPUs) + host 177 (32 CPUs)
- benford_law_neural_weights.py -> host 253 (16 CPUs) + host 7 (16 CPUs) + host 322 (16 CPUs)
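The v2 edge-of-chaos sweep needs a Lyapunov-exponent estimate at each of its 30 radii. One standard approach tracks the divergence of a slightly perturbed trajectory, renormalizing the perturbation each step. The sketch below is under illustrative assumptions (tanh reservoir, random drive signal; not the deployed script):

```python
import numpy as np

def lyapunov_exponent(spectral_radius, n_neurons=100, n_steps=200,
                      eps=1e-8, seed=0):
    """Largest-Lyapunov-exponent estimate for a driven tanh reservoir:
    run a base and a perturbed trajectory, renormalize the perturbation
    to size eps after every step, and average the log growth rate."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_neurons, n_neurons))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.standard_normal(n_neurons)
    u = rng.standard_normal(n_steps)             # random drive signal
    x = np.zeros(n_neurons)
    x_p = x + eps * rng.standard_normal(n_neurons) / np.sqrt(n_neurons)
    log_growth = []
    for t in range(n_steps):
        x = np.tanh(W @ x + w_in * u[t])
        x_p = np.tanh(W @ x_p + w_in * u[t])
        d = np.linalg.norm(x_p - x)
        log_growth.append(np.log(d / eps))
        x_p = x + (x_p - x) * (eps / d)          # renormalize perturbation
    # Discard the first half as transient before averaging.
    return float(np.mean(log_growth[n_steps // 2:]))
```

Negative values mean the ordered regime (as at radii 0.1 and 0.2 above); the exponent crossing zero as the radius grows locates the edge of chaos the v1 run never reached.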
CROSS-VALIDATION COPIES
-----------------------
- reservoir_computing -> host 249 (replication of Rosie's excellent result)
- information_bottleneck -> host 105 (replication of Widmo's excellent result)
- mode_connectivity -> host 95 (replication; check if models converge better)
- cellular_automata -> host 15 (replication on different hardware)

RECOMMENDATIONS FOR NEXT BATCH
------------------------------
1. PRIORITY: Wait for the v2 redesigns (edge_of_chaos, power_law_forgetting, double_descent) — these should produce the results the originals missed
2. PRIORITY: Wait for the new experiments (learning_rate_phase_transitions, critical_learning_periods) — the most scientifically interesting new designs
3. If information_bottleneck replicates on host 105, consider a more detailed version that tracks I(X;T) and I(T;Y) at more granular epoch intervals
4. If reservoir_computing replicates on host 249, extend to larger reservoirs (1000, 2000 neurons) on a high-RAM host
5. Mode connectivity needs models that actually converge — increase epochs or use a simpler task
6. Cellular automata needs larger populations (500+) and more generations (100+) — send to a high-CPU host
7. New experiment idea: "grokking_dynamics" — track weight norm, loss, and generalization through the grokking phase transition at very fine time resolution
8. New experiment idea: "neural_tangent_kernel" — compare NTK predictions against actual training dynamics at different widths
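For the cellular-automata rerun in recommendation 6, the fitness being evolved is worth pinning down: the fraction of random initial tapes a rule drives to the all-1 (or all-0) state matching the majority bit. A minimal evaluator sketch (radius-1 here for brevity, giving an 8-entry rule table; the classic density-classification task uses radius 3, and all names are illustrative):

```python
import numpy as np

def ca_density_fitness(rule_bits, n_tapes=100, width=59, steps=100, seed=0):
    """Fraction of random tapes a radius-1 CA rule classifies correctly:
    after `steps` updates the tape should be all-1 iff the initial
    density of 1s exceeded 0.5 (width is odd, so no ties)."""
    rng = np.random.default_rng(seed)
    correct = 0
    for _ in range(n_tapes):
        # Draw tapes with varied 1-density so both classes appear.
        tape = (rng.random(width) < rng.random()).astype(int)
        target = int(tape.mean() > 0.5)
        for _ in range(steps):
            # Neighborhood index = 4*left + 2*center + 1*right (periodic).
            idx = 4 * np.roll(tape, 1) + 2 * tape + np.roll(tape, -1)
            tape = rule_bits[idx]
        correct += int(np.all(tape == target))
    return correct / n_tapes

# The trivial baseline from result 4: a rule mapping every neighborhood
# to 1, which is "correct" exactly on the majority-1 tapes.
all_ones_rule = np.ones(8, dtype=int)
```

A GA over this fitness with population 500+ and 100+ generations is the scale recommendation 6 calls for; the v1 runs plateaued with far smaller budgets.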