| Author | Message |
|---|---|
|
Joined: 8 Mar 26 Posts: 11 Credit: 12,663,836 RAC: 756,362 |
I've noticed that running a single WU (Work Unit) with a name starting with "exp_nk_walk_scaling_" requires 165 GB of RAM. That's enormous! So I have a suggestion. In our account settings, under "Axiom Distributed AI Preferences," would it be possible to add an option to select which types of WU we want to run and, more importantly, to show how much RAM each WU requires, so we know which WUs are compatible with our computers? The PrimeGrid project has such a system, which indicates the L3 cache required for each WU. It would be the same kind of information, but for RAM. The problem is that all types of WUs are mixed together. I have 128 threads and 256 GB of RAM. If two "exp_nk_walk_scaling_" WUs happen to start at the same time, my computer crashes and the other 126 WUs (the ones that are not "exp_nk_walk_scaling_") cannot be processed. This is what happened last night while I was sleeping! Another solution: allow only one "exp_nk_walk_scaling_" WU to run at a time. |
|
Joined: 23 Jan 26 Posts: 85 Credit: 518,833 RAC: 11,538 |
Hi Jean-Luc, Thanks for flagging this. You're right that the total memory usage is way too high. I dug into it, and the issue isn't the experiment script itself (nk_walk_scaling uses well under 100 MB of working data) but the Python runtime. Each BOINC task loads a full Python + NumPy environment, which uses about 2.5 GB per process. On a machine with many cores, BOINC runs one task per core, so the memory adds up fast: 128 threads x 2.5 GB = 320 GB. I went through the BOINC documentation on memory management and found the fix: rsc_memory_bound. This tells the BOINC client how much RAM each task actually needs, and the client automatically limits how many tasks run concurrently so that it doesn't exceed your available RAM. We weren't setting this before, so BOINC assumed each task used almost no memory and happily scheduled all of them at once. This is now fixed: all new workunits are created with rsc_memory_bound set to 2.5 GB. Your BOINC client will automatically scale down the number of concurrent tasks to fit your available RAM. You shouldn't need to change anything on your end. Thanks for the report; this is a big improvement for everyone with high core counts. |
|
Joined: 8 Mar 26 Posts: 11 Credit: 12,663,836 RAC: 756,362 |
It's excellent that you were able to fix that. Perhaps more people will now dare to join the Axiom project. I'm very glad to have contributed to the development of this project, which I really enjoy! ;-) |
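
For anyone who wants to check the numbers in the reply above, here is a minimal Python sketch of the arithmetic. It is only a simplified model, not BOINC's actual scheduling logic: the function names, the `ram_fraction` parameter, and the use of decimal gigabytes (10^9 bytes) are assumptions made for illustration. The real client also applies your computing preferences (for example the "use at most X% of memory" settings), which this sketch only approximates with `ram_fraction`.

```python
GB = 10**9  # decimal gigabyte; an assumption made for this illustration


def total_memory_bytes(tasks: int, per_task_bytes: int) -> int:
    """Memory needed if `tasks` tasks run at once, each using `per_task_bytes`."""
    return tasks * per_task_bytes


def max_concurrent_tasks(
    threads: int,
    ram_bytes: int,
    memory_bound_bytes: int,
    ram_fraction: float = 1.0,  # hypothetical stand-in for the client's RAM limit
) -> int:
    """Simplified model: run as many tasks as fit in RAM, capped by thread count."""
    if memory_bound_bytes <= 0:
        raise ValueError("memory_bound_bytes must be positive")
    usable = int(ram_bytes * ram_fraction)
    return min(threads, usable // memory_bound_bytes)


if __name__ == "__main__":
    per_task = 2_500_000_000  # 2.5 GB per task, as quoted in the reply

    # One task per thread with no memory bound: 128 x 2.5 GB
    print(total_memory_bytes(128, per_task) // GB, "GB needed for 128 tasks")

    # With a 2.5 GB bound on a 256 GB host
    print(max_concurrent_tasks(128, 256 * GB, per_task), "tasks fit in 256 GB")

    # A task that really needed 165 GB would be limited to one at a time
    print(max_concurrent_tasks(128, 256 * GB, 165 * GB), "task of 165 GB fits")
```

Under these assumptions, a 128-thread host with 256 GB of RAM would run at most 102 tasks at 2.5 GB each, and only one task at a time if a task were declared as needing 165 GB.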
