Message boards : Graphics cards (GPUs) : Python apps for GPU hosts errors
I'm fairly certain I've been running these "Python apps for GPU hosts" successfully before. Now I see 85-90% of them ending with "Error while computing" status. If I check, I am one of 4-8 hosts with the same status, although not necessarily the same underlying error. The task output shows:

Define learner
Created Learner.
Look for a progress_last_chk file - if exists, adjust target_env_steps
Define train loop
Traceback (most recent call last):
  File "C:\ProgramData\BOINC\slots\3\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
    self.next_batch = self.batches.__next__()
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Last in the traceback is the following, which I'm not sure is the original exception or not. If it is, can I adjust max_split_size_mb (how and where), and what is a good value for it?

RuntimeError: CUDA out of memory. Tried to allocate 202.00 MiB (GPU 0; 2.00 GiB total capacity; 1.23 GiB already allocated; 0 bytes free; 1.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

19:02:23 (17760): python.exe exited; CPU time 1095.984375

Thoughts, suggestions... Thanks in advance.
ID: 59861
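[Editor's note] For readers wondering where max_split_size_mb lives: PyTorch reads it from the PYTORCH_CUDA_ALLOC_CONF environment variable, which must be set before the process first touches the GPU - and since BOINC launches the app itself, there may be no practical per-task way to set it. A minimal sketch; the value 128 is purely illustrative, not a recommendation from this thread:

```python
import os

# max_split_size_mb caps the size of blocks PyTorch's CUDA caching allocator
# will split, which can reduce fragmentation in cases where "reserved" memory
# far exceeds "allocated" (as in the error above). It must be set BEFORE
# torch initializes CUDA. 128 is an illustrative starting value only.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # torch would pick this up on its first CUDA allocation
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```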
the answer is simple: GPUs with only 2 GB VRAM are too small for processing Python tasks.
ID: 59864
GPUs with only 2 GB VRAM are too small for processing Python tasks.

Ok sure. Then what has changed since I was last running these tasks successfully, 2 (or maybe 3) weeks ago? My system is the same. Did I miss something?
ID: 59865
The latest series of 1000 tasks uses more VRAM, as posted by the researcher.
ID: 59866
The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Thanks, figures... I've missed something. Is there a link or forum post from the researcher that you could point me to?
ID: 59867
GPUs with only 2 GB VRAM are too small for processing Python tasks.

About 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons. Maybe you crunched ACEMD3 tasks at that time? They do not need nearly as much VRAM as the Pythons do.
ID: 59868
The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Hm, that's strange - here it seems to be the other way round. From what I can see, e.g. on my Quadro P5000 with 4 Pythons running concurrently: before, VRAM use was nearly 16 GB; now it's below 12 GB.
ID: 59869
The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Most recently, 4 Pythons running concurrently on the P5000 use roughly 9.8 GB VRAM - so it keeps dropping.
ID: 59870
The latest series of 1000 tasks uses more VRAM as posted by the researcher.

You should check at which stage the tasks are for a more insightful picture of what's happening. When a task first starts, for about the first 5 minutes, it's only extracting the archive and uses no VRAM. From about 5 to 10 minutes or so, it uses a reduced amount, 2-3 GB. Then after 10-15 minutes or so, it gets to the main process and uses the full VRAM amount, about 3-4 GB.

So far I have noticed two main sizes in the new batches: some tasks use about 3 GB (the same as a few weeks ago) and some use about 4 GB, which lines up more with the recent tasks. I have not noticed any key indicator in the file names to tell which tasks use the lower VRAM and which use more.
____________
ID: 59871
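[Editor's note] The stage boundaries described above can be spotted by sampling VRAM over time, e.g. by running nvidia-smi --query-gpu=timestamp,memory.used --format=csv,noheader,nounits in a loop. A sketch that parses such output; the sample values inlined below are illustrative (roughly matching the extract / setup / main-process stages described above), so the snippet runs without a GPU:

```python
import csv
import io

# Illustrative sample of:
#   nvidia-smi --query-gpu=timestamp,memory.used --format=csv,noheader,nounits
# polled once during each of the three stages described above.
sample = """\
2022/06/01 10:02:00.000, 0
2022/06/01 10:08:00.000, 2600
2022/06/01 10:20:00.000, 3900
"""

# Parse the CSV (skipinitialspace handles the ", " separators) and print a
# timestamped VRAM reading per sample.
for ts, used_mib in csv.reader(io.StringIO(sample), skipinitialspace=True):
    print(f"{ts}  {int(used_mib):>5} MiB")
```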
about 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons.

Fair enough, as I can't say for sure which application(s) was/were shown in my GPUGRID task list. I think my assumption was based on the processes seen in (Windows) Task Manager, where I would see dozens of Python processes while processing a GPUGRID task - i.e. maybe applications other than "Python apps for GPU hosts" use Python(?). And, going back further than 3 weeks, never until now have I seen so many tasks failing.

By luck I've processed one "Python apps for GPU hosts" task overnight, and another is currently running longer than the usual failure point. It still would be nice to see a link or forum post from the researcher(s) with requirements and release notes for the applications.
ID: 59872
you should check at which stage the tasks are for a more insightful picture of what's happening.

On the Quadro P5000, the status at this moment is as follows:
task 1: 82% - 19:58 hrs
task 2: 31% - 7:09 hrs
task 3: 14% - 2:43 hrs
task 4: 22% - 4:36 hrs
VRAM use: 9,834 MB - and this even includes a few hundred MB for the monitor.
ID: 59873
All the pertinent information about the Python tasks is always posted in the main thread in News: https://www.gpugrid.net/forum_thread.php?id=5233
The statement about the memory reduction for the next series is here: https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59838
ID: 59874
This statement about the memory reduction for the next series is here.

From what I can see on all my hosts which crunch Pythons: the VRAM requirement of the recent tasks has dropped considerably.
ID: 59875
Maybe this is some change affecting Windows only. All my tasks are still using 3-4 GB each.
ID: 59876
I'm still seeing 3-4GB each for the Python tasks also on my Linux Ubuntu hosts.
ID: 59877
Maybe this is some change affecting Windows only.

No guys, my Windows hosts are using the same ~4 GB of graphics memory on the latest released WUs. Earlier I noticed some "exp" tasks using over 6 GB, so there must be some variance among tasks. I wonder if he saw some ACEMD tasks go through and mistook them for PythonGPUs. Running a PythonGPU on 2 GB seems almost impossible to me.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"
Piasa Tribe - Illini Nation
ID: 59878
you should check at which stage the tasks are for a more insightful picture of what's happening.

The 4 Pythons which have each been running for several hours right now are using even less VRAM than the ones reported above from 2 days ago - total VRAM use is 8,840 MB. So there seems to be quite some variance between these Pythons.
ID: 59880
I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly, or, more likely, you are misinterpreting what it displays or looking at the wrong numbers.
ID: 59881
I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly or more likely you are interpreting what it displays or looking at the wrong numbers.

The utility I use is GPU-Z. So maybe it indeed shows wrong figures; I cannot tell for sure, of course. As already said in another thread about a week ago: nvidia-smi unfortunately does not function here, no idea why. There is an "access denied" problem.
ID: 59882
There must be some way to run the command in a Windows terminal with elevated rights.
ID: 59884
There must be some way to run the command in a Windows terminal with elevated rights.

Thanks, Keith, for providing the link above. As explained in this posting: https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59832 I tried to apply the tool with admin rights - but still it did not work. Maybe something is wrong with my installation, or nvidia-smi is defective, or whatever...

BTW: right now, the 4 Pythons running concurrently on the Quadro P5000 are using exactly 12,000 MB VRAM. So VRAM usage really seems to vary quite a lot.
ID: 59888
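[Editor's note] For anyone hitting the same "access denied": one common cause is a non-elevated shell or restricted permissions on the executable, and on current Windows drivers nvidia-smi.exe normally sits in C:\Windows\System32 (older driver packages used C:\Program Files\NVIDIA Corporation\NVSMI). A small, generic sketch that only checks whether the binary is reachable from PATH - it makes no GPUGRID-specific assumptions:

```python
import shutil

# From an elevated prompt one would then run, for example:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv
# shutil.which() returns the full path of the executable if it is on PATH,
# or None if it cannot be found (in which case the full install path above
# can be tried directly).
path = shutil.which("nvidia-smi")
print(f"nvidia-smi: {path or 'not found on PATH'}")
```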
But only you seem to be reporting exceptionally low VRAM use at times, which points to something else going on with your system specifically: either incorrect readings or something not working the way you think. No one else reports this level of variance. Mine have been pretty consistent - some use ~3GB and some use ~4GB, nothing else - and that seems to align with what others are reporting as well.
ID: 59889
It seems 'abou' has tweaked these WUs. They are even using less RAM and VRAM, but you will have to ask 'abou'. Otherwise it is all conjecture and physical monitoring.
ID: 59890
It seems 'abou' has tweaked these WUs. It is even using lesser RAM and VRAM but you will have to ask 'abou'. Otherwise, it is all conjecture and physical monitoring.

Very true. As this is an ongoing project in a new and developing science, I think we volunteer research assistants can help by "tuning" our hosts to best take advantage of the changes we see happening in the tasks. Abouh has been good about communicating and informing us of developments over on the News thread, and that information has been enhanced by us sharing among ourselves what we've seen happening on our hosts, Linux or Windows. "Stay tuned" seems appropriate here.

(IMHO) This is not a project where you can just set your host to compute and then ignore it (F@H). This is a fun challenge and learning experience, at least for me.
ID: 59892
...(IMHO)

Yes, how right you are :-) Not at all "set and forget".
ID: 59893
Speaking of GPU tuning, does anybody know what the difference is between NVIDIA's 'gaming' and 'studio' drivers, and whether one is better suited for this sort of duty than the other?
ID: 59894
It seems 'abou' has tweaked these WUs. It is even using lesser RAM and VRAM but you will have to ask 'abou'. Otherwise, it is all conjecture and physical monitoring.

Correct, and Abou has been very good at interacting with us and solving problems. There are however some WUs that are chewing up my 16 GB of RAM - not VRAM; VRAM usage seems quite feasible.
ID: 59895
The tasks use about 10 GB of system RAM per task; you should account for this.
ID: 59896
There are however some WUs that are chewing up my 16 GB of RAM

I have noticed that early in the run my Windows hosts will show a brief peak of RAM usage around 15.6 GB or so. I think it might be during the unpacking and expansion phase, but don't take my guess as fact. I had to give the BOINC manager access to 95% of the available RAM to get through this part and go on to the ~60 GB of commit-charge memory part where the factories are running. My swap file is user-set at 55 GB for now. During the factory phase it drops to the same ~10 GB Ian reported for Linux. Incidentally, my observations are from watching the Afterburner hardware monitor.
ID: 59897
10 GB for the WU plus some GB for the system - the 15+ GB figure from Pops is correct. I shut down everything else to get through this phase, but not all WUs are doing this.
ID: 59898
I have noticed that early in the run my windows hosts will show a brief peak of RAM usage around 15.6 GBs or so.

The last 48 hrs of crunching Pythons has only used ~12 GB RAM max. The spike in Windows memory usage hasn't appeared on any of my hosts. Good work, abouh.
ID: 59911
Hello: My tasks are all failing after 3 or 4 minutes of execution, in both Windows 10 Pro and Linux Ubuntu 22.04, on my AMD 3500 CPU, GTX 780 Ti GPU and 16 GB RAM... Can anyone give me some information? Thank you.
ID: 60080
If you look at your failed tasks' result outputs, the explanation is self-evident:

[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.

Not a high enough CUDA capability, and not even a high enough driver. The application name tells you the minimum CUDA level: Python apps for GPU hosts v4.03 (cuda1131). Best to utilize these GPUs on other projects with lesser requirements.
ID: 60081
If you look at your failed tasks result outputs, the explanation is self-evident.

Hello: Thank you for your prompt response. Too bad, because I had worked a lot on this project before; we'll see later if I can change the GPU.
____________
http://stats.free-dc.org/cpidtagb.php?cpid=b4bdc04dfe39b1028b9c5d6fef3082b8&theme=9&cols=1
ID: 60082
If you look at your failed tasks result outputs, the explanation is self-evident.

Hello: What would be the minimum NVIDIA card able to run this project? Thanks.
____________
http://stats.free-dc.org/cpidtagb.php?cpid=b4bdc04dfe39b1028b9c5d6fef3082b8&theme=9&cols=1
ID: 60088
I think Maxwell-based cards: GTX 900 series and newer.
ID: 60089
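[Editor's note] To make the "Maxwell and newer" rule concrete: NVIDIA publishes a compute capability for every card, and the cards mentioned in this thread fall on either side of that line. A small sketch; the 5.0 cutoff is an assumption taken from the reply above, not an official GPUGRID statement:

```python
# Compute capabilities per NVIDIA's published tables for the cards discussed
# in this thread.
COMPUTE_CAPABILITY = {
    "GTX 780 Ti":  (3, 5),  # Kepler
    "GTX 980":     (5, 2),  # Maxwell
    "GTX 1080 Ti": (6, 1),  # Pascal
}

# Cutoff implied by "Maxwell (GTX 900) and newer" above - an assumption,
# not an official project requirement.
MIN_REQUIRED = (5, 0)

for card, (major, minor) in COMPUTE_CAPABILITY.items():
    verdict = "ok" if (major, minor) >= MIN_REQUIRED else "too old"
    print(f"{card}: sm_{major}{minor} -> {verdict}")
```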
..
ID: 60090
I think Maxwell based cards. GTX 900 series and newer.

Hello: Thanks, I have a GTX 1080 Ti in sight, so this would work.
____________
http://stats.free-dc.org/cpidtagb.php?cpid=b4bdc04dfe39b1028b9c5d6fef3082b8&theme=9&cols=1
ID: 60092