Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU
Hello GPUGRID! | |
ID: 60963 | Rating: 0 | rate: / Reply Quote | |
When can we expect to start getting these new tasks? | |
ID: 60964 | Rating: 0 | rate: / Reply Quote | |
Now, if you are using Linux and have "run test applications?" selected | |
ID: 60965 | Rating: 0 | rate: / Reply Quote | |
When can we expect to start getting these new tasks? They are being distributed RIGHT NOW. The first 6 WU have arrived here. | |
ID: 60966 | Rating: 0 | rate: / Reply Quote | |
I only get "No tasks sent". | |
ID: 60967 | Rating: 0 | rate: / Reply Quote | |
Steve, here is my nvidia-smi output (GLaDOS:~$ nvidia-smi, relevant lines in bold): both processes are running on the same GPU. ____________ | |
ID: 60968 | Rating: 0 | rate: / Reply Quote | |
Also, could you please add an explicit QChem for GPU selection on the project preferences page? Currently it is only possible to get this app if you have ALL apps selected + test apps. I want to exclude some apps but still get this one. | |
ID: 60969 | Rating: 0 | rate: / Reply Quote | |
Ah yes, thank you for confirming this! This is an omission in the scripts on my end. My test machine has one GPU so I missed it. This can be fixed, thank you. | |
ID: 60970 | Rating: 0 | rate: / Reply Quote | |
I will try and get the web interface updated but this will take longer due to my unfamiliarity with it. Thanks | |
ID: 60971 | Rating: 0 | rate: / Reply Quote | |
Just a hunch, but I think the problem is with the export command in your run.sh: export CUDA_VISIBLE_DEVICES=$CUDA_DEVICE. If I'm reading it right, that will restrict the visible devices to just one GPU. I think this will have a bad impact on any other tasks running in the BOINC environment. Normally on my 4x GPU system I have CUDA_VISIBLE_DEVICES=0,1,2,3, and if you override that to just a single CUDA device it seems to shuffle all tasks there instead. ____________ | |
ID: 60972 | Rating: 0 | rate: / Reply Quote | |
Just a hunch, but I think the problem is with the export command in your run.sh I guess this wasn't the problem after all :) I see a new small batch went out; I downloaded some and they are working fine now. ____________ | |
ID: 60973 | Rating: 0 | rate: / Reply Quote | |
Just a hunch, but I think the problem is with the export command in your run.sh Hello, can you confirm the latest WUs are getting assigned to different GPUs in the way you would expect? The line in the script you mentioned is actually the fix I just made. In the first round I had forgotten to include this line. When the BOINC client runs the app via the wrapper mechanism, it specifies the GPU device, which we capture in the variable CUDA_DEVICE. The Python CUDA code in our app uses the CUDA_VISIBLE_DEVICES variable to choose the GPU. When it is not set (as in the first round of jobs) it defaults to zero, so all jobs ended up on GPU zero. With this fix the WUs will be run on the device specified by the BOINC client. | |
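In rough outline, the relevant part of run.sh now looks like this (simplified sketch; how CUDA_DEVICE itself is filled in by the wrapper is omitted here):

    # CUDA_DEVICE is assumed to already hold the device number chosen by the BOINC wrapper.
    export OMP_NUM_THREADS=1
    # Restrict the CUDA runtime in this slot to the GPU that BOINC assigned;
    # if this is left unset, the Python/CuPy code defaults to device 0.
    export CUDA_VISIBLE_DEVICES=$CUDA_DEVICE
    python compute_dft.py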
ID: 60974 | Rating: 0 | rate: / Reply Quote | |
yup. I just ran 4 tasks on the same 4-GPU system and each one went to a different GPU as it should. | |
ID: 60975 | Rating: 0 | rate: / Reply Quote | |
Thanks very much for the help! | |
ID: 60976 | Rating: 0 | rate: / Reply Quote | |
Also, does this app make much use of FP64? I'm noticing very fast runtimes on a Titan V, even faster than something like an RTX 3090. The Titan V is slower in FP32, but roughly 14x faster in FP64. | |
ID: 60977 | Rating: 0 | rate: / Reply Quote | |
Yes this app does make use of some double precision arithmetic. High precision is needed in QM calculations. The bulk of the crunching is done by Nvidia's cusolver library which I believe uses tensor cores when available. | |
ID: 60978 | Rating: 0 | rate: / Reply Quote | |
Awesome, thanks for that info. | |
ID: 60979 | Rating: 0 | rate: / Reply Quote | |
Yes we will restart the large scale test next week! | |
ID: 60980 | Rating: 0 | rate: / Reply Quote | |
+1 | |
ID: 60981 | Rating: 0 | rate: / Reply Quote | |
+1 | |
ID: 60982 | Rating: 0 | rate: / Reply Quote | |
Sending out work for this app today. The work units take an hour (very approximately). They should be using different GPUs on multigpu systems. Please let me know if you see anything not working as you would normally expect | |
ID: 60984 | Rating: 0 | rate: / Reply Quote | |
Everything working as expected at my hosts. | |
ID: 60985 | Rating: 0 | rate: / Reply Quote | |
Steve, so far the first few tasks are completing and being validated for me on single and multi-GPU systems. | |
ID: 60986 | Rating: 0 | rate: / Reply Quote | |
My host is a Ryzen 9 3900X with an RTX 3070 Ti running Ubuntu 20.04.6 LTS, but it doesn't receive Quantum chemistry work units. I selected it in the preferences, along with test work and "ok to send work of other subprojects". Did I miss anything? | |
ID: 60987 | Rating: 0 | rate: / Reply Quote | |
My host is a Ryzen 9 3900X with an RTX 3070 Ti running Ubuntu 20.04.6 LTS, but it doesn't receive Quantum chemistry work units. I selected it in the preferences, along with test work and "ok to send work of other subprojects". Did I miss anything? I had the same problem until I ticked every available application for the venue, resulting in "(all applications)" showing on the confirmation page. Having cleared that hurdle, I note that the tasks are estimated to run for 1 minute 36 seconds (slower device) and 20 seconds (fastest device). The machines have most recently been running ATMbeta (Python) tasks, and have been left with "Duration Correction Factors" of 0.0148 and 0.0100 as a result. The target value should be 1.0000 in all cases. Please could you keep an eye on the <rsc_fpops_est> value for each workunit type, to try to minimise these large fluctuations when new applications are deployed? | |
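For background, my rough understanding of how the client builds that first estimate (treat the exact expression as an approximation on my part, not project documentation):

    estimated runtime ≈ DCF × <rsc_fpops_est> / (projected device speed in FLOPS)

so a leftover DCF of ~0.01 from a differently-sized app shrinks the estimate of a nominally one-hour workunit to a minute or two, until the DCF drifts back towards 1.0 as tasks complete.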
ID: 60988 | Rating: 0 | rate: / Reply Quote | |
Drago, | |
ID: 60989 | Rating: 0 | rate: / Reply Quote | |
Sending out work for this app today. The work units take an hour (very approximately). They should be using different GPUs on multigpu systems. Please let me know if you see anything not working as you would normally expect At least one of my computers is unable to get any tasks. The scheduler just reports that no tasks were sent. It's inexplicable, since it has the exact same configuration as a system that is receiving tasks just fine. They are both on the same venue, and that venue has ALL apps selected, has test/beta apps allowed, and has "allow other apps" selected. Not sure what's going on here. The only difference is one has 4 GPUs and the other has 7. Will get work: https://gpugrid.net/show_host_detail.php?hostid=582493 Will not get work: https://gpugrid.net/show_host_detail.php?hostid=605892 ____________ | |
ID: 60990 | Rating: 0 | rate: / Reply Quote | |
Yeah! I got all boxes checked but I still don't get work. Maybe it is a problem with the driver? I have version 470 installed which worked fine for me so far... | |
ID: 60991 | Rating: 0 | rate: / Reply Quote | |
Ok thanks for this information. There must be something unexpected going on with the scheduler. | |
ID: 60992 | Rating: 0 | rate: / Reply Quote | |
I made a couple tests with these new PYSCFbeta tasks. | |
ID: 60993 | Rating: 0 | rate: / Reply Quote | |
Stopping and resuming is not currently implemented. It will just restart from the beginning. | |
ID: 60994 | Rating: 0 | rate: / Reply Quote | |
Are you able to inspect the scheduler log for this host? Can you see more detail about the specific reason it was not sent any work? The only thing I see on my end is "no tasks sent" with no reason. ____________ | |
ID: 60995 | Rating: 0 | rate: / Reply Quote | |
I have the same problem too: no tasks sent! | |
ID: 60996 | Rating: 0 | rate: / Reply Quote | |
Here is a tale of 2 computers, one that was getting units, and the other was not. | |
ID: 60997 | Rating: 0 | rate: / Reply Quote | |
Another observation: keep an eye on your CPU use. | |
ID: 60998 | Rating: 0 | rate: / Reply Quote | |
Thanks for listing the host IDs that are not receiving work. I can see them in the scheduler logs, so hopefully I can pinpoint why they are not getting work. | |
ID: 61000 | Rating: 0 | rate: / Reply Quote | |
I think if you add a discrete checkbox selection in the project preferences for QChem on GPU, that will solve the issues with requesting work for this app. | |
ID: 61001 | Rating: 0 | rate: / Reply Quote | |
Thank you to whoever got the discrete checkbox implemented in the settings :). This should make getting work much less of a hassle. | |
ID: 61002 | Rating: 0 | rate: / Reply Quote | |
The app will now appear in the GPUGRID preferences:"Quantum chemistry on GPU (beta)" | |
ID: 61003 | Rating: 0 | rate: / Reply Quote | |
Here is a tale of 2 computers, one that was getting units, and the other was not. I am getting tasks on both computers, now. So far, all tasks are completing successfully. | |
ID: 61004 | Rating: 0 | rate: / Reply Quote | |
Yes, thanks! this works much better now. | |
ID: 61005 | Rating: 0 | rate: / Reply Quote | |
The most recent WUs are just twice the size of the previous test set. There are 100 molecules in each WU now; previously there were 50. | |
ID: 61006 | Rating: 0 | rate: / Reply Quote | |
oh ok, that explains it! | |
ID: 61007 | Rating: 0 | rate: / Reply Quote | |
The most recent WUs are just twice the size of the previous test set. There are 100 molecules in each WU now; previously there were 50. I wouldn't complain if the credit per task was also doubled. ;) | |
ID: 61008 | Rating: 0 | rate: / Reply Quote | |
Steve, | |
ID: 61009 | Rating: 0 | rate: / Reply Quote | |
I made a fresh Linux Mint installation and it is OK for me now. Now I can download new WUs. | |
ID: 61010 | Rating: 0 | rate: / Reply Quote | |
The app will not work on GPUs with compute capability less than 6.0. It should not be sending them to these cards but I think at the moment this functionality is not working properly. WUs are being sent to GPUs like GTX 960 (cc=5.2, 2 GB VRAM) and they fail. E.g., https://www.gpugrid.net/show_host_detail.php?hostid=550055 https://developer.nvidia.com/cuda-gpus | |
ID: 61011 | Rating: 0 | rate: / Reply Quote | |
The app will not work on GPUs with compute capability less than 6.0. It should not be sending them to these cards but I think at the moment this functionality is not working properly. Steve mentioned that the scheduler's blocking of low-CC cards wasn't working properly. Best to uncheck QChem for GPU in your project preferences for those hosts. Edit: Sorry, disregard, I thought you were talking about your own host. Since that host is anonymous, there's not really anything to be done at the moment. We will just have to deal with the resends. ____________ | |
ID: 61012 | Rating: 0 | rate: / Reply Quote | |
When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C. | |
ID: 61013 | Rating: 0 | rate: / Reply Quote | |
The 1.92 GB file downloads at only ~19.75 KB/s. | |
ID: 61014 | Rating: 0 | rate: / Reply Quote | |
Steve, I've got this issue also. We need to be able to download multiple tasks in one request otherwise the GPU sits idle or grabs a backup project and thus will miss multiple requests until that task completes. | |
ID: 61015 | Rating: 0 | rate: / Reply Quote | |
When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C. You can always override that with an app_config.xml file in the project folder and assign 1.0 cpu threads to the task. | |
ID: 61016 | Rating: 0 | rate: / Reply Quote | |
Hello. The 1.92 GB file downloads at only ~19.75 KB/s. I'm also encountering slow downloads on several hosts. It would be nice if the project infrastructure handled our downloads a little faster. Thank you. ____________ - - - - - - - - - - Greetings, Jens | |
ID: 61017 | Rating: 0 | rate: / Reply Quote | |
Hello. Once this file is downloaded, you don't need to download it again; it's re-used for every task. The input files sent for each task are very small and will download quickly. ____________ | |
ID: 61018 | Rating: 0 | rate: / Reply Quote | |
Here is a tale of 2 computers, one that was getting units, and the other was not. After running these tasks successfully for almost a day on both of my computers, the BOINC Manager tasks tab now shows a "Remaining (estimated)" time of approximately 24 days on one computer and 62 days on the other at the start of a task, counting down incrementally from there. The task actually completes successfully in a little over an hour. A few hours ago they were showing the correct times to complete. Everything else is working fine, but this is definitely unusual. Did anyone else observe this? | |
ID: 61019 | Rating: 0 | rate: / Reply Quote | |
Good morning. | |
ID: 61020 | Rating: 0 | rate: / Reply Quote | |
Good morning. The project admin said at the beginning that the application will only work for cards with compute capability of 6.0 or greater. This equates to cards of Pascal generation and newer. Your GTX 970 is Maxwell with a compute capability of 5.2. It is too old for this app. ____________ | |
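If you want to check a card quickly, reasonably recent drivers let nvidia-smi report the compute capability directly (older nvidia-smi builds may not know the compute_cap query field; in that case use the table at https://developer.nvidia.com/cuda-gpus):

    # Prints "name, compute capability" for every GPU in the host; anything below 6.0 cannot run this app.
    nvidia-smi --query-gpu=name,compute_cap --format=csv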
ID: 61021 | Rating: 0 | rate: / Reply Quote | |
Okay ... the answer was in the first message:
Sorry for the disturbance ;) ____________ PC are like air conditioning, they becomes useless when you open Windows (L.T) In a world without walls and fences, who needs windows and gates? | |
ID: 61022 | Rating: 0 | rate: / Reply Quote | |
OMG, LOL, I love this and must go abuse it... PC are like air conditioning, they becomes useless when you open Windows (L.T) | |
ID: 61023 | Rating: 0 | rate: / Reply Quote | |
When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C. I know I can. What about the many people that leave BOINC on autopilot? I've seen multiple instances of 5 errors before a WU got to me. It's in Steve's best interest. | |
ID: 61026 | Rating: 0 | rate: / Reply Quote | |
Here is a tale of 2 computers, one that was getting units, and the other was not. At first I did. But including <fraction_done_exact/> seems to heal that fairly quickly.
<app>
   <name>PYSCFbeta</name> <!-- Quantum chemistry calculations on GPU -->
   <plan_class>cuda1121</plan_class>
   <gpu_versions>
      <cpu_usage>1</cpu_usage>
      <gpu_usage>1</gpu_usage>
   </gpu_versions>
   <fraction_done_exact/>
</app> | |
ID: 61028 | Rating: 0 | rate: / Reply Quote | |
When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C. The errors have nothing to do with the CPU resource allocation setting. They all errored because they ran on GPUs that are too old; the app needs cards with CC 6.0+ (Pascal and up). At worst, if someone is running the CPU flat out at 100% and not leaving spare CPU cycles available (as they should), the GPU task might run a little more slowly, but it won't fail. I believe the "0.991" CPUs value is a byproduct of the BOINC server-side software. From what I've read elsewhere, this value is not intentionally set by the researchers; it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override it server-side. So competent users can just override it themselves if they prefer. Setting your CPU use in BOINC to something like 99 or 98% has much the same overall effect, though. ____________ | |
ID: 61029 | Rating: 0 | rate: / Reply Quote | |
Here is a tale of 2 computers, one that was getting units, and the other was not. Thanks for this information. I updated my computers. Now, I remember this <fraction_done_exact/> from a post several years ago. I can't remember the thread. In the past I didn't need this, because the tasks would correct themselves eventually, even the ATMbetas. The Quantum Chemistry on GPU does the complete opposite. I wonder if this is connected to the observation of "upwards of 30 threads utilized per task" as posted by Ian&Steve C.? | |
ID: 61032 | Rating: 0 | rate: / Reply Quote | |
nah the multi thread issue has already been fixed. the app only uses a single thread now. | |
ID: 61033 | Rating: 0 | rate: / Reply Quote | |
The work-units require a lot of GPU memory. How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement, but it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure if it will work. ____________ Reno, NV Team: SETI.USA | |
ID: 61034 | Rating: 0 | rate: / Reply Quote | |
The work-units require a lot of GPU memory. It requires more than 2GB ____________ | |
ID: 61035 | Rating: 0 | rate: / Reply Quote | |
It requires more than 2GB Good to know. Thanks! ____________ Reno, NV Team: SETI.USA | |
ID: 61036 | Rating: 0 | rate: / Reply Quote | |
This is all correct, I believe. It seems that the jobs have enough retry attempts that all work units end up succeeding eventually. The scheduler has some inbuilt mechanism to classify hosts as "reliable"; it also has a mechanism to send workunits that have failed a few times only to hosts that are "reliable". This is not ideal, of course. We will try to get the CC requirements honoured, but these are project-wide scheduler settings which are rather complex to fix without breaking everything else that is currently working. The download limitation is something I will not be able to change easily. A potential reason I can guess for the current settings is to stop a failing host acting as a black hole for failed jobs, or something similar. The large file download should only happen once. The app is deployed in the same way as the ATM app: it is a 2GB zip file that contains a Python environment and some CUDA libraries. Each work-unit only requires downloading a small file (<1MB I think). This last large scale run has been rather impressive. The throughput was very high! Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks. | |
ID: 61037 | Rating: 0 | rate: / Reply Quote | |
Hello Steve. I would say: that is certainly what it is for! :D ____________ PC are like air conditioning, they becomes useless when you open Windows (L.T) In a world without walls and fences, who needs windows and gates? | |
ID: 61039 | Rating: 0 | rate: / Reply Quote | |
... Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks. Is there a plan to come up with a Windows version too? | |
ID: 61040 | Rating: 0 | rate: / Reply Quote | |
Still no work for windows 11 operating systems? | |
ID: 61041 | Rating: 0 | rate: / Reply Quote | |
Most of the work released lately has been the Quantum Chemistry tasks. The researcher said that since most educational and research labs run Linux, Windows applications are an afterthought. | |
ID: 61042 | Rating: 0 | rate: / Reply Quote | |
The researcher said that since most educational and research labs run Linux, Windows applications are an afterthought. It's really too bad that GPUGRID more and more tends to exclude Windows crunchers :-( When I joined this project 8 years ago, and for many years thereafter, there was no lack of Windows tasks. On the other hand, with the few tasks available since last year, it may be that the number of Linux crunchers is sufficient to process them and the Windows crunchers from before are no longer needed :-( At least, that is the impression one is bound to get. | |
ID: 61043 | Rating: 0 | rate: / Reply Quote | |
The lack of current Windows applications has more to do with the type of applications and API's being used currently. | |
ID: 61044 | Rating: 0 | rate: / Reply Quote | |
So - in short - bad times for Windows crunchers. Now and in the future :-( | |
ID: 61045 | Rating: 0 | rate: / Reply Quote | |
So - in short - bad times for Windows crunchers. Now and in the future :-( Pretty much so. Windows had it best back with the original release of the acemd app. Remember, it was a simple, single executable file of modest size, derived from source code that could be compiled for Windows or for Linux. But if you have been paying attention lately, the recent acemd tasks no longer use a plain executable; they use Python. The Python-based tasks are NOT a single executable; they comprise a complete packaged Python environment of many gigabytes. The nature of the project's tasks has changed to complex, state-of-the-art discovery calculations using cutting-edge technology. The QChem tasks are even using the Tensor cores of our Nvidia cards now. This is something we asked about several years ago in the forum and were told: maybe, in the future. The future has come and our wish has been answered, but the hardware and software of our hosts now have to rise to meet those challenges. Sadly, the Windows environment is still waiting in the wings. | |
ID: 61046 | Rating: 0 | rate: / Reply Quote | |
...But including <fraction_done_exact/> seems to heal that fairly quickly. Nice advice, thank you! It resulted quickly in an accurate remaining time estimation, so I applied it to ATMbeta tasks also. | |
ID: 61082 | Rating: 0 | rate: / Reply Quote | |
Choosing not to release Windows apps is a choice they can take, obviously. And maybe their use cases warrant taking the tradeoff inherent in that. | |
ID: 61084 | Rating: 0 | rate: / Reply Quote | |
I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside. I didn't know that. It's probably a sloppy BOINC design like using percentage to determine the number of CPU threads to use instead of integers. | |
ID: 61096 | Rating: 0 | rate: / Reply Quote | |
The work-units require a lot of GPU memory. The highest being used today on my Pascal cards is 795 MB. | |
ID: 61097 | Rating: 0 | rate: / Reply Quote | |
The work-units require a lot of GPU memory. Might want to watch that on a longer time scale, the VRAM use is not static, it fluctuates up and down ____________ | |
ID: 61098 | Rating: 0 | rate: / Reply Quote | |
Retraction: I'm monitoring with the BoincTasks Js 2.4.2.2 and it has bugs. | |
ID: 61099 | Rating: 0 | rate: / Reply Quote | |
I'm not seeing any different behavior on my Titan Vs. The VRAM use still exceeds 3 GB at times, but it's spiky. You have to watch it for a few minutes; instantaneous measurements might not catch it. | |
ID: 61100 | Rating: 0 | rate: / Reply Quote | |
I am seeing spikes to ~7.6 GB with these. Not long lasting (in the context of the whole work unit) but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then at 95% complete, but that also could be somewhat coincidental versus factual. | |
ID: 61101 | Rating: 0 | rate: / Reply Quote | |
I am seeing spikes to ~7.6 GB with these. Not long lasting (in the context of the whole work unit) but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then at 95% complete, but that also could be somewhat coincidental versus factual. To add on to this, for everyone's info: these tasks (and a lot of CUDA applications in general) do not require a set absolute amount of VRAM. VRAM use scales with the individual GPU. Generally, the more SMs you have, the more VRAM will be used. It's not linear, but some portion of the allocated VRAM scales directly with how many SMs are being used. To put it simply, different GPUs with different core counts will have different amounts of VRAM utilization. So even if a powerful GPU like an RTX 4090 with 100+ SMs on the die might need 7+ GB, that doesn't mean something much smaller like a GTX 1070 needs that much. It needs to be evaluated on a case-by-case basis. ____________ | |
ID: 61102 | Rating: 0 | rate: / Reply Quote | |
I am seeing spikes to ~7.6 GB with these. Not long lasting (in the context of the whole work unit) but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then at 95% complete, but that also could be somewhat coincidental versus factual. Thanks for this! I did not know about the scaling and I don't think this is something I ever thought about (the correlation between SMs and VRAM usage). | |
ID: 61103 | Rating: 0 | rate: / Reply Quote | |
Why do I always get a segmentation fault? | |
ID: 61108 | Rating: 0 | rate: / Reply Quote | |
Why do I always get a segmentation fault? Something is likely wrong with your environment or drivers. Try running a native Linux OS install; WSL might not be well supported. ____________ | |
ID: 61109 | Rating: 0 | rate: / Reply Quote | |
Steve, | |
ID: 61117 | Rating: 0 | rate: / Reply Quote | |
Here's one that died on my Ubuntu system which has 32 GB RAM: | |
ID: 61118 | Rating: 0 | rate: / Reply Quote | |
i see v3 being deployed now | |
ID: 61119 | Rating: 0 | rate: / Reply Quote | |
v4 report. | |
ID: 61120 | Rating: 0 | rate: / Reply Quote | |
Yes, I was doing some testing to see how large a molecule we can compute properties for. | |
ID: 61121 | Rating: 0 | rate: / Reply Quote | |
no problem! glad to see you were monitoring my feedback and making changes. | |
ID: 61122 | Rating: 0 | rate: / Reply Quote | |
Yes, it will be the same as yesterday, but with roughly 10x the work units released. | |
ID: 61123 | Rating: 0 | rate: / Reply Quote | |
looking forward to it :) | |
ID: 61124 | Rating: 0 | rate: / Reply Quote | |
I have Task 33765246 running on an RTX 3060 Ti under Linux Mint 21.3; its wrapper output stops at "+ python compute_dft.py" and it is making almost no progress. | |
ID: 61126 | Rating: 0 | rate: / Reply Quote | |
Steve, | |
ID: 61127 | Rating: 0 | rate: / Reply Quote | |
OK. looks like the v2 tasks are back to normal. it was only that v1 task that was using lots of vram | |
ID: 61128 | Rating: 0 | rate: / Reply Quote | |
Ok my previous post was incorrect. | |
ID: 61129 | Rating: 0 | rate: / Reply Quote | |
Thanks for the info Steve. | |
ID: 61130 | Rating: 0 | rate: / Reply Quote | |
I have Task 33765246 running on an RTX 3060 Ti under Linux Mint 21.3 I'm getting several of these also. This is a problem too; you can always tell when the task basically stalls with almost no progress. ____________ | |
ID: 61131 | Rating: 0 | rate: / Reply Quote | |
My CPU fallback task has now completed and validated, in not much longer than is usual for tasks on that host. I assume it was a shortened test task, running on a slower device? I have now just completed what looks like a similar task, with similarly large jumps in progress percentage, but much more quickly. Task 33765553 | |
ID: 61132 | Rating: 0 | rate: / Reply Quote | |
This is still very much a beta app. | |
ID: 61133 | Rating: 0 | rate: / Reply Quote | |
No problem Steve. I definitely understand the beta aspect of this and the need to test things. I’m just giving honest feedback from my POV. Sometimes it’s hard to tell if a radical change in behavior is intended or a sign of some problem or misconfiguration. | |
ID: 61134 | Rating: 0 | rate: / Reply Quote | |
I had an odd work unit come through (and just abandoned). I have not had any issues with these work units so thought I would mention this one specifically. | |
ID: 61135 | Rating: 0 | rate: / Reply Quote | |
I had an odd work unit come through (and just abandoned). I have not had any issues with these work units so thought I would mention this one specifically. Yeah you can see several out of memory errors. Are you running more than one at a time? I’ve had many like this. And many that seem to just fall back to CPU without any reason and get stuck for a long time. I’ve been aborting them when I notice. But it is troublesome :( ____________ | |
ID: 61136 | Rating: 0 | rate: / Reply Quote | |
I have been running 2x for these (I can't get them to run 3x or 4x via the app_config file, but it doesn't look like there are any queued tasks waiting to start). Good to know that others have seen this too! I have seen a MASSIVE reduction in the time these tasks take today. | |
ID: 61137 | Rating: 0 | rate: / Reply Quote | |
I’m now getting a 3rd type of error across all of my hosts. | |
ID: 61138 | Rating: 0 | rate: / Reply Quote | |
I've had a few of those too, mainly of the form | |
ID: 61139 | Rating: 0 | rate: / Reply Quote | |
150,000 credits for a few hundred seconds? I'm in! ;) | |
ID: 61140 | Rating: 0 | rate: / Reply Quote | |
I have Task 33765246 running on an RTX 3060 Ti under Linux Mint 21.3 I'm getting several of these also. This is a problem too; you can always tell when the task basically stalls with almost no progress. I had only those on one of my machines. Apparently it had lost sight of the GPU for crunching. Rebooting brought back the Nvidia driver to the BOINC client. Apart from this, I found out that I can't run these tasks alongside Private GFN Server tasks on a 6 GB GPU. So I turned off the PYSCFbeta tasks for this machine, as I often have to wait for tasks to download from GPUGrid and I don't want my GPUs to run idle. ____________ - - - - - - - - - - Greetings, Jens | |
ID: 61141 | Rating: 0 | rate: / Reply Quote | |
Did we encounter this one already? <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 14:21:03 (24335): wrapper (7.7.26016): starting 14:21:36 (24335): wrapper (7.7.26016): starting 14:21:36 (24335): wrapper: running bin/python (bin/conda-unpack) 14:21:38 (24335): bin/python exited; CPU time 0.223114 14:21:38 (24335): wrapper: running bin/tar (xjvf input.tar.bz2) 14:21:39 (24335): bin/tar exited; CPU time 0.005282 14:21:39 (24335): wrapper: running bin/bash (run.sh) + echo 'Setup environment' + source bin/activate ++ _conda_pack_activate ++ local _CONDA_SHELL_FLAVOR ++ '[' -n x ']' ++ _CONDA_SHELL_FLAVOR=bash ++ local script_dir ++ case "$_CONDA_SHELL_FLAVOR" in +++ dirname bin/activate ++ script_dir=bin +++ cd bin +++ pwd ++ local full_path_script_dir=/var/lib/boinc-client/slots/4/bin +++ dirname /var/lib/boinc-client/slots/4/bin ++ local full_path_env=/var/lib/boinc-client/slots/4 +++ basename /var/lib/boinc-client/slots/4 ++ local env_name=4 ++ '[' -n '' ']' ++ export CONDA_PREFIX=/var/lib/boinc-client/slots/4 ++ CONDA_PREFIX=/var/lib/boinc-client/slots/4 ++ export _CONDA_PACK_OLD_PS1= ++ _CONDA_PACK_OLD_PS1= ++ PATH=/var/lib/boinc-client/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:. ++ PS1='(4) ' ++ case "$_CONDA_SHELL_FLAVOR" in ++ hash -r ++ local _script_dir=/var/lib/boinc-client/slots/4/etc/conda/activate.d ++ '[' -d /var/lib/boinc-client/slots/4/etc/conda/activate.d ']' + export PATH=/var/lib/boinc-client/slots/4:/var/lib/boinc-client/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:. + PATH=/var/lib/boinc-client/slots/4:/var/lib/boinc-client/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:. + echo 'Create a temporary directory' + export TMP=/var/lib/boinc-client/slots/4/tmp + TMP=/var/lib/boinc-client/slots/4/tmp + mkdir -p /var/lib/boinc-client/slots/4/tmp + export OMP_NUM_THREADS=1 + OMP_NUM_THREADS=1 + export CUDA_VISIBLE_DEVICES=0 + CUDA_VISIBLE_DEVICES=0 + export CUPY_CUDA_LIB_PATH=/var/lib/boinc-client/slots/4/cupy + CUPY_CUDA_LIB_PATH=/var/lib/boinc-client/slots/4/cupy + echo 'Running PySCF' + python compute_dft.py /var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/lib/cutensor.py:174: UserWarning: using cupy as the tensor contraction engine. warnings.warn(f'using {contract_engine} as the tensor contraction engine.') /var/lib/boinc-client/slots/4/lib/python3.11/site-packages/pyscf/dft/libxc.py:771: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. 
To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, ' /var/lib/boinc-client/slots/4/lib/python3.11/site-packages/pyscf/gto/mole.py:1280: UserWarning: Function mol.dumps drops attribute charge because it is not JSON-serializable warnings.warn(msg) Traceback (most recent call last): File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 253, in _jitify_prep name, options, headers, include_names = jitify.jitify(source, options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "cupy/cuda/jitify.pyx", line 63, in cupy.cuda.jitify.jitify File "cupy/cuda/jitify.pyx", line 88, in cupy.cuda.jitify.jitify RuntimeError: Runtime compilation failed During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/var/lib/boinc-client/slots/4/compute_dft.py", line 125, in <module> e,f,dip,q = compute_gpu(mol) ^^^^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/compute_dft.py", line 32, in compute_gpu e_dft = mf.kernel() # compute total energy ^^^^^^^^^^^ File "<string>", line 2, in kernel File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/scf/hf.py", line 586, in scf _kernel(self, self.conv_tol, self.conv_tol_grad, File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/scf/hf.py", line 393, in _kernel mf.init_workflow(dm0=dm) File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/df/df_jk.py", line 63, in init_workflow rks.initialize_grids(mf, mf.mol, dm0) File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/dft/rks.py", line 80, in initialize_grids ks.grids = prune_small_rho_grids_(ks, ks.mol, dm, ks.grids) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/gpu4pyscf/dft/rks.py", line 49, in prune_small_rho_grids_ logger.debug(grids, 'Drop grids %d', grids.weights.size - cupy.count_nonzero(idx)) ^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/_sorting/count.py", line 24, in count_nonzero return _count_nonzero(a, axis=axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "cupy/_core/_reduction.pyx", line 608, in cupy._core._reduction._SimpleReductionKernel.__call__ File "cupy/_core/_reduction.pyx", line 364, in cupy._core._reduction._AbstractReductionKernel._call File "cupy/_core/_cub_reduction.pyx", line 701, in cupy._core._cub_reduction._try_to_call_cub_reduction File "cupy/_core/_cub_reduction.pyx", line 538, in cupy._core._cub_reduction._launch_cub File "cupy/_core/_cub_reduction.pyx", line 473, in cupy._core._cub_reduction._cub_two_pass_launch File "cupy/_util.pyx", line 64, in cupy._util.memoize.decorator.ret File "cupy/_core/_cub_reduction.pyx", line 246, in cupy._core._cub_reduction._SimpleCubReductionKernel_get_cached_function File "cupy/_core/_cub_reduction.pyx", line 231, in cupy._core._cub_reduction._create_cub_reduction_function File "cupy/_core/core.pyx", line 2251, in cupy._core.core.compile_with_cache File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 496, in _compile_module_with_cache return _compile_with_cache_cuda( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 574, in _compile_with_cache_cuda ptx, mapping = compile_using_nvrtc( ^^^^^^^^^^^^^^^^^^^^ File 
"/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 322, in compile_using_nvrtc return _compile(source, options, cu_path, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 287, in _compile options, headers, include_names = _jitify_prep( ^^^^^^^^^^^^^ File "/var/lib/boinc-client/slots/4/lib/python3.11/site-packages/cupy/cuda/compiler.py", line 260, in _jitify_prep raise JitifyException(str(cex)) cupy.cuda.compiler.JitifyException: Runtime compilation failed 14:23:34 (24335): bin/bash exited; CPU time 14.043607 14:23:34 (24335): app exit status: 0x1 14:23:34 (24335): called boinc_finish(195) </stderr_txt> ]]> ____________ - - - - - - - - - - Greetings, Jens | |
ID: 61145 | Rating: 0 | rate: / Reply Quote | |
that looks like a driver issue. | |
ID: 61146 | Rating: 0 | rate: / Reply Quote | |
The present batch has a far worse failure ratio than the previous one. | |
ID: 61147 | Rating: 0 | rate: / Reply Quote | |
that looks like a driver issue. This is 100% correct. Our system with 2x RTX A6000 (48GB of VRAM each) has had 500 valid results and no errors. They are running tasks at 2x and they seem to run really well (https://www.gpugrid.net/results.php?hostid=616410). In one of our systems with 3x RTX A4500 GPUs (20GB), as soon as I changed from running 2x of these tasks to 1x, the error rate greatly improved (https://www.gpugrid.net/results.php?hostid=616409). I made the change and have had 14 tasks in a row without errors. When I am back in the classroom I think I will be changing anything with 24GB or less to only run one task, in order to improve the valid rate. Has anyone tried running MPS with these tasks, and would it make a difference in the allocation of resources to successfully run 2x? Just curious about thoughts. | |
ID: 61149 | Rating: 0 | rate: / Reply Quote | |
Last week, I had a 100% success rate. This week, it's a different story. Maybe, it's time to step back and dial it down a bit. You have to work with the resources that you have, not the ones that you wish you had. | |
ID: 61150 | Rating: 0 | rate: / Reply Quote | |
Boca, | |
ID: 61151 | Rating: 0 | rate: / Reply Quote | |
that looks like a driver issue. That's what Pascal (?) wrote in the Q&A as well. Had three tasks on that host, and two of them failed. ____________ - - - - - - - - - - Greetings, Jens | |
ID: 61152 | Rating: 0 | rate: / Reply Quote | |
Not everyone has a 5000-euro graphics card with 24 GB of VRAM or more. You should think of the more modest among us. | |
ID: 61153 | Rating: 0 | rate: / Reply Quote | |
I've disabled getting new GPUGrid tasks on my host with a "small" amount (below 24 GB) of GPU memory. Here is the GPU memory use I logged: idle: 305 MiB
task starting: 895 MiB
GPU usage rises: 6115 MiB
GPU usage drops: 7105 MiB
GPU usage 100%: 7205 MiB
GPU usage drops: 8495 MiB
GPU usage rises: 9961 MiB
GPU usage drops: 14327 MiB (it would have failed on my GTX 1080 Ti at this point)
GPU usage rises: 6323 MiB
GPU usage drops: 15945 MiB
GPU usage 100%: 6205 MiB
...and so on. So the memory usage doubles at some points of processing for a short while, and this causes the workunits to fail on GPUs that have a "small" amount of memory. If this behaviour could be eliminated, many more hosts could process these workunits. | |
ID: 61154 | Rating: 0 | rate: / Reply Quote | |
Nothing to do at this time for my currently working GPUs with PYSCFbeta tasks. | |
ID: 61155 | Rating: 0 | rate: / Reply Quote | |
I agree it does seem these tasks have a spike in memory usage. I "rented" an RTX A5000 GPU which also has 24 GB memory, and running 1 task at a time, at least the first task completed: | |
ID: 61156 | Rating: 0 | rate: / Reply Quote | |
Exactly the same here. After 29 consecutive errors on a RTX4070Ti, I have disabled 'Quantum chemistry on GPU (beta)'. | |
ID: 61157 | Rating: 0 | rate: / Reply Quote | |
I have one machine still taking on GPUGrid tasks. | |
ID: 61158 | Rating: 0 | rate: / Reply Quote | |
bonjour | |
ID: 61159 | Rating: 0 | rate: / Reply Quote | |
Boca, This was wild... For a single work unit:
Hovers around 3-4 GB
Rises to 8-9 GB
Spikes to ~11 GB regularly
Highest spike (seen): 12.5 GB
Highest spike (estimated based on Psensor): ~20 GB
Additionally, Psensor caught a highest memory usage spike of 76% of the 48 GB of the RTX A6000 for one work unit, but I did not see when this happened or if it happened at all. I graphically captured the VRAM memory usage for one work unit. I have no idea how to embed images here. So, here is a Google Doc: https://docs.google.com/document/d/1xpOpNJ93finciJQW7U07dMHOycSVlbYq9G6h0Xg7GtA/edit?usp=sharing EDIT: I think they just purged these work units from the server? | |
ID: 61160 | Rating: 0 | rate: / Reply Quote | |
thanks. that's kind of what I expected was happening. | |
ID: 61161 | Rating: 0 | rate: / Reply Quote | |
New batch just come through- seeing the same VRAM spikes and patterns. | |
ID: 61162 | Rating: 0 | rate: / Reply Quote | |
I'm seeing the same spikes, but so far so good. biggest spike i saw was ~9GB | |
ID: 61163 | Rating: 0 | rate: / Reply Quote | |
Hi. I have been tweaking settings. All WUs I have tried now work on my 1080(8GB). | |
ID: 61164 | Rating: 0 | rate: / Reply Quote | |
Seeing some errors on Titan V (12 GB). Not a huge number, but certainly a noteworthy amount. Maybe you can correlate these specific WUs and see why this kind (number of atoms or molecules?) might be requesting more VRAM than the ones you tried on your 1080. | |
ID: 61165 | Rating: 0 | rate: / Reply Quote | |
Still seeing a vram spike above 8GB | |
ID: 61166 | Rating: 0 | rate: / Reply Quote | |
Agreed- it seems that there are fewer spikes and most of them are in the 8-9GB range. A few higher but it seems less frequent? Difficult to quantify an actual difference since the work units can be so different. Is there a difference in VRAM usage or does the actual work unit just happen to need less VRAM? | |
ID: 61167 | Rating: 0 | rate: / Reply Quote | |
Seems like credit has gone down from 150K to 15K. | |
ID: 61168 | Rating: 0 | rate: / Reply Quote | |
Occasionally an 8 GB VRAM card is not sufficient. Still seeing errors on these cards. | |
ID: 61169 | Rating: 0 | rate: / Reply Quote | |
Even that 16GB GPU had one failure with the new v3 batch | |
ID: 61171 | Rating: 0 | rate: / Reply Quote | |
Even that 16GB GPU had one failure with the new v3 batch Based on the times of tasks, it looks like those were running at 1x? | |
ID: 61172 | Rating: 0 | rate: / Reply Quote | |
Good evening, it is working well for me now. | |
ID: 61173 | Rating: 0 | rate: / Reply Quote | |
14 tasks of the latest batch completed successfully without any error. Seems like credit has gone down from 150K to 15K. Perhaps 150k was a little too generous. But 15k is not on par with other GPU projects. I expect there will be fairer credits again soon - with the next batch? | |
ID: 61174 | Rating: 0 | rate: / Reply Quote | |
14 tasks of the latest batch completed successfully without any error. Are you running them at 1x and with how much VRAM? Trying to get a feel for what the actual "cutoff" is for these tasks right now. I am still feeling 24GB VRAM is needed for the success running 1x and double that for 2x. | |
ID: 61175 | Rating: 0 | rate: / Reply Quote | |
Sometimes more than 12 GB: about 4% (16 out of 372) of my tasks failed, all on GPUs with 12 GB, all running at 1x, for the v3 batch. Not sure how much VRAM is needed to be 100% successful. I did have one success that was a resend of one of your errors from a 4090 24GB, so I'm guessing you were running that one at 2x and got unlucky with two big tasks at the same time. | |
ID: 61176 | Rating: 0 | rate: / Reply Quote | |
Sometimes more than 12 GB: about 4% (16 out of 372) of my tasks failed, all on GPUs with 12 GB, all running at 1x, for the v3 batch. Not sure how much VRAM is needed to be 100% successful. I did have one success that was a resend of one of your errors from a 4090 24GB, so I'm guessing you were running that one at 2x and got unlucky with two big tasks at the same time. Correct- I was playing around with the two 4090 systems running these to make some comparisons. And you are also correct- it seems that even with 24GB, running 2x is still not really ideal. Those random, huge spikes seem to find each other when running 2x. | |
ID: 61177 | Rating: 0 | rate: / Reply Quote | |
Are you running them at 1x and with how much VRAM? Trying to get a feel for what the actual "cutoff" is for these tasks right now. I am still feeling 24GB VRAM is needed for the success running 1x and double that for 2x. The GPU is an MSI 4070 Ti GAMING X SLIM with 12GB GDDR6X, run at 1x. Obviously sufficient for the latest batch to run flawlessly. | |
ID: 61178 | Rating: 0 | rate: / Reply Quote | |
14 tasks of the latest batch completed successfully without any error. For someone with a 3080 Ti card, it will be better to run ATMbeta tasks first and then Quantum chemistry (if the former has no available tasks), if credit granted is an important factor. For me, I have a 3080 Ti and a P100, so I will likely run ATMbeta on the 3080 Ti and Quantum chemistry on the P100, if both kinds of tasks are available. | |
ID: 61179 | Rating: 0 | rate: / Reply Quote | |
Are you running them at 1x and with how much VRAM? Trying to get a feel for what the actual "cutoff" is for these tasks right now. I am still feeling 24GB VRAM is needed for the success running 1x and double that for 2x. Thanks for the info. If you don't mind me asking- how many ran (in a row) without any errors? | |
ID: 61180 | Rating: 0 | rate: / Reply Quote | |
I have got a rig with 9 pieces of P106, which are slightly modified GTX1060 6GB used for Ethereum mining back in the day. I can run only two GPUgrid tasks at once (main CPU is only a dual core Celeron) but so far I have had one error and several tasks finish and validate. Hoping for good results for the rest! | |
ID: 61181 | Rating: 0 | rate: / Reply Quote | |
Are you running them at 1x and with how much VRAM? Trying to get a feel for what the actual "cutoff" is for these tasks right now. I am still feeling 24GB VRAM is needed for the success running 1x and double that for 2x. 14 consecutive tasks without any error. | |
ID: 61182 | Rating: 0 | rate: / Reply Quote | |
I have got a rig with 9 pieces of P106, which are slightly modified GTX1060 6GB used for Ethereum mining back in the day. I can run only two GPUgrid tasks at once (main CPU is only a dual core Celeron) but so far I have had one error and several tasks finish and validate. Hoping for good results for the rest! So managed to get 11 tasks, from which 9 passed and validated and 2 failed some time into the process. | |
ID: 61183 | Rating: 0 | rate: / Reply Quote | |
...From our end we will need to see how to assign WU's based on GPU memory. (Previous apps have been compute bound rather than GPU memory bound and have only been assigned based on driver version) Probably (I don't know if it is viable), a better solution would be to include some portion in the code to limit peak VRAM according to the true device assigned. The reason, based on an example: My Host #482132 is shown by BOINC as [2] NVIDIA NVIDIA GeForce GTX 1660 Ti (5928MB) driver: 550.40 This is true for Device 0, NVIDIA NVIDIA GeForce GTX 1660 Ti (5928MB) driver: 550.40 But Device 1 in this host should be shown as NVIDIA NVIDIA GeForce GTX 1650 SUPER (3895MB) driver: 550.40 Tasks sent according to Device 0's VRAM (6 GB) would likely run out of memory when hitting Device 1 (4 GB VRAM) | |
ID: 61184 | Rating: 0 | rate: / Reply Quote | |
...From our end we will need to see how to assign WU's based on GPU memory. (Previous apps have been compute bound rather than GPU memory bound and have only been assigned based on driver version) The only caveat with this is that the application or project doesn't have any ability to select which GPUs you have or which GPU will run the task. In your example, if a task was sent that required >4GB, the project has no idea that GPU 1 only has 4GB. The project can only see the "first/best" GPU in the system, which is communicated via your BOINC client, and the BOINC client is the one that selects which tasks go to which GPU. The science application is called after the GPU selection has already been made. Similarly, BOINC has no mechanism to assign tasks based on GPU VRAM use. You will have to manage things yourself after observing behavior. If you notice one GPU consistently has too little VRAM, you can exclude that GPU from running the QChem app by setting an <exclude_gpu> statement in the cc_config.xml file.
<options>
   <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <app>PYSCFbeta</app>
      <device_num>1</device_num>
   </exclude_gpu>
</options> ____________ | |
ID: 61185 | Rating: 0 | rate: / Reply Quote | |
you will have to manage things yourself after observing behavior. Certainly. Your advice is always very much appreciated. An update of the minimum requirements when the PYSCF tasks reach the production stage would be welcome, as a help for excluding hosts/GPUs that don't meet them. | |
ID: 61186 | Rating: 0 | rate: / Reply Quote | |
you will have to manage things yourself after observing behavior. I would imagine something like what WCG posted may be useful, showing system requirements such as memory, disk space, one-time download file size, etc.: https://www.worldcommunitygrid.org/help/topic.s?shortName=minimumreq. Other than WCG not running smoothly since the IBM migration, I notice that the WCG system requirements are outdated. I guess it takes effort to maintain such information and keep it up to date. So far, this is my limited knowledge about the quantum chemistry tasks, as I'm still learning. Anyone is welcome to chime in on the system requirements.
1) The one-time download file is about 2 GB. Be prepared to wait for hours if you have a very slow internet connection.
2) The more GPU VRAM the better. It seems like cards with 24 GB or more perform the best.
3) GPUs with faster memory bandwidth and faster FP64 have an advantage in shorter run times. Typically this is found in datacenter/server/workstation cards. | |
ID: 61187 | Rating: 0 | rate: / Reply Quote | |
Implementing a way to choose work with certain hardware requirements through the preferences would be nice as well. | |
ID: 61188 | Rating: 0 | rate: / Reply Quote | |
Ok so it seems like things are improved with the latest settings. After lots of problems with the ECM subproject claiming too much system memory, yoyo@home divided the subproject into smaller and bigger tasks, which can both be ticked (or be left unticked) in the project preferences. Might be the most workable solution for the future once the current batch of work is done. The memory use is mainly determined by the size of the molecule and the number of heavy elements. So before WUs are sent out we can make a rough estimate of the memory use. There is an element of randomness that comes from high memory use for specific physical configurations that are harder to converge. We cannot estimate this before sending, and it will only show up during the calculation. | |
ID: 61189 | Rating: 0 | rate: / Reply Quote | |
Seems like credit has gone down from 150K to 15K? | |
ID: 61190 | Rating: 0 | rate: / Reply Quote | |
Seems like credit has gone down from 150K to 15K? Yes, and the memory use this morning seems to require running 1 at a time on GPUs with less than 16 GB, which hurts performance even more. Steve, what determines point value for a task? | |
ID: 61191 | Rating: 0 | rate: / Reply Quote | |
For the moment things are going fairly well with the new work units. | |
ID: 61192 | Rating: 0 | rate: / Reply Quote | |
I'm seeing about 10% failure rate with 12GB cards. | |
ID: 61193 | Rating: 0 | rate: / Reply Quote | |
Credits should now be at 75k for the rest of the batch. They should be consistent based on comparisons of runtime on our test machines across the other apps, but this is complicated with this new memory-intensive app. I will investigate before sending the next batch. | |
ID: 61194 | Rating: 0 | rate: / Reply Quote | |
There are some tasks that spike over 10 GB. It seems like nvidia-smi doesn't allow a logging interval shorter than 1 s. Does anyone have a workaround? The momentary spikes are likely even higher than the 10 GB recorded. | |
ID: 61195 | Rating: 0 | rate: / Reply Quote | |
Got a biggie. This one is 14.6 GB. I'm running a 16 GB card, one task per GPU. | |
ID: 61196 | Rating: 0 | rate: / Reply Quote | |
Yeah, I think you'll only ever see the spike if you actually have the VRAM for it. If you don't have enough, it will error out before hitting it and you'll never see it. | |
ID: 61197 | Rating: 0 | rate: / Reply Quote | |
pututu, have you had any failed tasks? Ian&Steve C. reports ~10% failure rate with 12GB so I am curious about 16GB. I am guessing this is about the minimum for error-free (related to memory limitations) processing of the current work. | |
ID: 61198 | Rating: 0 | rate: / Reply Quote | |
We did this as well this morning for the 4090 GPUs, since they have 24 GB, but with E@H work. Too little VRAM to run QChem at 2x, but too much compute power left on the table when running them at 1x. | |
ID: 61199 | Rating: 0 | rate: / Reply Quote | |
pututu, have you had any failed tasks? Ian&Steve C. reports ~10% failure rate with 12GB so I am curious about 16GB. I am guessing this is about the minimum for error-free (related to memory limitations) processing of the current work. 0 failures after 19 completed tasks on one P100 with 16 GB. So far 14.6 GB is the highest I've seen with 1-second-interval monitoring. More than half of the tasks processed momentarily hit 8 GB or more. I didn't record any actual data before that, just watched nvidia-smi from time to time. Edit: another task with more than 12 GB, but with an ominous 6666 MiB, lol:
2024/02/05 09:17:58.869, 99 %, 1328 MHz, 10712 MiB, 131.69 W, 70
2024/02/05 09:17:59.872, 100 %, 1328 MHz, 10712 MiB, 101.87 W, 70
2024/02/05 09:18:00.877, 100 %, 1328 MHz, 10700 MiB, 50.15 W, 69
2024/02/05 09:18:01.880, 92 %, 1240 MHz, 11790 MiB, 54.34 W, 69
2024/02/05 09:18:02.883, 95 %, 1240 MHz, 12364 MiB, 53.20 W, 69
2024/02/05 09:18:03.886, 83 %, 1126 MHz, 6666 MiB, 137.77 W, 70
2024/02/05 09:18:04.889, 100 %, 1075 MHz, 6666 MiB, 130.53 W, 71
2024/02/05 09:18:05.892, 92 %, 1164 MHz, 6666 MiB, 129.84 W, 71
2024/02/05 09:18:06.902, 100 %, 1063 MHz, 6666 MiB, 129.82 W, 71 | |
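For reference, logs like the one above can be captured at sub-second intervals with a plain shell loop around nvidia-smi (the query fields below are my guess at the columns shown; each nvidia-smi call has some overhead of its own, so very short spikes can still fall between samples):

    # Poll GPU 0 roughly every 0.2 s; columns: timestamp, GPU utilization, SM clock, memory used, power, temperature.
    while true; do
        nvidia-smi -i 0 --query-gpu=timestamp,utilization.gpu,clocks.sm,memory.used,power.draw,temperature.gpu --format=csv,noheader >> vram_log.csv
        sleep 0.2
    done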
ID: 61200 | Rating: 0 | rate: / Reply Quote | |
pututu, have you had any failed tasks? Ian&Steve C. reports ~10% failure rate with 12GB so I am curious about 16GB. I am guessing this is about the minimum for error-free (related to memory limitations) processing of the current work. Been running all day across my 18x Titan Vs. The effective error rate is right around 5%, so 5% of the tasks needed more than 12 GB, running only 1 task per GPU. I rented an A100 40GB for the day. Running 3x on this GPU with MPS set to 40%, it's done about 300 tasks and only 1 task failed from out of memory. The highest spike I saw was 39 GB, but it usually stays around 20 GB utilized ____________ | |
ID: 61201 | Rating: 0 | rate: / Reply Quote | |
pututu, have you had any failed tasks? Ian&Steve C. reports ~10% failure rate with 12GB so I am curious about 16GB. I am guessing this is about the minimum for error-free (related to memory limitations) processing of the current work. Wow, the A100 is powerful. I can't believe how fast it can chew through these (well, I can believe it, but it's still amazing). I am somewhat new to MPS and I understand the general concept, but what do you mean when you say it is set to 40%? | |
ID: 61202 | Rating: 0 | rate: / Reply Quote | |
Well, I have given up; too many errors. | |
ID: 61203 | Rating: 0 | rate: / Reply Quote | |
I am somewhat new to MPS and I understand the general concept, but what do you mean when you say it is set to 40%? CUDA MPS has a setting called active thread percentage. It basically limits how many SMs of the GPU get used for each process. Without MPS, each process will call for all available SMs all the time, in separate contexts (MPS also shares a single context). I set that to 40%, so each task is only using 40% of the available SMs. With 3x running, that's slightly over-provisioning the GPU, but it usually works well and runs faster than 3x without MPS. It also has the benefit of reducing VRAM use most of the time, but it doesn't seem to limit these tasks much. The only caveat is that when you run low on work, the remaining one or two tasks won't use all the GPU, instead using only the 40% and none of the rest of the idle GPU. ____________ | |
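Roughly, this is how such a setup is started (paths and the 40% value are just examples, and details can differ between driver versions, so check NVIDIA's MPS documentation for your system):

    # Example only: start the MPS control daemon with a default 40% active-thread limit per client.
    export CUDA_VISIBLE_DEVICES=0                      # GPU(s) the MPS server may use
    export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps     # clients (the BOINC science apps) must see the same path
    export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
    export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=40        # the "active thread percentage" discussed above
    nvidia-cuda-mps-control -d

    # To shut the daemon down later:
    echo quit | nvidia-cuda-mps-control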
ID: 61204 | Rating: 0 | rate: / Reply Quote | |
It seems like nvidia-smi doesn't allow a logging interval shorter than 1 s. Does anyone have a workaround? Have you tried NVITOP? https://github.com/XuehaiPan/nvitop | |
ID: 61205 | Rating: 0 | rate: / Reply Quote | |
It seems like nvidia-smi doesn't allow a logging interval shorter than 1 s. Does anyone have a workaround? No. A quick search seems to indicate that it uses the nvidia-smi command, so it is likely to have a similar limitation. Anyway, after a day of running (100+ tasks) I didn't see any failures on the 16 GB card, so I'm good, at least for now. | |
ID: 61206 | Rating: 0 | rate: / Reply Quote | |
I am somewhat new to MPS and I understand the general concept, but what do you mean when you say it is set to 40%? Thank you for the explanation! | |
ID: 61207 | Rating: 0 | rate: / Reply Quote | |
Good evening, | |
ID: 61208 | Rating: 0 | rate: / Reply Quote | |
Good evening, Only Linux still. ____________ | |
ID: 61210 | Rating: 0 | rate: / Reply Quote | |
Good evening, Only Linux still. :-( :-( :-( | |
ID: 61212 | Rating: 0 | rate: / Reply Quote | |
I have just switched back to Linux and it's up and running again. Bye bye, Windows 10. | |
ID: 61213 | Rating: 0 | rate: / Reply Quote | |
We have definitely noticed a sharp decrease in "errors" with these tasks. Steve (or anyone), can you offer some insight into the filenames? As an example: | |
ID: 61214 | Rating: 0 | rate: / Reply Quote | |
“0-1” notation with all GPUGRID tasks seems to indicate the segment you are on and how many total segments there are | |
ID: 61215 | Rating: 0 | rate: / Reply Quote | |
Looks like they transitioned from v3-0-1 on Feb 2 to a test result on Feb 3 and then started the v4-0-1 run on Feb 5 | |
ID: 61216 | Rating: 0 | rate: / Reply Quote | |
Why do I allways get segmentation fault
I'm getting the same issue running through WSL2: an immediate segmentation fault.
https://www.gpugrid.net/result.php?resultid=33853832
https://www.gpugrid.net/result.php?resultid=33853734
Environment & drivers should be OK, since it is running other projects' GPU tasks just fine! Unless gpugrid has some specific prerequisites?
Working project tasks: https://moowrap.net/result.php?resultid=201144661
Installing a native Linux OS is simply not an option for most regular users who don't have dedicated compute farms... | |
ID: 61217 | Rating: 0 | rate: / Reply Quote | |
then I guess you'll just have to wait for the native Windows app. it seems apparent that something doesnt work with these tasks under WSL. so indeed some kind of problem or incompatibility related to WSL. the fact that some other app works isnt really relevant. a key difference is probably in the difference in how these apps are distributed. Moo wrapper uses a compiled binary and the QChem work is supplied via an entire python environment designed to work with a native linux install (it does a lot of things for setting up things like environment variables which might not be correct for WSL as an example). these tasks also use CuPy, which might not be well supported for WSL, or the way cupy is being called isnt right for WSL. either way, don't think there's gonna be a solution for use with WSL. switch to Linux, or wait for the Windows version. | |
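If someone on WSL wants to narrow this down, a quick CuPy allocation test with the task's bundled interpreter would at least show whether the packaged Python environment can talk to the GPU at all. This is only a sketch; the slot path and the bin/python location are assumptions about where the environment unpacks on a given host:
# hypothetical check, run from the BOINC slot directory of a task that is (or was) running
cd /var/lib/boinc/slots/0   # adjust to the actual slot
./bin/python -c "import cupy; cupy.zeros((1024, 1024)); print('CuPy OK, devices:', cupy.cuda.runtime.getDeviceCount())"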
ID: 61218 | Rating: 0 | rate: / Reply Quote | |
hello | |
ID: 61219 | Rating: 0 | rate: / Reply Quote | |
hello
That's hardly surprising given this stat: https://www.boincstats.com/stats/45/host/breakdown/os/
2500+ Windows hosts
688 Linux hosts
Yet Windows hosts are not getting any work, so are not given opportunity to contribute to research or to beta testing even if they're prepared to go the extra mile with getting experimental apps to work. So it's logical that people start leaving - certainly the set-it-and-forget-it crowd. | |
ID: 61220 | Rating: 0 | rate: / Reply Quote | |
Yet Windows hosts are not getting any work, so are not given opportunity to contribute to research or to beta testing even if they're prepared to go the extra mile with getting experimental apps to work. When I joined GPUGRID about 9 years ago, all subprojects were available for Linux and Windows as well. At that time and even several years later, my hosts were working for GPUGRID almost 365 days/year. Somehow, it makes me sad that I am less and less able to contribute to this valuable project. Recently, someone here explained the reason: scientific projects are primarily done by Linux, not by Windows. Why so, all of a sudden ??? | |
ID: 61221 | Rating: 0 | rate: / Reply Quote | |
then I guess you'll just have to wait for the native Windows app. it seems apparent that something doesnt work with these tasks under WSL. so indeed some kind of problem or incompatibility related to WSL. the fact that some other app works isnt really relevant. a key difference is probably in the difference in how these apps are distributed. Moo wrapper uses a compiled binary and the QChem work is supplied via an entire python environment designed to work with a native linux install (it does a lot of things for setting up things like environment variables which might not be correct for WSL as an example). these tasks also use CuPy, which might not be well supported for WSL, or the way cupy is being called isnt right for WSL. either way, don't think there's gonna be a solution for use with WSL. switch to Linux, or wait for the Windows version.
It could be that, yes. But it could also be memory overflow.
Running a gtx1080ti with 11GB vram.
Running from the commandline with nvidia-smi logging, I see memory going up to 8GB allocated, then a segmentation fault - which could be caused by a block allocating over the 11GB limit?

monitoring output:
# gpu   pwr gtemp mtemp    sm   mem   enc   dec   jpg   ofa  mclk  pclk pviol tviol    fb  bar1  ccpm sbecc dbecc   pci rxpci txpci
# Idx     W     C     C     %     %     %     %     %     %   MHz   MHz     %  bool    MB    MB    MB  errs  errs  errs  MB/s  MB/s
    0    15    30     -     2     8     0     0     -     -   405   607     0     0  1915     2     -     -     -     0     0     0
    0    17    30     -     2     8     0     0     -     -   405   607     0     0  1915     2     -     -     -     0     0     0
    0    74    33     -     2     1     0     0     -     -  5005  1569     0     0  2179     2     -     -     -     0     0     0
    0   133    39     -    77     5     0     0     -     -  5005  1987     0     0  4797     2     -     -     -     0     0     0
    0   167    49     -    63    16     0     0     -     -  5005  1974     0     0  6393     2     -     -     -     0     0     0
    0   119    54     -    74     4     0     0     -     -  5005  1974     0     0  8329     2     -     -     -     0     0     0
    0    87    47     -     0     0     0     0     -     -  5508  1974     0     0  1915     2     -     -     -     0     0     0

commandline run output:
/var/lib/boinc/projects/www.gpugrid.net/bck/lib/python3.11/site-packages/gpu4pyscf/lib/cutensor.py:174: UserWarning: using cupy as the tensor contraction engine.
  warnings.warn(f'using {contract_engine} as the tensor contraction engine.')
/var/lib/boinc/projects/www.gpugrid.net/bck/lib/python3.11/site-packages/pyscf/dft/libxc.py:771: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian.
To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
  warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
nao = 590
reading molecules in current dir
mol_130305284_conf_0.xyz
mol_130305284_conf_1.xyz
mol_130305284_conf_2.xyz
mol_130305284_conf_3.xyz
mol_130305284_conf_4.xyz
mol_130305284_conf_5.xyz
mol_130305284_conf_6.xyz
mol_130305284_conf_7.xyz
mol_130305284_conf_8.xyz
mol_130305284_conf_9.xyz
['mol_130305284_conf_0.xyz', 'mol_130305284_conf_1.xyz', 'mol_130305284_conf_2.xyz', 'mol_130305284_conf_3.xyz', 'mol_130305284_conf_4.xyz', 'mol_130305284_conf_5.xyz', 'mol_130305284_conf_6.xyz', 'mol_130305284_conf_7.xyz', 'mol_130305284_conf_8.xyz', 'mol_130305284_conf_9.xyz']
Computing energy and forces for molecule 1 of 10
charge = 0
Structure:
('I', [-9.750986802755719, 0.9391938839088357, 0.1768783652592898])
('C', [-5.895945508642993, 0.12453295160883758, 0.05083363275080016])
('C', [-4.856596140132209, -2.2109795657411224, -0.2513335745671532])
('C', [-2.2109795657411224, -2.0220069532846163, -0.24377467006889297])
('O', [-0.304245906054975, -3.7227604653931716, -0.46865207889213534])
('C', [1.8519316020737606, -2.3621576557063273, -0.3080253583041051])
('C', [4.440856392727896, -2.9668700155671472, -0.4006219384077931])
('C', [5.839253724906041, -0.8163616858121067, -0.1379500070932495])
('I', [9.769884064001369, -0.6368377039784259, -0.13889487015553204])
('S', [4.100705690306184, 1.9464179083020137, 0.22298768269867728])
('C', [1.3587130835622794, 0.22298768269867728, 0.02022006953284616])
('C', [-1.2925726692025024, 0.43463700864996424, 0.06254993472310354])
('S', [-3.7227604653931716, 2.5700275294084842, 0.3477096069199714])
('H', [-5.914842769888644, -3.9306303390953286, -0.46298290051844015])
('H', [5.19674684255392, -4.818801617640907, -0.640617156227556])
******** <class 'gpu4pyscf.df.df_jk.DFRKS'> ********
method = DFRKS
initial guess = minao
damping factor = 0
level_shift factor = 0
DIIS = <class 'gpu4pyscf.scf.diis.CDIIS'>
diis_start_cycle = 1
diis_space = 8
SCF conv_tol = 1e-09
SCF conv_tol_grad = None
SCF max_cycles = 50
direct_scf = False
chkfile to save SCF result = /var/lib/boinc/projects/www.gpugrid.net/bck/tmp/tmpd03fogee
max_memory 4000 MB (current use 345 MB)
XC library pyscf.dft.libxc version 6.2.2
unable to decode the reference due to https://github.com/NVIDIA/cuda-python/issues/29
XC functionals = wB97M-V
N. Mardirossian and M. Head-Gordon., J. Chem. Phys. 144, 214110 (2016)
radial grids: Treutler-Ahlrichs [JCP 102, 346 (1995); DOI:10.1063/1.469408] (M4) radial grids
becke partition: Becke, JCP 88, 2547 (1988); DOI:10.1063/1.454033
pruning grids: <function nwchem_prune at 0x7f29529356c0>
grids dens level: 3
symmetrized grids: False
atomic radii adjust function: <function treutler_atomic_radii_adjust at 0x7f2952935580>
** Following is NLC and NLC Grids **
NLC functional = wB97M-V
radial grids: Treutler-Ahlrichs [JCP 102, 346 (1995); DOI:10.1063/1.469408] (M4) radial grids
becke partition: Becke, JCP 88, 2547 (1988); DOI:10.1063/1.454033
pruning grids: <function nwchem_prune at 0x7f29529356c0>
grids dens level: 3
symmetrized grids: False
atomic radii adjust function: <function treutler_atomic_radii_adjust at 0x7f2952935580>
small_rho_cutoff = 1e-07
Set gradient conv threshold to 3.16228e-05
Initial guess from minao.
Default auxbasis def2-tzvpp-jkfit is used for H def2-tzvppd
Default auxbasis def2-tzvpp-jkfit is used for C def2-tzvppd
Default auxbasis def2-tzvpp-jkfit is used for S def2-tzvppd
Default auxbasis def2-tzvpp-jkfit is used for O def2-tzvppd
Default auxbasis def2-tzvpp-jkfit is used for I def2-tzvppd
/var/lib/boinc/projects/www.gpugrid.net/bck/lib/python3.11/site-packages/pyscf/gto/mole.py:1280: UserWarning: Function mol.dumps drops attribute charge because it is not JSON-serializable
  warnings.warn(msg)
tot grids = 225920
tot grids = 225920
segmentation fault | |
ID: 61222 | Rating: 0 | rate: / Reply Quote | |
First, it’s well known at this point that these tasks require a lot of VRAM. So some failures are to be expected from that. The VRAM utilization is not constant, but spikes up and down. From the tasks running on my systems, loading up to 5-6GB and staying around that amount is pretty normal, with intermittent spikes to the 9-12GB+ range occasionally. Just by looking at the failure rate of different GPUs, I’m estimating that most tasks need more than 8GB (>70%), a small amount of tasks need more than 12GB (~5%), and a very small number of them need even more than 16GB (<1%). A teammate of mine is running on a couple 2080Tis (11GB) and has had some failures but mostly success. | |
ID: 61223 | Rating: 0 | rate: / Reply Quote | |
Yet Windows hosts are not getting any work, so are not given opportunity to contribute to research or to beta testing even if they're prepared to go the extra mile with getting experimental apps to work.
I posed this question to Google and their AI engine came up with this response:
"how long has most scientific research projects used linux compared to windows"
Linux is a popular choice for research companies because it offers flexibility, security, stability, and cost-effectiveness. Linux is also used in technical disciplines at universities and research centers because it's free and includes a large amount of free and open-source software. | |
ID: 61224 | Rating: 0 | rate: / Reply Quote | |
One thing is certain: there won't be enough users to compute everything. There are 50,462 tasks for 106 computers as I write these lines, and they are arriving faster than tasks are being computed. I think GPUGRID is heading straight into a wall if they don't do something. | |
ID: 61225 | Rating: 0 | rate: / Reply Quote | |
we are processing about 12,000 tasks per day, so there's a little more than 4 days worth of work right now, but the available work is still climbing | |
ID: 61226 | Rating: 0 | rate: / Reply Quote | |
The choice of Linux as a research OS in an academic context is clear, but it really has no relation to the choice of which platforms to support as a BOINC project.
BOINC as a platform was always a 'supercomputer for peanuts' proposition: you invest a fraction of what a real supercomputer costs but can get similar processing power, which is exactly what many low-budget academic research groups were looking for. Part of that investment is the choice of which platforms to support, and it is primarily driven by the amount of processing power needed, with the correlation to your native development OS only a secondary consideration.
As I said already in my previous post, it all depends on what type of project you want to be:
1) You need all the power and/or turnaround you can get? Support all the platforms you can handle, with Windows your #1 priority, because that's where the majority of the FLOPS are.
2) You don't really need that much power, and your focus is more on developing/researching algorithms? Stay native OS.
3) You need some of both? Prioritize the native OS for your beta apps, but keep driving that steady stream of stable work on #1 Windows and #2 Linux to keep the interest of your supercomputer 'providers' engaged. Because that's the last part of the 'small investment' needed for your FLOPS: keeping your users happy and engaged.
So I see no issue at all with new betas being on Linux first, but I am also concerned, or sad, that there has only been beta/Linux work lately, as opposed to the earlier days of GPUGRID. Unless of course the decision is made to go full-on as a type 2) project? | |
ID: 61227 | Rating: 0 | rate: / Reply Quote | |
there has been a bunch of ATM work intermittently, which works on Windows. they had to fix both Windows and Linux versions of their application at different times so there were times when Linux worked and Windows didn't, and times where Windows worked and Linux didnt. the most recent batch i believe both applications were working. this is still classified as "beta" for both Linux and Windows. | |
ID: 61228 | Rating: 0 | rate: / Reply Quote | |
The researcher earlier stated there were NO Windows computers in the lab. | |
ID: 61229 | Rating: 0 | rate: / Reply Quote | |
I'm using Docker for Windows, which is using WSL2 as backend, and I'm having the same problems. So another hint at WSL being the problem. Other Projects that use my NVidia card work fine though. | |
ID: 61230 | Rating: 0 | rate: / Reply Quote | |
There is an obvious solution, which no one has mentioned, for Windows users who wish to contribute to this project, and at the risk of starting a proverbial firestorm, I will mention it: you could install Linux on your machine(s). I did it last year. It has worked out fine for me. | |
ID: 61231 | Rating: 0 | rate: / Reply Quote | |
Sure that's an option, and no need to fear a firestorm. At least not from me, I've worked with Linux or other Unix flavors a lot over the years, both professionally and personally. And besides, I hate forum flame-wars or any kind of tech solution holy wars. ;-) | |
ID: 61232 | Rating: 0 | rate: / Reply Quote | |
why not run Linux as prime, and then virtualize (or maybe even WINE) your windows only software? | |
ID: 61233 | Rating: 0 | rate: / Reply Quote | |
why not run Linux as prime, and then virtualize (or maybe even WINE) your windows only software? Because I need Windows all the time, whereas in the last 15 years, this is the only time I couldn't get something to work through a virtual Linux. And BOINC is just a hobby after all... Would you switch prime OS in such a case? On another note - DCF is going crazy again. Average runtimes are consistent around 30 minutes, yet DCF is going up like crazy - estimated runtime of new WU's now at 76 days! On a positive note: not a single failure yet! | |
ID: 61234 | Rating: 0 | rate: / Reply Quote | |
well i did switch all my computers to linux. even personal ones. the only windows system I have is my work provided laptop. but i could do everything i need on a linux laptop. WINE runs a lot of things these days. | |
ID: 61235 | Rating: 0 | rate: / Reply Quote | |
Ian, are you saying that even after you've set DCF to a low value in the client_state file that it is still escalating? | |
ID: 61236 | Rating: 0 | rate: / Reply Quote | |
Ian, are you saying that even after you've set DCF to a low value in the client_state file that it is still escalating? my DCF was set to about 0.01, and my tasks were estimating that they would take 27hrs each to complete. i changed the DCF to 0.0001, and that changed the estimate to about 16mins each. then after a short time i noticed that the time to completion estimate was going up again, reaching back to 27hrs again. i checked DCF and it's back to 0.01. ____________ | |
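For anyone wanting to try the same experiment, a rough sketch of editing the value by hand; the client must be stopped first or it rewrites the file on exit, and the service name and file path vary between installs (some use "boinc" rather than "boinc-client"):
sudo systemctl stop boinc-client
# show the current value(s); there is one <duration_correction_factor> per attached project
grep duration_correction_factor /var/lib/boinc/client_state.xml
# edit the value inside GPUGRID's <project> block, e.g. set it to 0.010000, then restart
sudo nano /var/lib/boinc/client_state.xml
sudo systemctl start boinc-client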
ID: 61237 | Rating: 0 | rate: / Reply Quote | |
First, it’s well known at this point that these tasks require a lot of VRAM. So some failures are to be expected from that. The VRAM utilization is not constant, but spikes up and down. From the tasks running on my systems, loading up to 5-6GB and staying around that amount is pretty normal, with intermittent spikes to the 9-12GB+ range occasionally. Just by looking at the failure rate of different GPUs, I’m estimating that most tasks need more than 8GB (>70%), a small amount of tasks need more than 12GB (~5%), and a very small number of them need even more than 16GB (<1%). A teammate of mine is running on a couple 2080Tis (11GB) and has had some failures but mostly success.
As you suggested in a previous post, VRAM utilization seems to depend on the particular model of graphics card / GPU: GPUs with fewer CUDA cores available seem to use a smaller amount of VRAM.
My GTX 1650 GPUs have 896 CUDA cores and 4 GB VRAM.
My GTX 1650 SUPER GPU has 1280 CUDA cores and 4 GB VRAM.
My GTX 1660 Ti GPU has 1536 CUDA cores and 6 GB VRAM.
These cards are currently achieving an overall success rate of 44% on processing PYSCFbeta (676 valid versus 856 errored tasks at the moment of writing this). Not all the errors were due to memory overflows; some were due to non-viable WUs or other reasons, but digging into this would take too much time... Processing ATMbeta tasks, success was pretty close to 100%. | |
ID: 61238 | Rating: 0 | rate: / Reply Quote | |
well i did switch all my computers to linux. even personal ones. the only windows system I have is my work provided laptop. but i could do everything i need on a linux laptop. WINE runs a lot of things these days.
I'm trying to suppress my grinning at this upside down world... having retired my last Windoze box some time ago.
On a more germane note... between this
CUDA Error of GINTint2e_jk_kernel: out of memory
https://www.gpugrid.net/result.php?resultid=33956113
and this...
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 535,127,040 bytes (allocated so far: 4,278,332,416 bytes, limit set to: 9,443,495,116 bytes).
https://www.gpugrid.net/result.php?resultid=33955488
Does the fact that the reported error (my most common error on RTX3080 12G cards) seems to say that it didn't really get to the imposed limit but still failed mean anything to anyone? I am ASSUMING that this is referring to the memory on the 12G vid card?
And for what it's worth, best I can tell I'm getting a lower error % on my RTX3070 8GB cards once I backed off the sclk/mclk clocks.
Skip
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 61239 | Rating: 0 | rate: / Reply Quote | |
well i did switch all my computers to linux. even personal ones. the only windows system I have is my work provided laptop. but i could do everything i need on a linux laptop. WINE runs a lot of things these days. Seems to me that your 3080 is the 10G version instead of 12G? | |
ID: 61240 | Rating: 0 | rate: / Reply Quote | |
well i did switch all my computers to linux. even personal ones. the only windows system I have is my work provided laptop. but i could do everything i need on a linux laptop. WINE runs a lot of things these days. I'm clearly in the presence of passionate Linux believers here... :-) Between thisCUDA Error of GINTint2e_jk_kernel: out of memory It does refer to video memory, but the limit each WU sets possibly doesn't take into account other processes allocating video memory. That would especially be an issue I think if you run multiple WU's in parallel. Try executing nvidia-smi to see which processes allocate how much video memory: svennemans@PCSLLINUX01:~$ nvidia-smi Sun Feb 11 17:29:48 2024 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA GeForce GTX 1080 Ti Off | 00000000:01:00.0 On | N/A | | 47% 71C P2 179W / 275W | 6449MiB / 11264MiB | 100% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 1611 G /usr/lib/xorg/Xorg 534MiB | | 0 N/A N/A 1801 G /usr/bin/gnome-shell 75MiB | | 0 N/A N/A 9616 G boincmgr 2MiB | | 0 N/A N/A 9665 G ...gnu/webkit2gtk-4.0/WebKitWebProcess 12MiB | | 0 N/A N/A 27480 G ...38,262144 --variations-seed-version 125MiB | | 0 N/A N/A 46332 G gnome-control-center 2MiB | | 0 N/A N/A 47110 C python 5562MiB | +---------------------------------------------------------------------------------------+ My one running WU has allocated 5.5G but with the other running processes, total allocated is 6.4G. It would depend on implementation if the limit is calculated from the total CUDA memory or the actual free CUDA memory and whether that limit is updated only once at the start or multiple times. | |
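A quick way to see how much headroom is actually left once the desktop and other processes have taken their share:
nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv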
ID: 61241 | Rating: 0 | rate: / Reply Quote | |
Good point about the other stuff on the card... right this minute it's taking a break from GPUGRID to do a Meerkat Burp7...
Skip
PS: GPUGRID WUs are all 1x here.
PPS: Yes, it's the 10G version!
PPPS: Also my ad hoc perception of error rates was wrong... working on that. | |
ID: 61242 | Rating: 0 | rate: / Reply Quote | |
I believe the lowest value that DCF can be in the client_state file is 0.01 | |
ID: 61243 | Rating: 0 | rate: / Reply Quote | |
Hello, apparently it's now working on my 2 GPUs, a GTX 1650 and an RTX 4060. | |
ID: 61244 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 61245 | Rating: 0 | rate: / Reply Quote | |
Yes I would not expect the app to work on WSL. There are many linux specific libraries in the packaged python environent that is the "app". Actually, it *should* work, since WSL2 is being sold as a native Linux kernel running in a virtual environment with full system call compatibility. So one could reasonably expect any native linux libraries to work as expected. However there are obviously still a few issues to iron out. Not by gpugrid to be clear - by microsoft. | |
ID: 61246 | Rating: 0 | rate: / Reply Quote | |
I'm seeing a bunch of checksum errors during unzip, anyone else have this problem?
Stderr output:
<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
11:26:18 (177385): wrapper (7.7.26016): starting
lib/libcufft.so.10.9.0.58 bad CRC e458474a (should be 0a867ac2)
boinc_unzip() error: 2
</stderr_txt>
]]>
The workunits seem to all run fine on a subsequent host. | |
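If it happens again, the archive's CRCs can be re-checked in place without re-running the task; a rough sketch (the archive name is a placeholder for whichever zip the stderr names):
cd /var/lib/boinc/projects/www.gpugrid.net
ARCHIVE="name_of_the_zip_from_stderr.zip"   # placeholder
unzip -t "$ARCHIVE" | tail -n 3             # "No errors detected" means the download is intact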
ID: 61252 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 61260 | Rating: 0 | rate: / Reply Quote | |
Hello, Maybe try a different version. I have always used Windows (and still do on some systems) but use Linux Mint on others. Really user friendly, and a very similar feel to Windows. | |
ID: 61261 | Rating: 0 | rate: / Reply Quote | |
Between this CUDA Error of GINTint2e_jk_kernel: out of memory
Sometimes I get the same error on my 3080 10 GB card. E.g., https://www.gpugrid.net/result.php?resultid=33960422
Headless computer with a single 3080 running 1C + 1N. | |
ID: 61269 | Rating: 0 | rate: / Reply Quote | |
I believe the lowest value that DCF can be in the client_state file is 0.01
Zoltan posted long ago that BOINC does not understand zero, and 0.01 is as close as it can get. I wonder if that was someone's approach to fixing a division-by-zero problem in antiquity. | |
ID: 61270 | Rating: 0 | rate: / Reply Quote | |
... After logging error rates for a few days across 5 boxes w/ Nvidia cards (all RTX 30x0, all Linux Mint v2x.3), and trying to be aware of what I was doing on the main desktop while 'python' was running, along with some sclk/mclk cutbacks, the avg error rate is dropping.
The last cut shows it at 23.44% across the 5 boxes averaged over 28 hours.
No longer any segfault 0x8b errors, all 0x1. The last one was on the most troublesome of the 3070 cards.
https://www.gpugrid.net/result.php?resultid=33950656
Anything I can do to help with this type of error?
Skip | |
ID: 61272 | Rating: 0 | rate: / Reply Quote | |
... its still an out of memory error. a little further up in the error log shows this: "CUDA Error of GINTint2e_jk_kernel: out of memory" so it's probably just running out of memory at a different stage of the task, producing a slightly different error, but still an issue with not enough memory. ____________ | |
ID: 61273 | Rating: 0 | rate: / Reply Quote | |
I'm seeing a bunch of checksum errors during unzip, anyone else have this problem? I didn't find any of these in the 10GB 3080 errors that occurred so far today. Will check the 3070 cards shortly. Skip | |
ID: 61274 | Rating: 0 | rate: / Reply Quote | |
Thanx... as I suspected, and this is my most common error now. Along with these, which I'm thinking are also memory related, just from a different point in the process... same situation without having reached the cap limit shown.
https://www.gpugrid.net/result.php?resultid=33962293
Skip | |
ID: 61275 | Rating: 0 | rate: / Reply Quote | |
between your systems and mine, looking at the error rates; | |
ID: 61276 | Rating: 0 | rate: / Reply Quote | |
I'm seeing a bunch of checksum errors during unzip, anyone else have this problem? 8GB 3070 card errors today checked were all: CUDA Error of GINTint2e_jk_kernel: out of memory Skip | |
ID: 61277 | Rating: 0 | rate: / Reply Quote | |
I'm seeing a bunch of checksum errors during unzip, anyone else have this problem? 8GB 3070 card errors today checked were all: CUDA Error of GINTint2e_jk_kernel: out of memory Skip | |
ID: 61278 | Rating: 0 | rate: / Reply Quote | |
between your systems and mine, looking at the error rates; Thanx for info. As is right now the only cards I have w/ 16GB are my RX6800/6800xt cards. https://ibb.co/hKZtR0q Guess I need to start a go-fund-me for some $600 12GB 4070 Super cards that I've been eyeing up ;-) Skip | |
ID: 61279 | Rating: 0 | rate: / Reply Quote | |
a $600 12GB Titan V is like 4x faster though. other projects are a consideration of course. ____________ | |
ID: 61280 | Rating: 0 | rate: / Reply Quote | |
If this quantum chemistry project is going to last for more than a year, perhaps a $170 (via ebay) investment on Tesla P100 16G may be worth it? If you look at my gpugrid output via boincstat, I'm doing like 20M PPD over the past 4 days running on a single card with power limit of 130W. I've processed more than 1000 tasks and I think I have 2 failures with its 16G memory. | |
ID: 61281 | Rating: 0 | rate: / Reply Quote | |
My DCF is set to 0.02 | |
ID: 61282 | Rating: 0 | rate: / Reply Quote | |
Can you point me to someplace I can educate myself a bit on using Titan V cards for BOINC? I see some for $600 used on eBay. As you know, there is no used market for 'Super' cards yet.
Did you mean 4x faster than a 4070 Super, or than the 3070 I would replace with it?
Thanx, Skip | |
ID: 61283 | Rating: 0 | rate: / Reply Quote | |
Ah, it's an FP64 thing. Any other projects doing heavy FP64 lifting since the demise of MW GPU WUs? | |
ID: 61284 | Rating: 0 | rate: / Reply Quote | |
ATMbeta tasks here have some small element of FP64. (integration) | |
ID: 61285 | Rating: 0 | rate: / Reply Quote | |
between your systems and mine, looking at the error rates; Not sure why but... Error rates seemed to start dropping after 5pm (23:00 Zulu) today. Overall error average since 2/11 across my 5 Nvid cards was 26.7% with it slowly creeping down over time. Early on a little bit of this was the result of lowering clocks to eliminate the occasional segfault (0x8b). The average of the last two captures today across the 5 cards was 20.5% For the last 6 hour period I just checked, my 10GB card average error rate dropped to 17.3% (15.92 & 18.7) and the 8GB card error rate was at 21.3%. Skip | |
ID: 61288 | Rating: 0 | rate: / Reply Quote | |
Have the work units for Windows arrived? | |
ID: 61291 | Rating: 0 | rate: / Reply Quote | |
It isn't the tasks which need to be released, it's the application programs needed to run them. | |
ID: 61292 | Rating: 0 | rate: / Reply Quote | |
Watching the Stderr output report for a certain PYSCFbeta task, a line can be found reporting the Device Number "N" of the GPU the task was run on. This is very much appreciated on multi-GPU hosts when trying to identify reliable or unreliable devices. It allows, if desired, excluding unreliable devices, as per Ian&Steve C.'s kind advice. A similar feature would be useful in other apps, such as ATMbeta. | |
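For completeness, device exclusion is done through cc_config.xml. The sketch below is only an illustration: the short app name "PYSCFbeta" is a guess (check client_state.xml for the exact app name), the project URL must match the one your client uses, and the snippet should be merged into any existing cc_config.xml rather than pasted over it.
# print the snippet; merge it into /var/lib/boinc/cc_config.xml by hand
cat <<'EOF'
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
      <app>PYSCFbeta</app>
    </exclude_gpu>
  </options>
</cc_config>
EOF
# then use "Options -> Read config files" in the BOINC Manager, or restart the client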
ID: 61293 | Rating: 0 | rate: / Reply Quote | |
between your systems and mine, looking at the error rates; IGNORE... all went to crap the next day (today) | |
ID: 61295 | Rating: 0 | rate: / Reply Quote | |
yeah i've been seeing higher error rates on my 12GB cards too. | |
ID: 61296 | Rating: 0 | rate: / Reply Quote | |
My preferences are set to receive work from all apps, including beta ones, but none of my 4 GB VRAM graphics cards have lately received PYSCFbeta tasks. | |
ID: 61305 | Rating: 0 | rate: / Reply Quote | |
My GPUs are all on the smaller-memory side, too. Since ATMbeta tasks became available again, I haven't picked up a single Quantum chemistry task. | |
ID: 61306 | Rating: 0 | rate: / Reply Quote | |
My GPUs are all on the smaller-memory side, too. Since ATMbeta tasks became available again, I haven't picked up a single Quantum chemistry task. it's because you have test tasks enabled. with that, it's giving preferential treatment for ATM tasks which are classified in the scheduler as beta/test. QChem seems to not be classified in the scheduler as "test" or beta. despite being treated as such by the staff and the app name literally has the word beta in it. if you disable test tasks, and enable only QChem, you will get them still. ____________ | |
ID: 61307 | Rating: 0 | rate: / Reply Quote | |
it's because you have test tasks enabled. with that, it's giving preferential treatment for ATM tasks which are classified in the scheduler as beta/test.
Thank you, that fully explains it. Faced with the dilemma of choosing between my 50%-erroring PYSCFbeta tasks and my 100%-succeeding ATMbeta tasks, I'll keep the latter. | |
ID: 61308 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 61311 | Rating: 0 | rate: / Reply Quote | |
schedule requests from your host are not specific about what it's asking for. it just asks for work for "Nvidia" and the scheduler on the project side decides what you need and what to send based on your preferences. the way the scheduler is setup right now, you wont be sent both types of work when both are available, only ATM. | |
ID: 61312 | Rating: 0 | rate: / Reply Quote | |
OK, thanks. | |
ID: 61313 | Rating: 0 | rate: / Reply Quote | |
QChem seems to not be classified in the scheduler as "test" or beta. despite being treated as such by the staff and the app name literally has the word beta in it. if you disable test tasks, and enable only QChem, you will get them still.
Adding a bit more variety to the current GPUGRID app spectrum, I happened to be watching the Server status page when a limited number (about 215) of "ATM: Free energy calculations of protein-ligand binding" tasks appeared, to be distinguished from the previously existing ATMbeta branch.
I managed to configure a venue on the GPUGRID preferences page to catch one of them before the unsent tasks vanished.
Task: tnks2_m5f_m5l_1_RE-QUICO_ATM_GAFF2_1fs-0-5-RND3367_1
To achieve this, I disabled getting test apps and enabled only the (somewhat paradoxically ;-) "ATM (beta)" app.
That task is currently running on my GTX 1660 Ti GPU, at an estimated rate of 9.72% per hour.
And quickly returning to the PYSCFbeta (QChem) topic: tasks for this app grew today to a noticeable 80K+ ready to send. After peaking, QChem unsent tasks are now decreasing again. | |
ID: 61319 | Rating: 0 | rate: / Reply Quote | |
Hello | |
ID: 61320 | Rating: 0 | rate: / Reply Quote | |
Yes, ATM and ATMbeta apps have both Windows and Linux versions currently available. | |
ID: 61321 | Rating: 0 | rate: / Reply Quote | |
Regarding Quantum chemistry, there is still no Windows version :-( :-( :-( | |
ID: 61322 | Rating: 0 | rate: / Reply Quote | |
This one barely made it: | |
ID: 61353 | Rating: 0 | rate: / Reply Quote | |
https://imgur.com/evCBB73 | |
ID: 61360 | Rating: 0 | rate: / Reply Quote | |
ID: 61363 | Rating: 0 | rate: / Reply Quote | |
to be expected with 8-10GB cards. | |
ID: 61364 | Rating: 0 | rate: / Reply Quote | |
On my GTX1080ti 11GB, I've only got about 1% error rate due to memory. | |
ID: 61370 | Rating: 0 | rate: / Reply Quote | |
to be expected with 8-10GB cards.
They do:
8GB – last 2 checks of 2 cards: 44.07
10GB – last 2 checks of 2 cards: 30.80
12GB – last 2 checks of 1 card: 7.62
But I need to look at the last day or two as rates have been going up.
____________
- da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." | |
ID: 61401 | Rating: 0 | rate: / Reply Quote | |
Anyone have insight into this error: | |
ID: 61402 | Rating: 0 | rate: / Reply Quote | |
Download error causing the zip file to be corrupted because it is missing the end of file signature. | |
ID: 61403 | Rating: 0 | rate: / Reply Quote | |
Download error causing the zip file to be corrupted because it is missing the end of file signature. Well after 100+ of these errors I finally got 3 good ones out of that box after a reboot for a different reason. Thanx, Skip | |
ID: 61404 | Rating: 0 | rate: / Reply Quote | |
Hello | |
ID: 61405 | Rating: 0 | rate: / Reply Quote | |
There are none for this project (at this time). | |
ID: 61406 | Rating: 0 | rate: / Reply Quote | |
Error rates skyrocketed on me for this app... even on the 10GB cards (12GB card will be back on Thursday). This started late on April 7th. | |
ID: 61454 | Rating: 0 | rate: / Reply Quote | |
Error rates skyrocketed on me for this app... even on the 10GB cards (12GB card will be back on Thursday). This started late on April 7th.
It's not you. It's that the new v4 tasks require more VRAM. I asked about this on their Discord.
I asked: it seems the newer "v4" tasks on average require a bit more VRAM than the previous v3 tasks. I'm seeing a higher error percentage on 12GB cards.
Steve replied: yes this make sense unfortunately. In the previous round of "inputs_v3**" it was calculating things incorrectly for any molecule containing Iodine. This is heaviest element in our dataset. The computational cost of this QM method scales with the size of the elements (it depends on the number of electrons). We are resending the incorrect calculations for Iodine containing molecules in this round of "v4" work units. Therefore the v4 set is a subset of the previous v3 WUs containing heavier elements, hence there are more OOM errors.
____________ | |
ID: 61455 | Rating: 0 | rate: / Reply Quote | |
Thank you. You probably just saved me hours of wasted time. | |
ID: 61456 | Rating: 0 | rate: / Reply Quote | |
Steve replied: yes this make sense unfortunately. In the previous round of "inputs_v3**" it was calculating things incorrectly for any molecule containing Iodine. This is heaviest element in our dataset. The computational cost of this QM method scales with the size of the elements (it depends on the number of electrons). We are resending the incorrect calculations for Iodine containing molecules in this round of "v4" work units. Therefore the v4 set is a subset of the previous v3 WUs containing heavier elements, hence there are more OOM errors. Any change in this situation? I got my 12GB card back and my haphazard data collection seems to have it under a 9% error rate and with the very last grab showing 5.85%. The 8GB & 10GB cards are still on NNW (other than 3 WUs i let thru on 10GB cards. They completed). Skip | |
ID: 61470 | Rating: 0 | rate: / Reply Quote | |
Steve replied: yes this make sense unfortunately. In the previous round of "inputs_v3**" it was calculating things incorrectly for any molecule containing Iodine. This is heaviest element in our dataset. The computational cost of this QM method scales with the size of the elements (it depends on the number of electrons). We are resending the incorrect calculations for Iodine containing molecules in this round of "v4" work units. Therefore the v4 set is a subset of the previous v3 WUs containing heavier elements, hence there are more OOM errors.
Something's coming around... error rates for 10GB cards are now under 13%, and the 12GB card is ~3%. Skip | |
ID: 61471 | Rating: 0 | rate: / Reply Quote | |
I also see about 3% on my 12GB cards. | |
ID: 61472 | Rating: 0 | rate: / Reply Quote | |
Right now, I am seeing less than a 2% error rate on my computers, each has a 11 GB card. This does vary over time. | |
ID: 61473 | Rating: 0 | rate: / Reply Quote | |
I'm only seeing a single Memory error in the last 300 results for my gtx 1080Ti (11GB), so 0.33% | |
ID: 61474 | Rating: 0 | rate: / Reply Quote | |
No, I've not had any CRC errors unzipping the tar archives. | |
ID: 61475 | Rating: 0 | rate: / Reply Quote | |
Error rate for QChem tasks seems to have pretty decreased lately on my 4GB VRAM graphics cards. | |
ID: 61485 | Rating: 0 | rate: / Reply Quote | |
Average error % rate as of early 5/7/24 using last 3 data scrapes - | |
ID: 61490 | Rating: 0 | rate: / Reply Quote | |
Any news as to whether QC will ever run on Windows machines? | |
ID: 61498 | Rating: 0 | rate: / Reply Quote | |
I believe the news is still that, until the external repositories the QC tasks depend on create and compile Windows libraries, there won't ever be any Windows app here. | |
ID: 61500 | Rating: 0 | rate: / Reply Quote | |
Average error % rate as of early 5/7/24 using last 3 data scrapes -
5/14/24, 8:57am Zulu, using last 3 data scrapes:
AVG – last 3: 20.4 (all cards)
8GB – last 3: 25.22 (2x 3070)
10GB – last 3: 24.62 (2x 3080)
12GB – last 3: 2.11 (1x 4070S)
Skip | |
ID: 61502 | Rating: 0 | rate: / Reply Quote | |
Hello Steve, | |
ID: 61890 | Rating: 0 | rate: / |