Message boards : Number crunching : Everyone is getting computation errors
Author | Message |
---|---|
I suspended GPUGrid | |
ID: 49862 | Rating: 0 | rate: / Reply Quote | |
Same. Errors on last 5 WUs within 2 seconds. | |
ID: 49863 | Rating: 0 | rate: / Reply Quote | |
Ya, I was scrambling around downclocking my cards and turning up the voltage having a litter of kittens wondering how all 4 of my 1180's all went bad at the same time. | |
ID: 49864 | Rating: 0 | rate: / Reply Quote | |
I get no errors on my Linux systems. | |
ID: 49865 | Rating: 0 | rate: / Reply Quote | |
Maybe it's a Windows-only thing: both PABLO and ADRIA WUs are getting errors on Windows, not on Linux. | |
ID: 49866 | Rating: 0 | rate: / Reply Quote | |
On my Windows 10 PC, which is always kept updated and runs NVIDIA drivers, I also get errors, so I am running GPUGRID only on my two Linux boxes. | |
ID: 49868 | Rating: 0 | rate: / Reply Quote | |
Same thing here: all newly downloaded tasks (regardless of whether they are PABLO or ADRIA) error out after a few seconds: | |
ID: 49870 | Rating: 0 | rate: / Reply Quote | |
I think this time the license of the Windows / CUDA 8.0 client has expired, as the Windows XP / CUDA 6.5 and Linux / CUDA 8.0 clients are working fine. | |
ID: 49872 | Rating: 0 | rate: / Reply Quote | |
After a new license is purchased, all of us will need our daily quota reset to crunch WUs, is that correct? | |
ID: 49873 | Rating: 0 | rate: / Reply Quote | |
No need to reset it for the sake of just one day. | |
ID: 49874 | Rating: 0 | rate: / Reply Quote | |
The error rates of three workunit batches on the Server status page are now in the red range (above 75%). | |
ID: 49875 | Rating: 0 | rate: / Reply Quote | |
No word from the staff yet on when it's safe to start crunching? The Linux guys should have enough work to last the next couple of days. Doesn't anyone monitor the servers and software over the weekend? | |
ID: 49884 | Rating: 0 | rate: / Reply Quote | |
It is a small (but still very productive) team, this is the weekend, and today was the World Cup final. Let them live ;) | |
ID: 49885 | Rating: 0 | rate: / Reply Quote | |
I am not suspending GPUGRID, because each task only takes 3 to 8 seconds to error out. | |
ID: 49906 | Rating: 0 | rate: / Reply Quote | |
I was running an ADRIA_FOLDT1015 on my GTX 1060 (Ubuntu 16.04) when it crashed. Not only that, but it took out the QC work units running on the CPU also. I will lay off the GPU for a while; it is too warm anyway. | |
ID: 49908 | Rating: 0 | rate: / Reply Quote | |
Now, I am getting the same error on cuda 6.5 / windows xp. | |
ID: 49910 | Rating: 0 | rate: / Reply Quote | |
"Now, I am getting the same error on cuda 6.5 / windows xp." Yep, me too. Too bad... At least my electricity bill will be the lowest in years... | |
ID: 49913 | Rating: 0 | rate: / Reply Quote | |
I can't believe they haven't fixed this yet: over 4300 work units now and growing. It's obvious the Linux machines can't keep up. This is starting to get strange. | |
ID: 49915 | Rating: 0 | rate: / Reply Quote | |
"Now, I am getting the same error on cuda 6.5 / windows xp." "Yep, me too. Too bad... At least my electricity bill will be the lowest in years..." Same here :-( Is GPUGRID falling apart? | |
ID: 49916 | Rating: 0 | rate: / Reply Quote | |
Toni who is probably the most qualified person for updating the app with a new ACEMD version is currently on holidays without good internet. I am also on holidays currently although I doubt I could have fixed it anyway. I told the guys at the lab to cancel the GPU workunits until it's fixed, so you might have to wait a few days before we fix it and send out new ones. I'm sorry but some stuff is beyond my control sometimes. | |
ID: 49917 | Rating: 0 | rate: / Reply Quote | |
"Toni who is probably the most qualified person for updating the app with a new ACEMD version is currently on holidays without good internet. I am also on holidays currently although I doubt I could have fixed it anyway. I told the guys at the lab to cancel the GPU workunits until it's fixed, so you might have to wait a few days before we fix it and send out new ones. I'm sorry but some stuff is beyond my control sometimes." It might be better to keep the tasks, but deprecate the Windows apps - that way, you would still get some work done (albeit at only ~20% capacity) by your Linux volunteers. | |
ID: 49918 | Rating: 0 | rate: / Reply Quote | |
+1 "Toni who is probably the most qualified person for updating the app with a new ACEMD version is currently on holidays without good internet. I am also on holidays currently although I doubt I could have fixed it anyway. I told the guys at the lab to cancel the GPU workunits until it's fixed, so you might have to wait a few days before we fix it and send out new ones. I'm sorry but some stuff is beyond my control sometimes." | |
ID: 49919 | Rating: 0 | rate: / Reply Quote | |
You are entitled to a holiday :-) | |
ID: 49920 | Rating: 0 | rate: / Reply Quote | |
+1 "Toni who is probably the most qualified person for updating the app with a new ACEMD version is currently on holidays without good internet. I am also on holidays currently although I doubt I could have fixed it anyway. I told the guys at the lab to cancel the GPU workunits until it's fixed, so you might have to wait a few days before we fix it and send out new ones. I'm sorry but some stuff is beyond my control sometimes." +2 Still crunching here. | |
ID: 49921 | Rating: 0 | rate: / Reply Quote | |
They are probably making sure the results returned so far are valid and scientifically useful, as I'm sure trust in the results after something like this is slim. | |
ID: 49922 | Rating: 0 | rate: / Reply Quote | |
I am a new user and don't want to criticize, but I see that the minimum quorum is one. Why? | |
ID: 49923 | Rating: 0 | rate: / Reply Quote | |
"It might be better to keep the tasks, but deprecate the Windows apps - that way, you would still get some work done (albeit at only ~20% capacity) by your Linux volunteers." I will put my GTX 980 on Ubuntu to help. My GTX 1060 that crashed was overheating at 82C or more - it has a bad heatsink or voltage regulator or something. | |
ID: 49924 | Rating: 0 | rate: / Reply Quote | |
"My GTX 1060 that crashed was overheating at 82C or more - it has a bad heatsink or voltage regulator or something." Try taking off the heat sink and changing the thermal paste. Whatever you put on will definitely be better than stock and will last a lot longer. I recommend Arctic Silver 5, but make sure you don't get any on components because it is conductive. | |
ID: 49927 | Rating: 0 | rate: / Reply Quote | |
"Try taking off the heat sink and changing the thermal paste." Yes, I did that a few weeks ago, using Arctic MX-4. It didn't change a thing. I noticed several months ago that it was getting too warm for comfort, and have tried it now in three different machines. One of them has a 120mm rear exhaust fan, a 120mm top exhaust fan, and a 120mm front intake fan. It still ran at 80C, in a cool room. I think it is gone - either a heatpipe, or else the GPU chip itself or voltage regulator is causing too much current to flow. I have an EVGA GTX 970 though which will work fine until Nvidia decides to release something worth buying at a reasonable price. | |
ID: 49928 | Rating: 0 | rate: / Reply Quote | |
Well, apparently Gianni also knows how to deprecate apps. So now we will have Raimondas compiling the new app version, which may take a few days, and then he will deploy the new app. I assume we should have some sort of tutorial on this stuff for more people, but from what I gather managing BOINC is a very esoteric business. | |
ID: 49942 | Rating: 0 | rate: / Reply Quote | |
OK, a few days have passed. | |
ID: 49991 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 49992 | Rating: 0 | rate: / Reply Quote | |
"Do I have to install the CUDA toolkit 9.1?" No, the only thing that would help is to install Linux. | |
ID: 49993 | Rating: 0 | rate: / Reply Quote | |