Message boards : Number crunching : GPUGrid problems, nothing has changed
Author | Message |
---|---|
This is the 3rd time that I've gone in heavily on GPUGrid over the last 10-11 years. Twice I've gotten frustrated with the problems and cut way back. I was hoping that some of the issues would have been fixed. There's been an ongoing problem of stalling uploads (not to mention downloads) for many years. It's still not fixed. In addition, WUs that get interrupted often fail, even with write caching disabled on the drives. | |
ID: 51606 | Rating: 0 | |
Zoltan had some great advice for me a bit ago. I don't think I can fully remember every step, but it completely fixed my corrupted WUs after a power outage issue. It had to do with the device manager as far as I can recall. Maybe Zoltan can remember? | |
ID: 51607 | Rating: 0 | |
"Zoltan had some great advice for me a bit ago. I don't think I can fully remember every step, but it completely fixed my corrupted WUs after a power outage issue. It had to do with the device manager as far as I can recall. Maybe Zoltan can remember?" That would be appreciated. Thanks. Another problem with the upload congestion is that some uploads can take upwards of 10 hours when a dozen or more are trying at once. Then they start missing the 24-hour cutoff, which is also irritating. | |
ID: 51608 | Rating: 0 | |
I know the frustration, but ironically GPUGRID is the better project to me by a small margin. | |
ID: 51609 | Rating: 0 | |
"Zoltan had some great advice for me a bit ago. I don't think I can fully remember every step, but it completely fixed my corrupted WUs after a power outage issue. It had to do with the device manager as far as I can recall. Maybe Zoltan can remember?" If you don't have a big enough upload pipe for reporting multiple tasks, you can restrict the number of simultaneous uploads in cc_config.xml: `<max_file_xfers_per_project>1</max_file_xfers_per_project>`. That way a single finished task will get all of the capacity of your upload pipe to itself and transfer faster. | |
ID: 51610 | Rating: 0 | |
"If you don't have a big enough upload pipe for reporting multiple tasks, you can restrict the number of uploads in cc_config.xml" Thanks, I've been meaning to try this. The problem then becomes that the CPU WUs create a huge backlog, waiting while the huge GPUGrid upload stumbles along. The Ryzen 7 machines do a lot of CPU work pretty quickly. No wait, that's an option that I didn't know about (per project). I will definitely try it. Thanks again! | |
ID: 51611 | Rating: 0 | |
There is an option for the entire client and one per project. | |
ID: 51613 | Rating: 0 | |
`<max_file_xfers_per_project>1</max_file_xfers_per_project>` Seems to be helping; there's not as much stalling. Will continue to monitor. | |
ID: 51614 | Rating: 0 | |
"Zoltan had some great advice for me a bit ago. I don't think I can fully remember every step, but it completely fixed my corrupted WUs after a power outage issue. It had to do with the device manager as far as I can recall. Maybe Zoltan can remember?" I recall what Zoltan once told me. Go into Device Manager/ disk drives/ the drive BOINC is on/ policies/ uncheck "enable write caching on this device"/ reboot and you should be all set. | |
ID: 51634 | Rating: 0 | |
"I recall what Zoltan once told me. Go into Device Manager/ disk drives/ the drive BOINC is on/ policies/ uncheck 'enable write caching on this device'/ reboot and you should be all set." Yes, this was/is exactly it. | |
ID: 51635 | Rating: 0 | |
"I recall what Zoltan once told me. Go into Device Manager/ disk drives/ the drive BOINC is on/ policies/ uncheck 'enable write caching on this device'/ reboot and you should be all set." I've been unchecking that for years. Yes, it helps, but it didn't help with the power outage and 18 failed WUs that I described in the OP. All the drives on all my BOINC machines had write caching disabled. | |
ID: 51640 | Rating: 0 | |
Interesting, it seemed to eliminate the problem for me when I enabled it | |
ID: 51642 | Rating: 0 | |
"Interesting, it seemed to eliminate the problem for me when I enabled it" I also believed that before March 7th. Then I was educated x 18. However, it does help when write caching is disabled. One related thing I've found is that when Win10 reboots automatically to do updates, it must wait long enough for GPUGrid to close the WUs, as they seem to survive that situation. Knock on wood... ;-) | |
ID: 51644 | Rating: 0 | |
"... when Win10 reboots automatically to do updates, it must wait long enough for GPUGrid to close the WUs ..." How do you educate Win10 to wait long enough for the GPUGRID tasks to stop? Even if a GPUGRID task is manually stopped in the BOINC Manager, it takes up to a minute until it actually stops. | |
ID: 51645 | Rating: 0 | |
"... when Win10 reboots automatically to do updates, it must wait long enough for GPUGrid to close the WUs ..." I have no idea. My observation is that SO FAR with 5 Win10 machines running 3 GPUGrid WUs each, I haven't had any WUs fail when Win10 decides to reboot to process updates. This has happened quite a few times. Maybe I've just been lucky, maybe not. | |
ID: 51646 | Rating: 0 | |
"Zoltan had some great advice for me a bit ago. I don't think I can fully remember every step, but it completely fixed my corrupted WUs after a power outage issue. It had to do with the device manager as far as I can recall. Maybe Zoltan can remember?" Thanks again for this. It allowed me to keep more GPUs on the project, though I never could get them all shoehorned into my paltry UL bandwidth. Now, with the rise of mostly KIX WUs and nearly double the UL size, I have the problem again. Maybe someday my area will have better connectivity. For now I've had to transfer many of my GPUs to projects with lesser UL requirements. I very much like GPUGrid but have to lighten up on it for now. Keep up the great work! I'll keep running what I'm able to here. | |
ID: 51696 | Rating: 0 | |
UPS. | |
ID: 51701 | Rating: 0 | |
Extremely slow uploads here (Menlo Park, CA) at 9:00 AM Pacific time. I have 100 Mbps down and 40 Mbps up, and my connection is working perfectly according to a speed test I just did. I've noticed this only happens about 25% of the time with me, but it is a major pain uploading at 300 Kbps. | |
ID: 51703 | Rating: 0 | |
In the past, my equipment was excluded from GPUGRID at times because of lesser-quality, low-performing cards. So I finally broke down a few days back and bought an EVGA RTX 2080 in anticipation of crunching along with the "Big Boys." And of course, quite naturally, I was able over the last couple of days to download a dozen tasks that require 8-12 hours on the fastest cards. And if failure is success, then I succeeded perfectly: every task errored out, with the shortest time before failure being 8.11 seconds and the longest 14.71 seconds. | |
ID: 52521 | Rating: 0 | |
Your 2080 isn't supported yet, see here for more details... | |
ID: 52522 | Rating: 0 | |
"Your 2080 isn't supported yet, see here for more details..." Here the new Nvidia Turing series GPUs are listed: NVIDIA TITAN RTX, RTX 2080 Ti, RTX 2080 SUPER, RTX 2080, RTX 2070 SUPER, RTX 2070, RTX 2060 SUPER, RTX 2060, GTX 1660 Ti, GTX 1660, GTX 1650. They will all fail every WU of the current ACEMD version within a few seconds of starting. The reason: Turing GPUs are not supported so far. The GPUGrid team is developing a new ACEMD3 version that is likely to support Turing GPUs. I have a GTX 1660 Ti and a GTX 1650 (impatiently) waiting for this ;-) | |
ID: 52523 | Rating: 0 | |
PDW: Thank you! Bill | |
ID: 52524 | Rating: 0 | |
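
For anyone wanting to try the transfer limit discussed in this thread, here is a minimal sketch of a complete cc_config.xml, assuming the usual BOINC layout in which both settings sit inside an `<options>` block. The per-project value of 1 is the one suggested above; the client-wide `<max_file_xfers>` value is purely illustrative and is an assumption, not something posted in the thread.

```xml
<!-- cc_config.xml, placed in the BOINC data directory -->
<cc_config>
  <options>
    <!-- Client-wide cap on simultaneous file transfers (illustrative value, an assumption) -->
    <max_file_xfers>4</max_file_xfers>
    <!-- Per-project cap: one transfer at a time, as suggested in this thread -->
    <max_file_xfers_per_project>1</max_file_xfers_per_project>
  </options>
</cc_config>
```

If a cc_config.xml already exists, the two lines should be merged into its existing `<options>` block instead of replacing the file. The client normally has to re-read its config files, or be restarted, before the change takes effect.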