Message boards : Number crunching : Error after 14 hours of calculations and no points
Task 15179788 | Work unit 11654948 | Computer 189775 | Sent 30 Jun 2016 10:52:11 UTC | Reported 1 Jul 2016 10:19:27 UTC | Error while computing | Run time 50,265.21 s | CPU time 20,171.65 s | Credit --- | Long runs (8-12 hours on fastest card) v8.48 (cuda65)
ID: 43851
Based on what you wrote, you care more about points than about helping the scientists with their research.
ID: 43923
Points and badges are secondary for me. What displeases me is that units end up in error too often, especially after 14 hours of calculations...
ID: 43927
What is the exact make and model of your GPU?
ID: 43931
https://www.asus.com/Graphics-Cards/GTXTITAN6GD5/
ID: 43933
It seems to be getting too hot at 81 °C. Maybe try keeping the temperature down and see if that helps: use Afterburner to control the fan speeds, and perhaps try opening the case as a test. TThrottle will also monitor your temperatures and lets you set limits. It may not be the most efficient solution time-wise, but it may save your hardware in a pinch.
ID: 43935
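As a rough illustration of the kind of temperature check a tool like TThrottle performs, the sketch below polls `nvidia-smi` and flags GPUs above a limit. It assumes `nvidia-smi` is installed and on the PATH; the 80 °C limit and the function names are hypothetical choices for this example and do not describe TThrottle's actual behaviour.

```python
import subprocess
from typing import Dict, List

# Hypothetical limit for this example; the thread mentions readings of 80-81 °C.
TEMP_LIMIT_C = 80


def parse_temperatures(csv_text: str) -> Dict[int, int]:
    """Parse 'index, temperature' lines as printed by
    nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits."""
    temps: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def over_limit(temps: Dict[int, int], limit: int = TEMP_LIMIT_C) -> List[int]:
    """Return the sorted indices of GPUs whose temperature is above the limit."""
    return sorted(i for i, t in temps.items() if t > limit)


def query_gpus() -> Dict[int, int]:
    """Run nvidia-smi once and return {gpu_index: temperature_in_C}."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temperatures(out)
```

Calling `over_limit(query_gpus())` returns the indices of any GPUs running above the limit; what to do next (raise the fan speed, suspend BOINC) is left to the user.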
The nVidia driver manages the temperature so that it never exceeds 80 °C.
ID: 43939
The latest drivers have a new power setting in the Nvidia Control Panel; the options used to be "Adaptive" and "Prefer maximum performance".
ID: 43942
I am going to try running a GPUGrid unit with "maximum performance". Thank you for your answer.
ID: 43943
The unit still crashed before the end, but fortunately it did not finish in error. I think my Titan is now too powerful to run GPUGrid; maybe I will come back to your project when I have a Titan Pascal.
ID: 43945
Zarck, if your Titan is running in double precision mode, turn DP off.
ID: 43949
DP is off.
ID: 43950
Just checking!
ID: 43951
I'll try a unit with DP on; the clock frequency is lower, so there may be less of a problem. I'll try one unit, and if this does not solve my problem, I'll go back to Poem.
ID: 43954
With DP mode enabled, my last GPUGrid unit finished without a problem. I'll try another one without DP using the new nVidia driver 368.81.
ID: 43966
No problem with the new driver so far; let's hope it lasts.
ID: 43974
The last unit still ended in error. I'll run one last unit and then stop.
ID: 43979
Task 15205150 | Work unit 11672471 | Computer 189775 | Sent 16 Jul 2016 22:42:05 UTC | Reported 17 Jul 2016 16:45:02 UTC | Error while computing | Run time 32,285.05 s | CPU time 8,542.67 s | Credit --- | Long runs (8-12 hours on fastest card) v8.48 (cuda65)
ID: 43999
How are you shutting your computer down? Are the tasks that are erroring out for you being interrupted by a Windows restart or an abrupt power outage?
ID: 44000
ID: 44003
Okay. Good luck.
ID: 44004
See you later.
ID: 44044
I also had a problem with Collatz GPU. The solution below fixed my Collatz problem; I hope it fixes my problem with GPUGrid too.
ID: 44045
When I set up a new partition, I usually install the current versions of all the runtimes:
Microsoft Visual C++ 2005 Redistributable (x64) - 8.0.61000 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2005 Redistributable (x86) - 8.0.61001 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2008 Redistributable (x64) - 9.0.30729.6161 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2008 Redistributable (x86) - 9.0.30729.6161 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2010 Redistributable (x64) - 10.0.40219 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2010 Redistributable (x86) - 10.0.40219 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2012 Redistributable (x64) - 11.0.61030 - Visual Studio 2012 Update 4
Microsoft Visual C++ 2012 Redistributable (x86) - 11.0.61030 - Visual Studio 2012 Update 4
Microsoft Visual C++ 2013 Redistributable (x64) - 12.0.30501
Microsoft Visual C++ 2013 Redistributable (x86) - 12.0.30501
Microsoft Visual C++ 2015 Redistributable (x64) - 14.0.23026
Microsoft Visual C++ 2015 Redistributable (x86) - 14.0.23026
ID: 44046
I found that both my GPUGrid tasks had failed overnight and the system had restarted. I had to force another restart because my first monitor stopped displaying anything: my first GPU had stopped working properly (the display fell back to VGA, despite being connected via a DVI cable).
ID: 44054
GPUGrid still has problems when an "abrupt unexpected restart" (like a BSOD or a power outage) happens. Basically, after the unexpected restart, when Windows and BOINC load, GPUGrid tasks will sometimes cause TDRs and BSODs, and can even make other projects' tasks fail as fallout.
ID: 44055
I have been meaning to start a thread about this problem. While I almost never see BSODs, we've been having storms and power glitches. Every time the power goes down, 17 WUs have about a 50% chance of failing on restart. As you say, when a GPUGrid WU fails it will now and then take out other projects' tasks too. What's even worse is that some WUs will appear to restart from the beginning but are then marked invalid at completion. If you see that one has restarted, abort it, as it will most likely just be a waste of time. I wish GPUGrid would fix their problems resuming from this scenario, or at least terminate the task gracefully without causing the system to crap itself! If GPUGrid wants to increase their usability, this is the first issue they should fix. Making their application more fault tolerant should be at the top of their priority list; I think it is more important than Pascal...
ID: 44056
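One way an application could terminate gracefully instead of resuming from a damaged restart file (for example, one that is truncated or filled with zeroes) is to store a checksum with each checkpoint and refuse to resume when it does not match. This is a minimal sketch under that assumption; the `CKPT` tag, the file layout and the function names are hypothetical and are not GPUGrid's actual restart format.

```python
import hashlib
from typing import Optional

MAGIC = b"CKPT"  # hypothetical 4-byte tag at the start of every checkpoint
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def encode_checkpoint(payload: bytes) -> bytes:
    """Build a checkpoint: tag + SHA-256 digest of the payload + payload."""
    return MAGIC + hashlib.sha256(payload).digest() + payload


def decode_checkpoint(blob: bytes) -> Optional[bytes]:
    """Return the payload if the checkpoint is intact, otherwise None.

    A restart file that never reached the disk may come back zero-filled or
    truncated after a power loss; either case fails one of the checks below.
    """
    header_size = len(MAGIC) + DIGEST_SIZE
    if len(blob) < header_size or not blob.startswith(MAGIC):
        return None  # too short, or the tag was overwritten (e.g. by zeroes)
    stored_digest = blob[len(MAGIC):header_size]
    payload = blob[header_size:]
    if hashlib.sha256(payload).digest() != stored_digest:
        return None  # contents do not match what was originally written
    return payload
```

When `decode_checkpoint` returns `None`, the caller can start the task over or report a clean error instead of crunching on from corrupted state.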
Driver 369.00 is available.
ID: 44057
I've noticed that this error is more likely on faster GPUs. Some of the files used to restart the calculation get corrupted (filled with zeroes) in the slot folder of the affected GPUGrid task. I think the cause is that these files are written so frequently on a fast GPU that the OS never gets to write their contents to the disk. To overcome this, the app should use the OS's non-cached write API for these files. In the meantime, the user can disable write-behind caching (Device Manager -> Disk drives -> double-click your BOINC disk -> Policies tab -> uncheck both write-caching options -> OK). The need for a new app for the Pascal GPUs is a great opportunity to kill two birds with one stone.
ID: 44058
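The post above suggests non-cached writes for the restart files. A related, commonly used approach (swapped in here, not the one the post names) is sketched below: write each checkpoint to a temporary file, force it to disk with `os.fsync`, then swap it in with `os.replace`, so that a power loss is meant to leave either the old or the new checkpoint rather than a zero-filled one. This is an assumed design for illustration, not how the GPUGrid app handles its files, and `write_checkpoint_durably` is a hypothetical name. On POSIX systems a fully durable rename also needs an `fsync` of the containing directory, which this sketch omits.

```python
import os
import tempfile


def write_checkpoint_durably(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data`, flushing it to disk first.

    Writes to a temporary file in the same directory, forces the OS to commit
    it with os.fsync, then renames it over the old checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to write cached data to the disk
        os.replace(tmp_path, path)  # swap in the new file (atomic on POSIX)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file lives in the same directory so that the final rename stays on one volume; a rename across volumes would fail rather than replace the file in one step.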
I managed to get two units in a row without crashing; let's hope it lasts.
ID: 44059
ID: 44060
Hopefully, the new app will improve number-crunching performance as well. Maybe it can be less CPU-dependent.
ID: 44063
Too bad there are not enough units for everyone. I have switched to Poem GPU.
ID: 44064