Message boards : Graphics cards (GPUs) : suddenly too many errors
Recently I have had too many errors on my 24/7 rig with triple GTX 970s:
ID: 42037
"Recently I have had too many errors on my 24/7 rig with triple GTX 970s:"
Excerpt from this task's stderr.txt:
# GPU 1 : 78C
# GPU 0 : 89C
# GPU 1 : 80C
# GPU 0 : 91C
# GPU 0 : 93C
# GPU 0 : 95C
# GPU 1 : 82C
These temperatures are too high. You'll fry your cards. But the error message at the end of the output file says:
SWAN : FATAL Unable to load module .mshake_kernel.cu. (999)
It usually happens when you stop a task too early after it's started.
https://www.gpugrid.net/result.php?resultid=14642713
Another excerpt from this task's stderr.txt:
# GPU 2 : 63C
# GPU 0 : 93C
# GPU 1 : 83C
# GPU 0 : 94C
# GPU 1 : 84C
# GPU 0 : 95C
# GPU 1 : 85C
95°C is way too high! I suspect that these cards have non-standard cooling with axial fans, which exhaust the heat inside the case, so the cards heat each other. You should use only one such card in this computer, or at least move one of the cards so there is at least one empty slot between the two cards for proper airflow, and install some fans that remove the hot air from the case.
"On my 2nd 24/7 rig, with double GTX 980Ti, it sometimes goes like:"
You will damage your cards permanently (if you haven't already) if you run them above 80°C. Every 10°C rise in temperature halves the lifetime of the card, but above 80°C every 5°C rise does the same. Above 90°C there's a high chance of an immediate fatal failure of the GPU chip.
ID: 42038
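The temperature lines in the stderr.txt excerpts above can be checked mechanically. The following Python sketch is a minimal illustration, not a GPUGrid or BOINC tool: it assumes the "# GPU n : TTC" line format shown in the excerpts, extracts the peak temperature per GPU, and applies the rule of thumb from the reply above (lifetime halves every 10°C, and every 5°C above 80°C). The 60°C reference temperature is a hypothetical baseline chosen only for illustration, and the halving rule is a forum rule of thumb rather than a measured reliability model.

```python
import re

# Matches lines such as "# GPU 0 : 93C" from the stderr.txt excerpts.
TEMP_LINE = re.compile(r"#\s*GPU\s+(\d+)\s*:\s*(\d+)C")

KNEE_C = 80       # above this, the rule of thumb halves lifetime every 5 degC
REFERENCE_C = 60  # hypothetical baseline temperature (relative lifetime 1.0)


def max_temps(stderr_text: str) -> dict[int, int]:
    """Return the highest temperature reported for each GPU index."""
    peaks: dict[int, int] = {}
    for gpu, temp in TEMP_LINE.findall(stderr_text):
        g, t = int(gpu), int(temp)
        peaks[g] = max(t, peaks.get(g, t))
    return peaks


def relative_lifetime(temp_c: float, reference_c: float = REFERENCE_C) -> float:
    """Lifetime at temp_c relative to reference_c (assumes reference_c <= KNEE_C).

    Rule of thumb from the thread: one halving per 10 degC up to KNEE_C,
    then one halving per 5 degC above it.
    """
    below_knee = min(temp_c, KNEE_C) - reference_c  # halves every 10 degC
    above_knee = max(temp_c - KNEE_C, 0)            # halves every 5 degC
    halvings = below_knee / 10 + above_knee / 5
    return 2.0 ** -halvings


if __name__ == "__main__":
    # First excerpt quoted in the thread.
    excerpt = """# GPU 1 : 78C
# GPU 0 : 89C
# GPU 1 : 80C
# GPU 0 : 91C
# GPU 0 : 93C
# GPU 0 : 95C
# GPU 1 : 82C"""
    for gpu, peak in sorted(max_temps(excerpt).items()):
        status = "above 80C" if peak > KNEE_C else "ok"
        print(f"GPU {gpu}: peak {peak}C, relative lifetime "
              f"{relative_lifetime(peak):.3f} ({status})")
```

Under this rule, GPU 0 peaking at 95°C would have about 1/32 of the lifetime it would have at 60°C, which is why the reply treats those readings as urgent.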
So I guess I have to bring the temperatures down and it should be fine. I just don't understand why this happened after some 8 months of normal operation under the same conditions.
ID: 42040
Or this one, just recently:
ID: 42042
"Or this one, just recently:"
This is an overly overclocked card. Perhaps you should reduce the memory clock to 3505 MHz, and if that doesn't help, then reduce the GPU clock in 20 MHz decrements until it becomes stable.
ID: 42043
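The step-down procedure suggested above (memory clock first, then the GPU clock in 20 MHz steps) can be written as a small search loop. This is a minimal Python sketch under stated assumptions: it does not change any hardware clocks, and `is_stable` is a hypothetical callback standing in for whatever stability check you use, such as running a few workunits at those clocks without errors. The starting clocks and the `min_gpu_mhz` floor are hypothetical parameters, not values from the thread.

```python
from typing import Callable, Optional, Tuple

SUGGESTED_MEMORY_MHZ = 3505  # memory clock suggested in the reply above
GPU_STEP_MHZ = 20            # GPU clock decrement suggested in the reply above


def tune_clocks(
    gpu_mhz: int,
    memory_mhz: int,
    min_gpu_mhz: int,
    is_stable: Callable[[int, int], bool],
) -> Optional[Tuple[int, int]]:
    """Return the first (gpu_mhz, memory_mhz) pair that is_stable accepts, or None.

    Step 1: cap the memory clock at SUGGESTED_MEMORY_MHZ.
    Step 2: if that is not enough, lower the GPU clock in GPU_STEP_MHZ steps,
    never going below min_gpu_mhz (a hypothetical safety floor).
    """
    memory_mhz = min(memory_mhz, SUGGESTED_MEMORY_MHZ)
    while gpu_mhz >= min_gpu_mhz:
        if is_stable(gpu_mhz, memory_mhz):
            return gpu_mhz, memory_mhz
        gpu_mhz -= GPU_STEP_MHZ
    return None


if __name__ == "__main__":
    # Hypothetical card that only becomes stable at 1300 MHz or below.
    print(tune_clocks(1380, 3600, 1100, lambda gpu, mem: gpu <= 1300))
```

Trying the memory clock first mirrors the advice in the reply: if the lower memory clock alone fixes the errors, the GPU clock is left untouched.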
"Or this one, just recently:"
Bear with me on this one, Viktor. Install your graphics driver again, over the top of the last one. I had a situation like this a few years ago with dual cards, and a new driver always needed to be installed twice. It could also be what Retvari said about the OC. Hey, it's worth a shot. Don't forget to suspend any running GPUGrid WUs.
ID: 42044
OK, I'll try. It is so annoying. As I said before, it ran with the same OC without problems for months, and now this. BTW, it's factory overclocked.
ID: 42046
"BTW, it's factory overclocked."
Everyone who uses factory-overclocked cards (including me) should remember: the factory made these cards for playing games 4-5 hours per day, not for crunching 24 hours a day, 7 days a week. Nothing severe happens when there's a glitch in a frame while you are playing, but when such a glitch occurs while crunching a workunit, it results in an error, and you lose the workunit, the time, and the electricity. If this happens too often, the time lost to failed workunits can easily exceed the time gained by the faster processing, making the overclocking counter-productive. So in terms of overclocking: less is more.
ID: 42047
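The point that overclocking can become counter-productive is simple arithmetic. As a minimal Python sketch, assuming the card runs error-free at stock clocks and that a failed workunit wastes, on average, a given fraction of its runtime, the expected number of completed workunits per hour can be compared directly. The 10-hour runtime, 10% speedup, and error rates below are hypothetical numbers, not measurements from this thread.

```python
def throughput(runtime_h: float, failure_rate: float, wasted_fraction: float = 1.0) -> float:
    """Expected completed workunits per hour for one card.

    runtime_h:       hours for one successful workunit
    failure_rate:    fraction of workunits that end in an error (0 <= rate < 1)
    wasted_fraction: average share of runtime_h spent before a failure
                     (1.0 = the error happens at the very end)
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    # Expected time per attempt: full runtime on success, partial on failure.
    hours_per_attempt = runtime_h * ((1 - failure_rate) + failure_rate * wasted_fraction)
    return (1 - failure_rate) / hours_per_attempt


if __name__ == "__main__":
    stock = throughput(runtime_h=10.0, failure_rate=0.0)
    # Hypothetical overclock that is 10% faster, with 5% and 10% error rates.
    for errors in (0.05, 0.10):
        oc = throughput(runtime_h=10.0 / 1.10, failure_rate=errors)
        print(f"{errors:.0%} errors: OC/stock throughput = {oc / stock:.3f}")
```

When a failed workunit wastes its full runtime, the overclocked card's relative output is speedup × (1 − error rate), so a 10% overclock stops paying off once roughly 9% of workunits fail (1 − 1/1.10).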
This might sound trivial, but if it worked for months and now the temps are higher, you might try cleaning the dust that has gathered on the cards/coolers and check the temps after that.
ID: 42050
I second FZB on this.
ID: 42055
I buy OC cards not for the OC but for the build quality. The temperatures were high because of the 3-way SLI setup (no room between the cards); I have already made changes.
ID: 42056