Message boards : Number crunching : Process exited with code 159 (0x9f, -97)
Author | Message |
---|---|
This is often what I get while crunching SANTI short run tasks. I can complete one task, and the following two error out. | |
ID: 34116 | Rating: 0 | rate: / Reply Quote | |
159 is one of the codes in the list of project error codes in the FAQ section. | |
ID: 34120 | Rating: 0 | rate: / Reply Quote | |
Thanks. According to the list, this code corresponds to -97, which indicates a hardware issue. However, Milkyway, which has the same GPU usage as GPUGRID on my card and thus generates the same temperature (77°C), runs just fine, as does Einstein. Anyway, I added a big fan blowing towards the GPU. I have 2 GPUs in my system. The case is fully open on the GPU side, but the cards are fairly close to each other, so I think the faster GPU (a GTX 560) can't suck in enough air to be cooled down further. Now, with the 120mm fan blowing, it dropped from 77°C to 68°C. I will try to crunch a few GPUGRID WUs to see how it goes. Thanks for referring me to the codes. I overlooked that thread and honestly forgot about it. ____________ Team Belgium | |
ID: 34122 | Rating: 0 | rate: / Reply Quote | |
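The three numbers in the thread title (159, 0x9f, -97) are the same value written three ways. The sketch below shows the arithmetic, assuming the exit status is reported as a single byte, as it is on Linux; the function name `signed_exit_code` is made up for this illustration and is not part of BOINC or GPUGRID.

```python
def signed_exit_code(status: int) -> int:
    """Interpret an 8-bit process exit status as a signed byte."""
    status &= 0xFF  # only the low 8 bits of the status are kept
    # Values 128..255 represent negative numbers in two's complement.
    return status - 256 if status >= 128 else status


code = 159
print(code, hex(code), signed_exit_code(code))  # 159 0x9f -97
```

So an application that exits with -97 shows up as 159, because 256 - 97 = 159.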
I think it should run OK at 77 C. If the temperature decrease doesn't fix the problem then maybe it's clocked too high? | |
ID: 34128 | Rating: 0 | rate: / Reply Quote | |
I ran two more tasks. One succeeded and one failed. Temp was around 68°C. The GPU runs at stock speeds; I haven't done anything to it. I also ran a memory test on the GPU, and after 50 iterations all was OK. I have no idea what is happening: some tasks complete, others fail. ____________ Team Belgium | |
ID: 34129 | Rating: 0 | rate: / Reply Quote | |
Going through the tasks that have failed on your system, I notice that the second iterations are almost always successful on 5xx or newer GPUs, which leads me to believe the tasks themselves are not buggy. The second iterations seem to fail on 4xx GPUs. I must admit my sample group is small, but it makes me wonder whether older GPUs have the required compute capability for SANTI tasks. | |
ID: 34130 | Rating: 0 | rate: / Reply Quote | |
Which driver version are you running? It seems to be fairly new, since you're getting CUDA 5.5 tasks, but Linux doesn't show us anything else. | |
ID: 34131 | Rating: 0 | rate: / Reply Quote | |
I run GPUGRID only on the GTX 560. The GT 440 is used for other projects (Milkyway/Einstein). The driver version is the latest stable, 331.20. | |
ID: 34142 | Rating: 0 | rate: / Reply Quote | |
Even with the 304.88 Linux drivers (which are not CUDA 5.5 drivers) you still get CUDA 5.5 tasks; tasks are not strictly allocated by driver version. The tasks run and complete as normal. | |
ID: 34149 | Rating: 0 | rate: / Reply Quote | |
I think the problem is fixed. I've crunched 12 short WUs in a row, without a single failure. That extra fan I added seems to have helped :) | |
ID: 34167 | Rating: 0 | rate: / Reply Quote | |
That's good to hear but I'm surprised it produced errors at 77 C. This spec sheet from NVIDIA claims the max. temperature is 98 C, not that one would ever want to run it that hot, but it seems like 77 C should be low enough to give error free results. So I wonder what software you use to check the temperature and does it give the temperature of both cards? Is it possible you reported the temperature of the other card which was getting better air flow and likely running cooler? | |
ID: 34168 | Rating: 0 | rate: / Reply Quote | |
I use the NV control center to check temps. Without the extra fan, the GTX 560 was between 77 and 79°C when running GPUGRID. The GT 440, which doesn't run GPUGRID but Milkyway/Einstein, hovers around 60°C at full load. The GTX 560 is the first card and the GT 440 sits directly under it, leaving only a narrow passage between them. My theory is that the GTX 560 gets hotter because of the narrow opening and because the GT 440 also produces heat on its backside, which gets sucked in by the GTX 560's fan. I know the NV spec says it can go a lot higher, but then I get failed WUs. No, it's not possible that I reported the wrong temps; you'd really need to be blind to confuse the two. The NV center under Linux separates both cards pretty well. Anyway, I'm happy that I can now contribute more to GPUGRID :) ____________ Team Belgium | |
ID: 34170 | Rating: 0 | rate: / Reply Quote | |
"The NV center under Linux separates both cards pretty well." That's what I was wondering. Thanks :) ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34174 | Rating: 0 | rate: / Reply Quote | |
Swap the cards around and the 560 should be cooler. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 34176 | Rating: 0 | rate: / Reply Quote | |
Yeah, I did that. It reduced temps by a few degrees. My case really isn't made for multiple GPUs; even at the bottom there isn't much open space. | |
ID: 34190 | Rating: 0 | rate: / Reply Quote | |
Errors with 8.14: | |
ID: 34192 | Rating: 0 | rate: / Reply Quote | |
My case isn't suited for multiple GPUs either, but I manage it by using Precision-X to set a custom GPU fan curve that reaches maximum fan speed before the GPU hits 70°C. That keeps GPU Boost at maximum clock rates while keeping temperatures low enough to process tasks successfully. You might consider doing what I've done. (Despite being made by EVGA, any nVidia user can register/download/use Precision-X freely, though I'm not sure whether it's available outside of Windows.) | |
ID: 34194 | Rating: 0 | rate: / Reply Quote | |
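A custom fan curve of the kind described above is just a mapping from temperature to fan duty, interpolated between a few set points. The sketch below illustrates the idea only; the curve points and the name `fan_percent` are hypothetical and are not taken from Precision-X, which may work differently internally.

```python
from bisect import bisect_right

# Hypothetical set points: (temperature in °C, fan duty in %).
# The last point makes the fan reach 100% just before 70°C.
CURVE = [(30, 30), (50, 50), (65, 90), (69, 100)]


def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate the fan duty for a given temperature."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]    # below the curve: hold the minimum duty
    if temp_c >= temps[-1]:
        return curve[-1][1]   # above the curve: hold the maximum duty
    i = bisect_right(temps, temp_c)  # first set point hotter than temp_c
    (t0, f0), (t1, f1) = curve[i - 1], curve[i]
    return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)


for temp in (25, 40, 60, 70):
    print(temp, fan_percent(temp))
```

Moving the last set point to a lower temperature makes the fan ramp up earlier, at the cost of more noise.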
It probably isn't available for Linux. A lot of Windows software will run on Linux under Wine (Windows emulator), but if it needs hardware access then it usually won't run. Does Precision-X re-flash the ROM on the card? Or do you have to keep Precision-X running while crunching in order to maintain the custom fan speed curve? The EVGA 660 Ti I have running in one of my Linux boxes allows the temperature to get to 80°C before it gets really serious about boosting fan RPM, so I wrote a Python script to put the fan control in manual mode. The script reads the temperature every 5 seconds and adjusts the fan speed up or down to keep the GPU at whatever target temperature the user specifies. It uses an ncurses interface, which keeps the RAM and CPU overhead quite low compared to a point-and-click GUI. It works very well. The potential problem with the script is that if it should crash or hang, the fan will stay at whatever speed it was at unless the script catches the fault and recovers, or is able to restore automatic fan control before it exits. If something should then happen to elevate the GPU temperature, the card will downclock if that mechanism works, or, if bad luck continues (they say it comes in threes), it will possibly fry. So far the script has run for 60 days without crashing, but I would prefer to let the hardware/firmware do the temperature control and perhaps have a script that just monitors the card to verify that it is doing the job properly.
The trouble is, as I said earlier, that the stock fan curve installed by EVGA lets the temp hit 80°C before kicking the fan speed up. I want it at 70°C. I have gathered together scraps of info and software from various fora/newsgroups/blogs which reportedly make it possible to reflash the ROM to reshape the fan curve and adjust clock frequencies on cards that a lot of people have considered to be "locked". I haven't had time to try it, but from feedback it apparently works perfectly. Yes, I am aware that re-flashing the ROM can brick the card, but if one saves a copy of the original content it's just a matter of flashing the card with the original bin file. It is easiest on Windows, but IIUC it's doable on pure Linux too. By pure I mean not a dual-boot Win-Lin system. If anybody wants links, just ask and I'll give you what I have so far. ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34195 | Rating: 0 | rate: / Reply Quote | |
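The control loop described in the post above (read the temperature every 5 seconds, nudge the fan up or down, restore automatic control on exit) can be sketched roughly as follows. This is not the poster's script: the names `step_fan` and `run_controller`, the step size, the dead band and the starting duty are all assumptions, and the hardware access is passed in as plain callables (`read_temp`, `set_fan`, `restore_auto`) so that no real driver command has to be guessed at.

```python
import time


def step_fan(current_pct, temp_c, target_c, step=5, lo=30, hi=100):
    """Return the next fan duty: faster when too hot, slower when cool."""
    if temp_c > target_c:
        current_pct += step
    elif temp_c < target_c - 2:  # small dead band to avoid oscillating
        current_pct -= step
    return max(lo, min(hi, current_pct))


def run_controller(read_temp, set_fan, restore_auto, target_c=70,
                   interval_s=5, max_iterations=None, sleep=time.sleep):
    """Poll the temperature and adjust the fan until stopped.

    restore_auto() is always called on the way out, including when an
    exception is raised, so the fan is not left stuck in manual mode.
    """
    fan = 50  # assumed starting duty in percent
    done = 0
    try:
        while max_iterations is None or done < max_iterations:
            fan = step_fan(fan, read_temp(), target_c)
            set_fan(fan)
            sleep(interval_s)
            done += 1
    finally:
        restore_auto()
    return fan


# Finite demo with simulated readings instead of real hardware.
readings = iter([80, 80, 60])
run_controller(lambda: next(readings), print,
               lambda: print("auto fan control restored"),
               max_iterations=3, sleep=lambda s: None)
```

The `finally` clause covers crashes that raise an exception, but not a hang or a hard kill of the process, which is the remaining risk the post describes.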
@Dagorath | |
ID: 34201 | Rating: 0 | rate: / Reply Quote | |
Actually, I meant I would provide links to the various bits of info I have found scattered around the 'net concerning how to reflash the BIOS on NVIDIA-based cards. I didn't mean the script, as I felt that's not a good solution for managing fan speed; however, if you are interested then I will provide it. And it does more than just control fan speed. It shows various info (freqs, usage, etc.) which might be useful for anybody running without a desktop. I've been playing with gnuplot too and have been thinking the script could save interesting, plottable data to files and provide possibly informative graphs, even collate those with task type history or... | |
ID: 34223 | Rating: 0 | rate: / Reply Quote | |
@Dagorath The script is named gpu_d.py. Get it and a readme in a zip file here. I created a new thread named gpu_d in the cafe if anybody wants to comment on gpu_d. I suggest comments go in that thread rather than this thread. If that thread exceeds a few posts and project admins/mods wish, I'll set up something for discussing gpu_d somewhere else, no problem. ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34249 | Rating: 0 | rate: / Reply Quote | |
Thanks Dagorath :) | |
ID: 34260 | Rating: 0 | rate: / Reply Quote | |
"It probably isn't available for Linux. A lot of Windows software will run on Linux under Wine (Windows emulator) but if it needs hardware access then it usually won't run." It's Windows only (like pretty much all the other usual tuning tools), it needs hardware access (or at least access to driver settings), it does no BIOS flash, and it doesn't need to be kept running. I think it really just changes driver settings, as custom settings have to be re-applied after e.g. a driver reset (or reboot). MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 34303 | Rating: 0 | rate: / Reply Quote | |
OK thanks, MrS. | |
ID: 34323 | Rating: 0 | rate: / Reply Quote | |