Author |
Message |
RobertN
Joined: 18 Nov 09 Posts: 7 Credit: 52,996,450 RAC: 0
|
Hey hey,
I love the new badge system, but I am quite worried about something else. Lately an increasing number of jobs have ended with computing errors. It now happens so often that I think I would be better off putting my GPU on another project.
I dunno what causes it. I have the latest beta drivers from Nvidia installed here (290.53). They fixed the display driver crashes (caused by Firefox's hardware acceleration, I think). When the display driver crashes, the GPUGRID task crashes too (I have not seen an exception to that yet). Besides that, the GPUGRID workunits still crash too often for other reasons, like: http://www.gpugrid.net/result.php?resultid=4879675
An incorrect function, how is that possible?
I would very much appreciate some effort going into making the workunits more stable.
Regards, iconized. |
|
|
|
Are you overclocking the GPU? |
|
|
nenym
Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0
|
Try changing the core clock to 880-890 MHz. I had the same problem with a factory-overclocked GTX 560 Ti. |
|
|
RobertN
|
Yes, a factory OC of 900 MHz, quite a bit higher than the normal 822 MHz. I will lower it. Thanks for the replies! |
|
|
|
At the moment neither of my cards can even finish a WU at stock speed, while the first few WUs completed with a reasonable OC.
For instance:
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.53 GHz
# Total amount of global memory: 804847616 bytes
# Number of multiprocessors: 7
# Number of cores: 56
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed
Assertion failed: 0, file swanlib_nv.c, line 390
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>
and
Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.84 GHz
# Total amount of global memory: 1073283072 bytes
# Number of multiprocessors: 7
# Number of cores: 56
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.80 GHz
# Total amount of global memory: 1073283072 bytes
# Number of multiprocessors: 7
# Number of cores: 56
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
ERROR: # Energies have become nan
called boinc_finish
</stderr_txt>
]]> |
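For anyone comparing failures across hosts, a rough way to sort stderr dumps like the two above is to pull out the exit code and look for the tell-tale lines. This is only a sketch in Python: the label names and the `classify_stderr` function are made up for illustration, and the patterns are simply copied from the log excerpts in this thread, not taken from any GPUGRID or BOINC tool.

```python
import re

# Hypothetical labels; each pattern is copied from a stderr line quoted in this thread.
PATTERNS = [
    ("gpu_memcpy_failure", re.compile(r"swanMemcpyDtoH failed")),
    ("assertion_failure", re.compile(r"Assertion.*failed")),
    ("nan_energies", re.compile(r"Energies have become nan")),
]

# Matches e.g. "exit code 98 (0x62)" inside the <message> element.
EXIT_CODE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def classify_stderr(text):
    """Summarise one stderr dump: exit code, matched failure labels, start count."""
    match = EXIT_CODE.search(text)
    exit_code = int(match.group(1)) if match else None
    labels = [name for name, pattern in PATTERNS if pattern.search(text)]
    # The "# Using device" header is printed each time the app initialises the GPU.
    device_inits = text.count("# Using device")
    return {"exit_code": exit_code, "labels": labels, "device_inits": device_inits}
```

Fed the second log above, this would report exit code 98, the `nan_energies` label and two device initialisations, which probably means the app was started more than once on that task.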
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
When you start getting errors you should make some observations: temperatures of the GPU, CPU and board, fan speeds, task failure types, and system usage at the time of failure.
There are several generic things you can do:
Restart the system (stops system-related runaway errors),
Increase fan speed / improve ventilation (reduces temps),
Free up a CPU core/thread (stops some heartbeat issues),
Reduce CPU clocks if the CPU is overclocked (reduces system temperature and motherboard/component overheating issues, especially the chipset),
Reduce GPU clocks (start by reducing the memory speed, then move on to the GPU core if need be, but you shouldn't have to reduce by more than 10%),
Roll back, reinstall or upgrade drivers,
Increase GPU voltage very slightly.
If none of these work, there's more:
Clean the GPU and system,
Reset the BIOS,
Re-seat the GPU (take it out, reboot, power down, re-seat the GPU),
Restore or reinstall the operating system.
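
The observations mentioned above (temperatures, fan speed, clocks) are easier to match against task failures if they are logged over time. Below is a minimal sketch, assuming an `nvidia-smi` binary that supports the `--query-gpu` option; older drivers and some GeForce cards may not report every field, so the parser turns anything non-numeric into `None`. The function names and the 60-second interval are illustrative choices, not something from this thread.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in the order they appear in its CSV output.
FIELDS = ["temperature.gpu", "fan.speed", "clocks.sm", "clocks.mem"]


def parse_sample(line):
    """Turn one CSV line into a dict; values that are not numbers become None."""
    values = [value.strip() for value in line.split(",")]
    sample = {}
    for field, value in zip(FIELDS, values):
        try:
            sample[field] = float(value)
        except ValueError:
            sample[field] = None  # e.g. "[Not Supported]"
    return sample


def read_gpu(index=0):
    """Query one GPU once. Assumes nvidia-smi is on the PATH."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


def log_gpu(samples=10, interval_s=60):
    """Print a timestamped reading every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), read_gpu())
        time.sleep(interval_s)
```

Calling `log_gpu()` while a task runs gives a timestamped record to compare with the time the task failed.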
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Well, I came from MW@H, with a GTX460FTW/920 and a GTX460SC/835 which also worked fine on S@H, because they ran out of WUs.
After a few WUs trouble arose, although I had done nothing to the settings or anything else.
No problems with temps; if temps go up I take my compressor and clean the whole lot.
So I went back to stock speed, with no result.
I tried SWAN_SYNC=0 and freed a core, but then I saw "Suspend work when non-BOINC CPU usage is above 25%", which is a bit strange if you set the GPU to always work.
This "Suspend" comes back irregularly without me having changed a thing.
It is getting extremely annoying.
I'm just getting a bit tired of trying everything again and again.
Perhaps later I'll try a clean install of everything and replace the AM2+ mobo with an Asrock 870 Xtreme, the 4 GB DDR2 with 8 GB DDR3 memory, and the HD. |
|
|
RobertN
|
My factory OC-ed 560 Ti normally:
core clock: 900 MHz
shader clock: 1800 MHz
memory clock: 2004 MHz
I have been running a bit lower for a couple of days now:
core clock: 800 MHz
shader clock: 1600 MHz
memory clock: 1800 MHz
These are all MSI Afterburner numbers, so there might be some rounding errors.
I also ran a video memory stress test (vmt) with these settings and had no problems.
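
To put the two sets of clocks listed above into perspective, here is a quick calculation of how big the reduction is in percent. It is just arithmetic on the numbers quoted in this post; the function and variable names are made up for the example.

```python
def percent_reduction(original_mhz, reduced_mhz):
    """Reduction relative to the original clock, in percent."""
    return 100.0 * (original_mhz - reduced_mhz) / original_mhz


# Clocks as reported by MSI Afterburner in the post above (MHz).
factory = {"core": 900, "shader": 1800, "memory": 2004}
lowered = {"core": 800, "shader": 1600, "memory": 1800}

reductions = {
    name: round(percent_reduction(factory[name], lowered[name]), 1)
    for name in factory
}
print(reductions)  # {'core': 11.1, 'shader': 11.1, 'memory': 10.2}
```

So all three clocks are already a little more than 10% below the factory OC, and the 800 MHz core clock is below the 822 MHz norm mentioned earlier in the thread.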
I keep getting errors and all the latest errors are with the NATHAN units:
http://www.gpugrid.net/results.php?hostid=111996
All latest units producing errors gave this error:
Incorrect function. (0x1) - exit code 1 (0x1)
I don't have problems with other GPU projects (PrimeGrid, Mfaktc for GIMPS) with the factory OC.
So perhaps it is a problem with those work units? |
|
|
|
A clean upgrade to XP 64, with a fresh BOINC install, driver 258.96 and the card at stock settings, didn't get rid of the errors.
I tried it before and got errors then too. It looks like GPUGRID is not for me.
FYI: 23 NATHANs and 1 GIANNI. |
|
|
|
Darn, my SGS is stuttering |
|
|
JSTL
Joined: 21 Dec 11 Posts: 2 Credit: 699,044,050 RAC: 3,127,474
|
I have the exact same issue (Assertion failed: 0, file swanlib_nv.c, line 390) since I updated to Nvidia's beta drivers (295.51). Everything was working nicely prior to that.
I don't believe that's a coincidence. |
|
|
RobertN
|
This is also funny:
http://www.gpugrid.net/workunit.php?wuid=3127645
http://www.gpugrid.net/workunit.php?wuid=3127506
But not correlated. |
|
|
|
I have the same problem, with a brand new GPU, hardware and distro.
This was my first WU here, but the card is crunching well at Einstein@Home.
acemd2_6.14_x86_64-pc-linux-gnu__cuda31: swanlib_nv.c:388: error: Assertion `0' failed.
One weird thing I noticed is that the screen had a "scrambled" image.
I restarted... let's see tomorrow.
|
|
|