Dingo
Joined: 1 Nov 07 Posts: 20 Credit: 128,376,317 RAC: 0
I updated my Nvidia driver to 431.60, and since then every GPUGRID task on my Windows machine with a GTX 1080 Ti has failed with this error, so it looks as though something in the driver code is stopping me from running GPUGRID.
PrimeGrid still runs on the machine after the update, though, so it looks like a project code issue???
This is the machine: https://www.gpugrid.net/results.php?hostid=453402
The task failed at the very end of processing:
Name e9s120_e3s89p1f137-PABLO_V4_UCB_p27_sj403_no_salt_IDP-0-2-RND6771_0
Workunit 16678301
Created 28 Jul 2019 | 19:10:51 UTC
Sent 28 Jul 2019 | 20:27:33 UTC
Received 29 Jul 2019 | 2:37:26 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -55 (0xffffffffffffffc9) Unknown error number
Computer ID 453402
Report deadline 2 Aug 2019 | 20:27:33 UTC
Run time 21,941.82
CPU time 1,907.74
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v9.22 (cuda80)
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -55 (0xffffffc9)</message>
<stderr_txt>
# GPU [GeForce GTX 1080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 6.1
# PCI ID : 0000:0A:00.0
# Device clock : 1645MHz
# Memory clock : 5505MHz
# Memory width : 352bit
# Driver version : r431_31 : 43136
# GPU 0 : 71C
# GPU [GeForce GTX 1080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 6.1
# PCI ID : 0000:0A:00.0
# Device clock : 1645MHz
# Memory clock : 5505MHz
# Memory width : 352bit
# Driver version : r431_31 : 43136
# GPU 0 : 68C
# GPU 0 : 69C
# GPU 0 : 70C
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965.
# SWAN swan_assert 0
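The two hex values reported for this task (0xffffffffffffffc9 in the task summary, 0xffffffc9 in the message) are the same exit status, -55, written as 64-bit and 32-bit two's-complement values. A quick Python check, purely as an illustration of that equivalence:

```python
# Exit status reported for the failed task in the summary above.
EXIT_STATUS = -55

# Masking a negative Python int gives its two's-complement bit pattern
# at the chosen width: 32 bits for the <message> line, 64 bits for the summary.
as_32bit = EXIT_STATUS & 0xFFFFFFFF
as_64bit = EXIT_STATUS & 0xFFFFFFFFFFFFFFFF

print(hex(as_32bit))  # 0xffffffc9
print(hex(as_64bit))  # 0xffffffffffffffc9
```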
Erich56
Joined: 1 Jan 15 Posts: 1132 Credit: 10,205,482,676 RAC: 29,855,510
What seems strange to me is:
Run time 21,941.82
CPU time 1,907.74
rod4x4
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0
There are 4 recent errors reported for this host: 3 with v431.36 and 1 with v431.60.
v431.36 errors:
One failed task was from a batch with a 68% failure rate, so that failure can be attributed to the bad batch.
Two tasks failed at exactly the same time with this error:
Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965
I suspect the linked thread explains this issue:
https://www.gpugrid.net/forum_thread.php?id=4652#48209
v431.60 error:
An error appears to have been reported (Access violation : progress made, try to restart), and the task was then aborted by the user. What was the issue that led you to abort it?
Other hosts are running v431.60 successfully, so I don't think the driver version is the issue. Perhaps try another task to see whether the problem recurs.
EDIT: This error could also be attributed to a bad batch; see this thread:
https://www.gpugrid.net/forum_thread.php?id=4634#48021
rod4x4
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0
Erich56 wrote:
    What seems strange to me is:
    Run time 21,941.82
    CPU time 1,907.74
If SWAN_SYNC is not enabled, this can be normal, especially on a fast processor such as this host's Ryzen 7 1800X.
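For scale, the two figures Erich56 quoted can be compared directly. This is a minimal sketch that only does the arithmetic; it assumes both values are in seconds (the unit BOINC task pages normally use) and does not by itself show whether SWAN_SYNC was the cause.

```python
# Figures reported for the failed task (assumed to be seconds).
RUN_TIME_S = 21_941.82
CPU_TIME_S = 1_907.74


def cpu_share(cpu_s: float, run_s: float) -> float:
    """Fraction of the wall-clock run time that was spent as CPU time."""
    return cpu_s / run_s


share = cpu_share(CPU_TIME_S, RUN_TIME_S)
print(f"Run time: {RUN_TIME_S / 3600:.2f} h")     # 6.09 h
print(f"CPU time: {CPU_TIME_S / 60:.1f} min")     # 31.8 min
print(f"CPU share of run time: {share:.1%}")      # 8.7%
```

In other words, roughly six hours of GPU run time used only about half an hour of CPU time, which is the pattern described above as normal without SWAN_SYNC.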
Dingo
Joined: 1 Nov 07 Posts: 20 Credit: 128,376,317 RAC: 0
OK, I will try another task and see what happens. This is the task that is running now:
https://www.gpugrid.net/workunit.php?wuid=16682627
Dingo
Joined: 1 Nov 07 Posts: 20 Credit: 128,376,317 RAC: 0
OK, all is fine now. It must have been a problem with the update happening while GPUGRID was running??
Dingo wrote:
    OK, all is fine now. It must have been a problem with the update happening while GPUGRID was running??
Exactly.