Message boards : Graphics cards (GPUs) : Computer error after a restart

Author Message
zioriga
Joined: 30 Oct 08
Posts: 46
Credit: 494,132,425
RAC: 3,861,070
Message 4598 - Posted: 20 Dec 2008 | 6:09:34 UTC

I suspended the GPUGRID project for a while and, when I restarted it, the WU crashed with a compute error:

12/20/08 06:59:58|GPUGRID|Restarting task kq21298-SH2_US_1-4-40-SH2_US_1510000_1 using acemd version 655
12/20/08 07:00:01|GPUGRID|Computation for task kq21298-SH2_US_1-4-40-SH2_US_1510000_1 finished
12/20/08 07:00:01|GPUGRID|Output file kq21298-SH2_US_1-4-40-SH2_US_1510000_1_1 for task kq21298-SH2_US_1-4-40-SH2_US_1510000_1 absent
12/20/08 07:00:01|GPUGRID|Output file kq21298-SH2_US_1-4-40-SH2_US_1510000_1_2 for task kq21298-SH2_US_1-4-40-SH2_US_1510000_1 absent
12/20/08 07:00:01|GPUGRID|Output file kq21298-SH2_US_1-4-40-SH2_US_1510000_1_3 for task kq21298-SH2_US_1-4-40-SH2_US_1510000_1 absent
12/20/08 07:00:03|GPUGRID|Started upload of kq21298-SH2_US_1-4-40-SH2_US_1510000_1_0

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 387,028,788
RAC: 1,197,795
Message 4621 - Posted: 20 Dec 2008 | 15:35:06 UTC - in response to Message 4598.
Last modified: 20 Dec 2008 | 15:35:38 UTC

I noticed that your video card changed during the most recent failed WU, Task 167292. It switches from being identified as an 8600GT to a 9800GT. Did you switch video cards in the middle of the WU? I can't be sure, but that seems like a good reason the WU might have crapped out on you.

# Device 0: "GeForce 8600 GT"
# Clock rate: 1188000 kilohertz
# Number of multiprocessors: 4
# Number of cores: 32
# Using CUDA device 0
# Device 0: "GeForce 9800 GT"
# Clock rate: 1512000 kilohertz
# Number of multiprocessors: 14
# Number of cores: 112
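
For anyone who wants to check a task's output for the same symptom, here is a minimal sketch in Python. It assumes the output has the same "# Device N: ..." layout as the excerpt above; the function names `device_names` and `device_changed` are made up for illustration and are not part of BOINC or acemd.

```python
import re

# Matches lines such as: # Device 0: "GeForce 8600 GT"
# (assumed format, taken from the excerpt in this thread)
DEVICE_LINE = re.compile(r'#\s*Device\s+(\d+):\s*"([^"]+)"')


def device_names(stderr_text, index=0):
    """Return the names reported for one device index, in order of appearance."""
    names = []
    for line in stderr_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match and int(match.group(1)) == index:
            names.append(match.group(2))
    return names


def device_changed(stderr_text, index=0):
    """True if the same device index was reported under more than one name."""
    return len(set(device_names(stderr_text, index))) > 1


sample = '''# Device 0: "GeForce 8600 GT"
# Clock rate: 1188000 kilohertz
# Number of multiprocessors: 4
# Number of cores: 32
# Using CUDA device 0
# Device 0: "GeForce 9800 GT"
# Clock rate: 1512000 kilohertz
# Number of multiprocessors: 14
# Number of cores: 112'''

print(device_names(sample))
print(device_changed(sample))
```

If `device_changed` returns True for a task that failed right after a restart, a hardware swap in the middle of the WU is a plausible suspect, as it was here.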

zioriga
Joined: 30 Oct 08
Posts: 46
Credit: 494,132,425
RAC: 3,861,070
Message 4693 - Posted: 21 Dec 2008 | 22:11:14 UTC - in response to Message 4621.

K1atOdessa, you're right.
I switched from the 8600GT to the 9800GT in the middle of a WU.
I promise I'll never do it again!

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 387,028,788
RAC: 1,197,795
Message 4697 - Posted: 21 Dec 2008 | 22:42:14 UTC - in response to Message 4693.

LOL. It's always good to set BOINC to "no new tasks" and finish everything up before adding or changing a video card. That takes another variable out of the equation.

Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4775 - Posted: 23 Dec 2008 | 4:31:34 UTC

Just got an error in this WU: 174101

22/12/2008 11:24:05 PM|GPUGRID|Computation for task WM20233-SH2_US_1-7-40-SH2_US_1210000_0 finished
22/12/2008 11:24:05 PM|GPUGRID|Output file WM20233-SH2_US_1-7-40-SH2_US_1210000_0_1 for task WM20233-SH2_US_1-7-40-SH2_US_1210000_0 absent
22/12/2008 11:24:05 PM|GPUGRID|Output file WM20233-SH2_US_1-7-40-SH2_US_1210000_0_2 for task WM20233-SH2_US_1-7-40-SH2_US_1210000_0 absent
22/12/2008 11:24:05 PM|GPUGRID|Output file WM20233-SH2_US_1-7-40-SH2_US_1210000_0_3 for task WM20233-SH2_US_1-7-40-SH2_US_1210000_0 absent

It caused a BSOD in dxgkrnl. I have the memory dump file, but it's 568 MB.

Pat

pschoefer
Joined: 21 Sep 08
Posts: 3
Credit: 26,162,443
RAC: 0
Message 4870 - Posted: 26 Dec 2008 | 12:53:52 UTC

Now I have this problem on Host 9545, a C2Q Q9450 with GeForce 9800GT running Win XP x64. I did not change anything, not even a reboot, but all WUs have crashed since this morning. Reset did not help. :(

26.12.2008 13:41:58|GPUGRID|Starting task CZG9573-SH2_US-7-40-SH2_US200000_1 using acemd version 655
26.12.2008 13:41:59|GPUGRID|Computation for task CZG9573-SH2_US-7-40-SH2_US200000_1 finished
26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_1 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent
26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_2 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent
26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_3 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent
26.12.2008 13:42:01|GPUGRID|Started upload of CZG9573-SH2_US-7-40-SH2_US200000_1_0
26.12.2008 13:42:03|GPUGRID|Finished upload of CZG9573-SH2_US-7-40-SH2_US200000_1_0
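
The logs posted in this thread all share the same "timestamp|project|message" layout, even though the date formats differ. As a rough sketch, assuming that layout holds, the following Python groups the "absent" output files by task. The helper name `absent_outputs` is hypothetical and not a BOINC tool.

```python
import re
from collections import defaultdict

# Message text as it appears in the logs quoted in this thread
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_lines):
    """Map each task name to the list of output files reported as absent."""
    by_task = defaultdict(list)
    for line in log_lines:
        # Split on the first two pipes only, so the date format does not matter
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        _timestamp, _project, message = parts
        match = ABSENT.search(message)
        if match:
            output_file, task = match.groups()
            by_task[task].append(output_file)
    return dict(by_task)


log = [
    "26.12.2008 13:41:58|GPUGRID|Starting task CZG9573-SH2_US-7-40-SH2_US200000_1 using acemd version 655",
    "26.12.2008 13:41:59|GPUGRID|Computation for task CZG9573-SH2_US-7-40-SH2_US200000_1 finished",
    "26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_1 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent",
    "26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_2 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent",
    "26.12.2008 13:41:59|GPUGRID|Output file CZG9573-SH2_US-7-40-SH2_US200000_1_3 for task CZG9573-SH2_US-7-40-SH2_US200000_1 absent",
]

for task, files in absent_outputs(log).items():
    print(task, "is missing", len(files), "output files")
```

A task that "finished" a second or two after starting and is missing all of its result files, as in these logs, crashed at startup rather than partway through the computation.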

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4901 - Posted: 26 Dec 2008 | 23:32:27 UTC - in response to Message 4870.

The task output says you get the error
"Cuda error in file 'deviceQuery.cu' in line 59 : out of memory."

which means some app reserved so much GPU memory that there's not enough left for GPU-Grid. That's a common error on 64-bit Windows after a certain runtime, but your situation is different (you already tried the reboot).

I'd do two things: shut down, remove the power cord for more than 10 minutes, and try again. And I'd install the 180.84 driver, if you haven't already, which fixes the memory leak.
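
A small sketch of how one might automate that check, in Python. It pulls the message out of an error line shaped like the one quoted above and compares a driver version string against 180.84. The function names are hypothetical, and treating every later driver as also containing the fix is an assumption, not something confirmed in this thread.

```python
import re

# Error line format as quoted from the task output in this thread
CUDA_ERROR = re.compile(r"Cuda error in file '([^']+)' in line (\d+) : (.+)")


def parse_cuda_error(line):
    """Return the file, line number and message of a CUDA error line, or None."""
    match = CUDA_ERROR.search(line)
    if not match:
        return None
    source_file, line_no, message = match.groups()
    # Drop trailing punctuation and quote characters from the message
    return {
        "file": source_file,
        "line": int(line_no),
        "message": message.rstrip(' ."'),
    }


def version_tuple(version):
    """Turn '180.84' into (180, 84) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_leak_fix(installed, fixed="180.84"):
    """Assumption: any driver at or above `fixed` includes the leak fix."""
    return version_tuple(installed) >= version_tuple(fixed)


error = parse_cuda_error(
    "Cuda error in file 'deviceQuery.cu' in line 59 : out of memory."
)
if error and error["message"] == "out of memory" and not has_leak_fix("178.24"):
    print("Out of GPU memory on an older driver: try updating to 180.84")
```

The driver version "178.24" in the usage lines is only a placeholder for whatever version is actually installed.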

MrS
____________
Scanning for our furry friends since Jan 2002

pschoefer
Joined: 21 Sep 08
Posts: 3
Credit: 26,162,443
RAC: 0
Message 4932 - Posted: 27 Dec 2008 | 11:20:13 UTC - in response to Message 4901.

Thank you. It's running again after the driver update, no problems so far. :)
