Message boards : Graphics cards (GPUs) : Maximum elapsed time exceeded

Author Message
MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Level
Leu
Message 9323 - Posted: 5 May 2009 | 9:11:57 UTC

I got this error message in one of my work units today. The card is a new GTS250, so I guess it's just not quick enough, or maybe BOINC took too long to get to it.

It was sent on the 3rd and its deadline is the 8th of May, so it didn't reach that.

The workunit is here
____________
BOINC blog

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 9331 - Posted: 5 May 2009 | 11:05:33 UTC - in response to Message 9323.
Last modified: 5 May 2009 | 12:32:19 UTC

Any chance it is related to a recent client upgrade? That error message is not usually related to deadlines, but rather to the task greatly exceeding its expected number of operations (i.e. its expected running time).
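
To make the distinction from deadlines concrete, here is a minimal Python sketch of how such a limit can be derived from an operation count. It assumes the limit is the workunit's operation bound (BOINC workunits carry an `rsc_fpops_bound` value) divided by the client's speed estimate for the device. The function names and the numbers in the usage note are hypothetical and are not taken from this workunit.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, estimated_flops: float) -> float:
    """Upper bound on run time implied by an operation bound.

    rsc_fpops_bound: maximum number of floating-point operations the
        project allows for the task (assumed input).
    estimated_flops: the client's estimate of the device speed in
        operations per second (assumed input).
    """
    if estimated_flops <= 0:
        raise ValueError("estimated_flops must be positive")
    return rsc_fpops_bound / estimated_flops


def exceeds_limit(elapsed_seconds: float,
                  rsc_fpops_bound: float,
                  estimated_flops: float) -> bool:
    """True if the task has run longer than the operation bound allows."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, estimated_flops)


# Hypothetical numbers, for illustration only.
limit = max_elapsed_seconds(5e15, 1e11)
print(f"limit: {limit / 3600:.1f} h")
print(exceeds_limit(42 * 3600, 5e15, 1e11))
```

With these made-up inputs the limit is well under a day, so a task can hit it long before a deadline that is several days away. An overestimated device speed would shrink the limit in the same way.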

MarkJ
Message 9335 - Posted: 5 May 2009 | 13:45:59 UTC - in response to Message 9331.

No, it was done using 6.6.23.

I was using 6.6.25 on 2 other machines. One with identical specs, and the other is an i7 with dual GTX260's. The i7 has been downgraded to 6.6.23. That leaves the other machine (Luke) still running 6.6.25.

It actually only had 2 days (or less) of running before it got this. I could understand if it didn't meet the deadline, but this makes it seem like the time limit is set to less than 48 hours. My other machines with GTS250's have completed work without any issues.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9347 - Posted: 5 May 2009 | 21:10:24 UTC
Last modified: 5 May 2009 | 21:14:13 UTC

The WU itself seems to be normal, as a wingman with a GTX 295 finished it with a normal time per step and a normal elapsed time.

Was the WU hanging without making any progress (this rarely happens)? Maybe the newer BOINC clients have some detection mechanism for this, coupled with a less-than-perfect message.

EDIT: taking a look at your host, I see it returned its last WU at 15:11 on the 3rd of May, one WU was canceled on the 4th, and nothing else happened until the error at 9:01 on the 5th of May. Was it running 24/7? If yes, then the task was surely hanging, as it should have been finished after ~12h.

MrS
____________
Scanning for our furry friends since Jan 2002
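
The arithmetic behind that estimate can be checked with a short Python sketch. The two timestamps and the ~12h figure come from the post above. Treating the whole gap as run time is an assumption, since the task may have started later than the last return, so the result is an upper bound.

```python
from datetime import datetime

# Timestamps quoted in the post above (year 2009).
last_return = datetime(2009, 5, 3, 15, 11)
error_time = datetime(2009, 5, 5, 9, 1)

# Expected run time quoted in the post above.
expected_hours = 12.0


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed wall-clock time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


gap_hours = hours_between(last_return, error_time)
print(f"gap: {gap_hours:.2f} h, {gap_hours / expected_hours:.1f}x the expected time")
```

The gap comes to just under 42 hours, roughly three and a half times the expected run time, which is why a hang looks more likely than a slow card.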

jboese
Joined: 30 Jul 08
Posts: 21
Credit: 31,229
RAC: 0
Message 9352 - Posted: 5 May 2009 | 22:28:50 UTC - in response to Message 9347.

The hanging workunit problem is really hurting this project imho. I know of several people, including myself, who no longer run this project: we can accept workunits failing, but we can't accept workunits hanging and preventing any science for days at a time on our boxes. I understand the GPU issue is much more complex, with many different setups, but the PS3 situation is unacceptable. I mean, the hardware is almost exactly the same, and the workunits even hang with the memory stick setup supplied by the project. I don't say these things to grief the project; I just hate to see a project turn off newbies to the wonder of DC because they don't have their act together.

MarkJ
Message 9362 - Posted: 6 May 2009 | 8:20:35 UTC - in response to Message 9347.

I think the "never ending wu" bug was fixed in 6.6.23, which is what I run on most of the machines. Only one has 6.6.25 on it at the moment and will be the guinea pig for 6.6.28 later on tonight.

The machines usually run 24/7 as they are dedicated crunchers, unless it's warm, in which case I'll turn them off during the day. I have been leaving the ones that have HIVPR WUs running so they can complete these WUs ASAP.

ExtraTerrestrial Apes
Message 9537 - Posted: 9 May 2009 | 13:31:37 UTC - in response to Message 9362.

@Mark: yes, according to all reports that bug was fixed in 6.6.23. However, there have been rare cases of hanging WUs before. These were not caused by 6.6.20, so they should also happen under 6.6.23. My point was that you may have seen such a case, and that the new BOINC client throws an error in this case instead of running forever.

@jboese: are you and the others running 6.6.20, or any other version between 6.6.1 and 6.6.22? As has just been said, these versions introduced such a bug, which is fixed in 6.6.23 (or later) and which is not present in earlier versions.

In this case the project is hurt by following the official recommendation of 6.6.20. With 6.5.0 and earlier versions I have only had one hanging task since last summer (solved by restarting BOINC).

MrS
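
For anyone who wants to check a client against the version range described above, here is a small Python sketch. The range (6.6.1 through 6.6.22 affected, fixed from 6.6.23) is taken from this thread and not from official release notes, and the helper names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '6.6.20' into (6, 6, 20)."""
    return tuple(int(part) for part in text.strip().split("."))


# Range as reported in this thread (an assumption, not an official list).
FIRST_AFFECTED = (6, 6, 1)
FIXED_IN = (6, 6, 23)


def has_never_ending_wu_bug(version: str) -> bool:
    """True if the version falls in the range reported as affected."""
    return FIRST_AFFECTED <= parse_version(version) < FIXED_IN


for v in ("6.5.0", "6.6.20", "6.6.23", "6.6.25"):
    print(v, has_never_ending_wu_bug(v))
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "6.6.3" would sort after "6.6.23".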

MarkJ
Message 9541 - Posted: 9 May 2009 | 13:54:29 UTC - in response to Message 9537.

I haven't seen any more do it, so I must conclude it's a one-off.

As for 6.6.20 being the official version, I wouldn't be surprised if they make 6.6.28 (or 29 for the Mac) the official one, now that they seem to have addressed the "never ending wu" bug and the task thrashing. It would probably help if the project admins suggested this to Dr A.