
Message boards : Graphics cards (GPUs) : Problems with ACEM 6.71 ??

zioriga
Joined: 30 Oct 08
Posts: 46
Credit: 494,132,425
RAC: 3,861,070
Message 13012 - Posted: 4 Oct 2009 | 15:33:55 UTC

It seems that with the new ACEMD 6.71, all my WUs terminate with a compute error after about 15 hours of elapsed time.
1324260
1326713
1338624

Does anyone else have the same problem?

9800 GT, CUDA 190.62, Win XP 64-bit, BOINC Manager 6.10.11 64-bit

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Message 13031 - Posted: 6 Oct 2009 | 1:52:06 UTC - in response to Message 13012.
Last modified: 6 Oct 2009 | 2:01:10 UTC

I have a similar error, also with a 9800 GT. What I've seen so far from the wingmen on these workunits suggests that those with 200 series GPU boards are not affected, but at least one more with a 9800 GT is.

I have had about 5 GPUGRID workunits fail with that problem so far today, all with less than 6 hours elapsed. I also had this problem earlier under BOINC 6.6.36, not just under the 6.10.3 I'm using now.

Running 64-bit Windows does not seem to make a difference; there is little evidence yet on 32-bit versions, and I haven't seen any reports for non-Windows operating systems. At least some workunits of the pYEEl and TRYP series are affected, but there is little sign that earlier series are. There is no clear evidence that those with the 185.20 driver are affected, but those with either the 190.38 or the 190.62 driver definitely are.

Another problem I've seen lately suggests it's worth asking whether anyone else seeing this problem on a 9800 GT is also running, on the same machine and at the same time, a BOINC project whose graphics cover the full screen. So far I've seen this other problem while showing graphics from either a Rosetta Beta workunit from Rosetta@home or a QMC@home workunit. I also participate in Docking@home, but have not yet noticed its full-screen graphics being involved.

Anyone else ready to report more on just what's affected?

Vista SP2 64-bit
BOINC 6.10.3
9800 GT
CUDA 190.38

Would anyone else with this problem like to try a 185.* CUDA driver again, to see whether it fixes this problem without adding new ones?

Darth_Vader
Joined: 3 Aug 09
Posts: 1
Credit: 45,205
RAC: 0
Message 13032 - Posted: 6 Oct 2009 | 2:06:13 UTC - in response to Message 13031.

Same exact problem.

XP 32-bit
BOINC 6.6.36
9800 GT
CUDA 190.38

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 13038 - Posted: 6 Oct 2009 | 5:23:14 UTC

Robert,

Well, I don't use screensavers, so that should not be part of this problem. I reported my last crashing task (1341055) in another thread, and that task crashed on a GTX 295...

zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Message 13051 - Posted: 6 Oct 2009 | 14:13:31 UTC - in response to Message 13038.

I had a WU stuck for 3 hours 30 minutes... I hit abort and moved on.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Message 13060 - Posted: 7 Oct 2009 | 4:57:55 UTC

A guess about one problem affecting 6.71, though it does not account for all the problems mentioned in this thread:

It cannot handle the following situation well, at least for the recent workunits:

1. It's running on a computer where the only graphics board available is one 9800 GT, and therefore any graphics will run on the same Nvidia board it's running on.

2. The same computer is currently running a CPU workunit from a different BOINC project, and that workunit includes graphics that cover the entire screen.

3. No effort has been made to disable the screensaver included in at least some recent BOINC versions, such as 6.10.3, which often selects one of the workunits in progress as the one to display graphics for.

4. That screensaver has been displaying full-screen graphics from a workunit long enough that it tries to move them elsewhere on the screen, finds no free space to move them to, and as a result freezes those graphics in a way that does not clear when the user resumes keyboard or mouse use.

As a result, I'd suggest seven items:

1. Any thread recommending that users switch to a new version of BOINC should include a reference to instructions on how to disable that screensaver for that BOINC version. Such a reference should also be added to threads on any new versions of acemd affected by this problem.

2. Alpha testers with a 9800 GT should be asked to test whether that problem also affects any new acemd versions, and whether it can happen even when no GPUGRID workunit is in progress or when some other Nvidia board is the only graphics board available.

3. Alpha testers should also check if this problem continues to occur for new BOINC versions and new acemd versions.

4. Some users with both a 9800 GT and a way of doing remote logins from another computer without those remote logins affecting GPU use should be asked to check if this problem freezes all workunits in progress. I've already seen evidence that it freezes the workunit with full-screen graphics.

5. After enough more is known about this problem, the GPUGRID project may want to consider sending some email giving a summary of the problem to all participants who have recently finished a workunit that used a 9800 GT.

6. The GPUGRID project should consider a new version of acemd that, when faced with situations similar to these, automatically pauses GPU use while running on a 9800 GT if the screensaver included in recent versions of BOINC starts displaying anything, and keeps it paused as long as that screensaver is running and actually displaying graphics from some other BOINC project.

7. The group responsible for new BOINC versions should be asked to include a new feature in that screensaver: Try to detect whether the monitor has been turned off, and if so, limit anything it tries to display to a fully black screen.
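Items 6 and 7 are proposals, not existing acemd or BOINC features. As a minimal sketch of the decision logic they describe, assuming hypothetical inputs (the GPU model name, whether the BOINC screensaver is running and showing another project, whether the monitor is off) that neither program is known to expose under these names:

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical inputs for illustration only; not real acemd/BOINC fields.
    gpu_model: str
    screensaver_running: bool
    showing_other_project: bool  # screensaver is drawing another project's graphics
    monitor_off: bool


def acemd_should_pause_gpu(state: HostState) -> bool:
    """Item 6: pause GPU work on a 9800 GT while the BOINC screensaver
    is running and displaying graphics from some other project."""
    return (
        "9800 GT" in state.gpu_model
        and state.screensaver_running
        and state.showing_other_project
    )


def screensaver_output(state: HostState) -> str:
    """Item 7: if the monitor appears to be off, draw only a black screen."""
    if state.monitor_off:
        return "black screen"
    return "project graphics"
```

If the pause condition is re-evaluated on every poll, GPU work stays paused exactly as long as the screensaver keeps showing another project's graphics, which is the "keep it paused" behaviour item 6 asks for.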

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Message 13061 - Posted: 7 Oct 2009 | 5:06:48 UTC - in response to Message 13038.
Last modified: 7 Oct 2009 | 5:38:27 UTC

Robert,

Well, I don't use screensavers, so that should not be part of this problem. I reported my last crashing task (1341055) in another thread, and that task crashed on a GTX 295...


Paul,

You could have found a different problem from the one I'm discussing.

Could you at least check if the latest BOINC version you installed has forgotten your decision not to use any screensaver? I suspect that installing 6.10.3 made it forget such a decision for me, and I've lost the instructions on how to disable use of any screensaver.

Also, if such a problem occurs again, could you check whether it occurred while you had a workunit from a BOINC project that uses full-screen graphics (I believe I've listed three of them earlier in this thread)?

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 13065 - Posted: 7 Oct 2009 | 7:46:59 UTC - in response to Message 13061.

Robert,

Well, I don't use screensavers, so that should not be part of this problem. I reported my last crashing task (1341055) in another thread, and that task crashed on a GTX 295...


Paul,

You could have found a different problem from the one I'm discussing.

Could you at least check if the latest BOINC version you installed has forgotten your decision not to use any screensaver? I suspect that installing 6.10.3 made it forget such a decision for me, and I've lost the instructions on how to disable use of any screensaver.

Also, if such a problem occurs again, could you check whether it occurred while you had a workunit from a BOINC project that uses full-screen graphics (I believe I've listed three of them earlier in this thread)?

Right-click the desktop. On Vista, select "Personalize"; on XP, select "Properties". On XP, click the Screen Saver tab, look towards the bottom, and check that it says "None"; if it says BOINC, change it to None. On Vista, select Screen Saver from the list and do the same.
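The same check can be done without opening the dialog by reading the current user's screensaver settings from the registry (`HKEY_CURRENT_USER\Control Panel\Desktop`, values `ScreenSaveActive` and `SCRNSAVE.EXE`). A minimal Python sketch, assuming BOINC's Windows screensaver file is named `boinc.scr`; it only reads settings and changes nothing, and the registry part runs only on Windows:

```python
import sys


def boinc_screensaver_selected(active: str, scrnsave_exe: str) -> bool:
    """True if a screensaver is enabled and it is BOINC's.

    Assumes BOINC's screensaver file is called boinc.scr (case-insensitive).
    """
    return active.strip() == "1" and scrnsave_exe.strip().lower().endswith("boinc.scr")


def read_desktop_screensaver_settings() -> tuple:
    """Return (ScreenSaveActive, SCRNSAVE.EXE) for the current user (Windows only)."""
    import winreg  # standard library, but only available on Windows

    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Control Panel\Desktop") as key:
        def get(name: str) -> str:
            try:
                value, _kind = winreg.QueryValueEx(key, name)
                return str(value)
            except FileNotFoundError:  # value absent, e.g. no screensaver chosen
                return ""

        return get("ScreenSaveActive"), get("SCRNSAVE.EXE")


if __name__ == "__main__" and sys.platform == "win32":
    active, exe = read_desktop_screensaver_settings()
    print("BOINC screensaver selected:", boinc_screensaver_selected(active, exe))
```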

I just checked all my systems and none of them have a screensaver turned on... I was pretty sure they didn't, because I use a KVM and switch among them at least once a day to check whether anything needs local attention (an update, no work, stuck work, etc.).

In general, I don't pop up the graphics except on rare occasions and then usually only on the Mac which is not using the GPU...

As to projects, I am running only a few at the moment, Anansi, Collatz (ATI GPU system), FreeHAL, GPU Grid (Nvidia systems on GPU), Milky Way (ATI and one Nvidia in alternation with GPU Grid), QCN (2 systems, one the Mac) and WCG on all systems as the main CPU project (Still a few blue badges to go though I am close to another one)...

Not sure if this helps, but, sometimes the slightest clue is hidden right in front of us ...

Misfit
Joined: 23 May 08
Posts: 33
Credit: 610,551,356
RAC: 0
Message 13075 - Posted: 7 Oct 2009 | 19:52:10 UTC

I had video crashes while playing Dawn of Magic 2, with pop-ups from the system tray saying an (Nvidia) DLL had crashed and restarted successfully. This happened with both the brand-new driver and the previous one. It has run fine for the past 2 days. No screensaver in use here. Both CPUs were crunching MilkyWay. EVGA 9800 GT, Vista 32-bit, BOINC 6.6.36.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 13084 - Posted: 9 Oct 2009 | 4:05:35 UTC

A possible acemd feature that might help the problem I described:

The workunit should include enough information to identify areas of GPU memory that the GPU part of the workunit should never write to during normal operation, usually because they hold the program the GPU should run. Acemd should occasionally read a random selection of these locations, and if they have changed anyway, record this and pause trying to use the GPU at all for a few minutes, then restart from a checkpoint.

It should also run this check when writing a checkpoint, and if a change is detected, mark that checkpoint as unusable so that an earlier one is used instead.
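The proposed guard-region check can be sketched in Python with a simulated memory buffer. This is a minimal illustration under assumptions, not acemd code: `GuardedMemory`, `Checkpoint` and the other names are hypothetical, "GPU memory" is just a `bytearray`, and the checkpoint-time check compares every guarded byte (a design choice; a random sample would also fit the proposal):

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class GuardedMemory:
    """Simulated GPU memory plus offsets the kernel should never write
    (e.g. where the GPU program itself is stored)."""
    data: bytearray
    guarded: list[int]
    reference: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Snapshot the expected value of every guarded byte at load time.
        self.reference = {off: self.data[off] for off in self.guarded}

    def spot_check(self, rng: random.Random, samples: int) -> bool:
        """Occasional cheap check: compare a random subset of guarded bytes."""
        picks = rng.sample(self.guarded, min(samples, len(self.guarded)))
        return all(self.data[off] == self.reference[off] for off in picks)

    def full_check(self) -> bool:
        """Compare every guarded byte against its snapshot."""
        return all(self.data[off] == self.reference[off] for off in self.guarded)


@dataclass
class Checkpoint:
    step: int
    usable: bool


def write_checkpoint(mem: GuardedMemory, step: int, history: list[Checkpoint]) -> bool:
    """Record a checkpoint, marking it unusable if guarded memory has changed."""
    usable = mem.full_check()
    history.append(Checkpoint(step, usable))
    return usable


def latest_usable(history: list[Checkpoint]) -> Checkpoint | None:
    """Return the most recent checkpoint that passed its memory check."""
    for cp in reversed(history):
        if cp.usable:
            return cp
    return None


def on_spot_check(mem: GuardedMemory, rng: random.Random, samples: int,
                  history: list[Checkpoint]) -> int | None:
    """Return None to keep running, or the step to restart from after the
    caller has paused GPU use for a few minutes (0 means start over)."""
    if mem.spot_check(rng, samples):
        return None
    cp = latest_usable(history)
    return cp.step if cp is not None else 0
```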

The problem I described could easily be because a GPUGRID workunit and some other workunit's graphics decided to use the same part of the GPU memory, since I have only one graphics card in that machine, and did not have graphics disabled.

Paul, I've just followed your instructions for disabling the screensaver, so I'll check if this workaround handles the problem.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 13085 - Posted: 9 Oct 2009 | 4:37:27 UTC - in response to Message 13084.

Paul, I've just followed your instructions for disabling the screensaver, so I'll check if this workaround handles the problem.

I hope it works ...

Sometimes I find that BOINC has somehow changed my settings and farbled things up and installed the BOINC screen saver ... Always a PITA to find that it has started and maybe crashed some tasks ...

Please let us know if it does... successes where someone solved their issue fuel the soul... :)

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 13098 - Posted: 10 Oct 2009 | 1:11:15 UTC - in response to Message 13085.

Paul, I've just followed your instructions for disabling the screensaver, so I'll check if this workaround handles the problem.

I hope it works ...

Sometimes I find that BOINC has somehow changed my settings and farbled things up and installed the BOINC screen saver ... Always a PITA to find that it has started and maybe crashed some tasks ...

Please let us know if it does... successes where someone solved their issue fuel the soul... :)


You can also tell BOINC not to use the screen saver when you install it. There is a checkbox on the installation screen which you can clear and then it won't use the screen saver.

Personally, I use one of the Windows screensavers, which starts after 5 minutes, and power the monitor off after 10 minutes. Apart from saving a little power, it saves the CPU from having to draw pictures on the screen when it could be doing maths :-)
____________
BOINC blog

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 13130 - Posted: 11 Oct 2009 | 0:42:22 UTC - in response to Message 13085.

Paul, I've just followed your instructions for disabling the screensaver, so I'll check if this workaround handles the problem.

I hope it works ...

Sometimes I find that BOINC has somehow changed my settings and farbled things up and installed the BOINC screen saver ... Always a PITA to find that it has started and maybe crashed some tasks ...

Please let us know if it does... successes where someone solved their issue fuel the soul... :)


It's worked for me. The screensaver is disabled, and I've stopped having screen freezes or a significant number of failed workunits.

PeteS
Joined: 1 Jan 09
Posts: 7
Credit: 3,602,175
RAC: 0
Message 13139 - Posted: 11 Oct 2009 | 8:16:07 UTC - in response to Message 13130.

I have a lot of WUs failing after several hours of crunching with the error: cannot open file "restart.coor"

Host: http://www.gpugrid.net/show_host_detail.php?hostid=43388
Task example: http://www.gpugrid.net/result.php?resultid=1368047

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 13140 - Posted: 11 Oct 2009 | 9:51:54 UTC - in response to Message 13139.
Last modified: 11 Oct 2009 | 9:53:50 UTC

I have a lot of WUs failing after several hours of crunching with the error: cannot open file "restart.coor"

Host: http://www.gpugrid.net/show_host_detail.php?hostid=43388
Task example: http://www.gpugrid.net/result.php?resultid=1368047


The 'MDIO ERROR: cannot open file "restart.coor"' message appears at the beginning of every work unit. The app looks for a restart file; if the file isn't there, it's a new WU.
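In other words, the message is a harmless "resume if possible, otherwise start fresh" step. A minimal Python sketch of that rule, not acemd's actual code; only `restart.coor` comes from the log, and the initial-coordinates name `input.coor` is a hypothetical placeholder:

```python
import logging
from pathlib import Path

log = logging.getLogger("restart-sketch")


def choose_start_file(workdir: Path, initial_name: str = "input.coor") -> tuple:
    """Return (file to load coordinates from, whether this is a resume).

    'initial_name' is a hypothetical name for the WU's starting coordinates.
    """
    restart = workdir / "restart.coor"
    try:
        with restart.open("rb"):
            pass  # a real app would read the saved coordinates here
        return restart, True
    except FileNotFoundError:
        # Expected on a brand-new WU: log it and start from the beginning.
        log.info('cannot open file "%s"; starting a new work unit', restart.name)
        return workdir / initial_name, False
```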

The actual error you got was "Incorrect function. (0x1) - exit code 1 (0x1)", which is a fairly generic message. See this message thread for the usual reasons things fail.

One thing that springs to mind is a bug reported a while back with Windows 7 being unable to use the second graphics card for CUDA tasks. I notice that the error is on device 1, which would be the second card, since CUDA numbers devices from 0.
