Message boards : Number crunching : ACEM Beta 6.47 bug

zioriga
Joined: 30 Oct 08
Posts: 46
Credit: 502,232,425
RAC: 3,757,737
Message 26820 - Posted: 8 Sep 2012 | 4:51:55 UTC

After some trial and error I've found a bug.

If, for example, you set the GPU to run "Only after computer has been idle for xx minutes", any ACEMD Beta WU may crash with a computation error.

This happens if, and only if, the computer starts a new WU and, within some interval (I haven't yet determined how long, but I think it is short, a few minutes), the WU is suspended because the user is working again.
The next time the computer resumes the suspended WU, it crashes with a computation error.
In other words, there is an initial interval of time during which the WU must run uninterrupted.

My configuration: Win XP 64b - NVidia GTX 680 - BOINC 7.0.33

BTW, in a previous post (http://www.gpugrid.net/forum_thread.php?id=3125) I wrote about receiving ACEMD Beta WUs even though I had set them not to run, but I continue to receive only ACEMD Beta WUs.

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Message 26821 - Posted: 8 Sep 2012 | 6:14:30 UTC

What is your driver version? There is a bug in some versions that makes work units error out if the monitor is suspended while a new work unit starts.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 26824 - Posted: 8 Sep 2012 | 10:56:13 UTC - in response to Message 26820.

BTW, in a previous post (http://www.gpugrid.net/forum_thread.php?id=3125) I wrote about receiving ACEMD Beta WUs even though I had set them not to run, but I continue to receive only ACEMD Beta WUs.
You need to turn off "Run test applications" (3rd item in preferences) AND also unselect ACEMD beta under "Run only the selected applications".

What is your driver version? There is a bug in some versions that makes work units error out if the monitor is suspended while a new work unit starts.
301.42

You can see this if you click on zioriga's nick, then click on Computers, and for each computer click on Details (this works for everyone who has set "Should GPUGRID show your computers on its web site?" = Yes).
____________
Thanks - Steve

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26826 - Posted: 8 Sep 2012 | 11:19:34 UTC - in response to Message 26824.
Last modified: 8 Sep 2012 | 11:25:46 UTC

I'm seeing this,

    acemd.2562.cuda42 - Application Error

    The exception unknown software exception (0x40000015) occurred in the application at location 0x00413c9b



It's only for Beta tasks and I think it just happens when I log onto a locked system. Of note is the exceptionally low CPU usage,

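    Task: 5821574
    Work unit: 3697504
    Computer: 125945
    Sent: 6 Sep 2012 | 4:38:08 UTC
    Time reported: 7 Sep 2012 | 21:31:07 UTC
    Status: Error while computing
    Run time (sec): 8,783.11
    CPU time (sec): 1.19
    Credit: ---
    Application: ACEMD beta version v6.47 (cuda42)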
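
To put the low CPU usage in numbers, here is a quick Python sketch. It assumes the two figures before the "---" in the task record above are the run time and the CPU time, both in seconds, as in the standard BOINC task table; the function name is made up for illustration.

```python
def cpu_usage_percent(run_time_s: float, cpu_time_s: float) -> float:
    """Return CPU time as a percentage of wall-clock run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return 100.0 * cpu_time_s / run_time_s


# Figures from the failed Beta task (assumed to be seconds).
usage = cpu_usage_percent(run_time_s=8783.11, cpu_time_s=1.19)
print(f"CPU time was {usage:.4f}% of the run time")
```

That works out to roughly 0.0135%, so the CPU was essentially idle for the nearly two and a half hours the task was reported as running.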

When I see these sorts of errors I normally exit Boinc (everything), wait a minute, close the Windows error message and start Boinc up again. This old trick normally prevents the task from failing. However, in this case I get the same error message when Boinc starts up again (after about 30 sec). As long as I don't close the error message, the tasks keep running. There is nothing of note in the event logs.
I engaged the debug on this NOELIA task,

    SWAN : FATAL : Cuda driver error 2 in file 'swanlibnv2.cpp' in line 639.
    Assertion failed: a, file swanlibnv2.cpp, line 59
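
For reading the number in that message, here is a small Python lookup of the first few result codes from the CUDA driver API (the `CUresult` enum). Whether SWAN prints the raw `CUresult` value is an assumption on my part, not something the log confirms, and the helper name is invented for this sketch.

```python
# First few CUresult values from the CUDA driver API header (cuda.h).
# Assumption: the "Cuda driver error N" in the SWAN message is a raw CUresult.
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
}


def describe_cu_result(code: int) -> str:
    """Return the symbolic name for a driver error code, if it is in the table."""
    return CU_RESULT_NAMES.get(code, f"unlisted CUresult {code}")


print(describe_cu_result(2))
```

If the assumption holds, error 2 would mean the driver could not allocate memory when the task tried to resume, but that is only a reading of the code, not a confirmed diagnosis.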



I'll just run some short tasks for a while.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

zioriga
Message 26827 - Posted: 8 Sep 2012 | 12:26:53 UTC

@Snow Crash

OK, thanks, I turned off "Run test applications".

But this is a little disappointing!!
If I turn off "run ACEMD Beta", that should mean ACEMD Beta does not run!!!

I think the webmaster should do some work to eliminate this misunderstanding.

skgiven
Message 26828 - Posted: 8 Sep 2012 | 12:42:23 UTC - in response to Message 26827.

GPUGrid uses a Boinc server and the website is a Boinc template. If GPUGrid makes changes to the server and/or site, these would probably have to be undone and redone when upgrading the server/site. There have been plenty of GPUGrid-specific adaptations and improvements already. Making more changes requires more (non-research) time and would slow down and complicate future server/site updates.

In the present implementation of Betas, the 'Run only the selected applications ACEMD beta: no' setting doesn't actually do anything.
Only the 'Run test applications?' does something.
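
As a rough illustration of the behaviour described above, here is a Python sketch. It is a model of what this thread reports, not the Boinc server code, and the function and parameter names are invented for the example.

```python
def will_receive_beta(run_test_applications: bool, acemd_beta_selected: bool) -> bool:
    """Model of the reported behaviour: only 'Run test applications?' matters.

    acemd_beta_selected is accepted but deliberately ignored, mirroring the
    report that the per-application Beta checkbox currently has no effect.
    """
    return run_test_applications


# Print every combination of the two preferences.
for test_apps in (True, False):
    for beta_selected in (True, False):
        result = will_receive_beta(test_apps, beta_selected)
        print(f"Run test applications={test_apps}, "
              f"ACEMD beta selected={beta_selected} -> Beta WUs sent: {result}")
```

In this model, unselecting ACEMD beta alone still leaves Beta WUs coming in, which matches what zioriga saw; turning off 'Run test applications?' is what stops them.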
