Message boards : Number crunching : Must set rsc_memory_bound correctly

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 36032 - Posted: 31 Mar 2014 | 20:06:21 UTC
Last modified: 31 Mar 2014 | 20:06:41 UTC

GPUGrid Team:

You need to change your work unit parameters so that <rsc_memory_bound> is set correctly. BOINC 7.3.14 alpha (and potentially future versions) reads that value, compares it to the working set size, and auto-aborts the work unit if the working set exceeds the bound.

As of right now, I cannot do any GPUGrid work: every task errors out because of this invalid parameter setting. They all abort immediately, saying:
working set size > workunit.rsc_memory_bound: 194.14MB > 95.37MB

Your setting of
<rsc_memory_bound>100000000.000000</rsc_memory_bound>
... is 95.37MB, which is too low.
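The 95.37MB figure follows from dividing the byte count by 2**20, which is what the quoted log line implies. A minimal Python sketch of that arithmetic (the helper names are hypothetical, not part of BOINC):

```python
# Sketch of the unit conversion implied by the abort message.
# Assumption: "MB" in the BOINC log line means 2**20 bytes, which is
# consistent with 100000000 bytes being reported as 95.37MB.
MIB = 1024 * 1024


def bytes_to_mb(n_bytes: float) -> float:
    """Convert a byte count to the MB unit used in the log line."""
    return n_bytes / MIB


def mb_to_bytes(n_mb: float) -> float:
    """Convert a logged MB value back to bytes."""
    return n_mb * MIB


rsc_memory_bound = 100_000_000.0  # value from the work unit
print(f"{bytes_to_mb(rsc_memory_bound):.2f}MB")  # 95.37MB

# The bound would have to be at least this many bytes to cover the
# 194.14MB working set reported in the message.
print(f"{mb_to_bytes(194.14):.0f} bytes")
```

Any replacement bound therefore needs to be comfortably above roughly 204 million bytes for these tasks.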

Could you please promptly fix this?

Regards,
Jacob Klein

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 36033 - Posted: 31 Mar 2014 | 20:41:31 UTC - in response to Message 36032.

Jacob,

Just to be sure I understand: are you saying that this is new behaviour with release 7.3.14?

Matt

Jacob Klein
Message 36034 - Posted: 31 Mar 2014 | 20:47:43 UTC - in response to Message 36033.
Last modified: 31 Mar 2014 | 20:55:14 UTC

That is correct. It is new behavior in the 7.3.14 alpha release.

Previously, the client would compare the running Working Set against the user-specified BOINC RAM limit, to determine whether a task needed to be suspended.

But NOW, in 7.3.14+, the running Working Set value is additionally compared against wu.rsc_memory_bound, and the work unit is immediately auto-aborted if the bound is exceeded, with a message such as:
working set size > workunit.rsc_memory_bound: 194.14MB > 95.37MB
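The check described above can be sketched roughly as follows. This is an illustration in Python, not the BOINC client's actual code; the function name is hypothetical, and MB is assumed to mean 2**20 bytes, as the 95.37MB figure suggests.

```python
# Illustrative sketch of the 7.3.14 alpha check, NOT BOINC source code.
# Assumption: sizes are in bytes and "MB" in the message is 2**20 bytes.
from typing import Optional

MIB = 1024 * 1024


def check_memory_bound(working_set: float, rsc_memory_bound: float) -> Optional[str]:
    """Return an abort message if the working set exceeds the bound, else None."""
    if working_set > rsc_memory_bound:
        return (
            "working set size > workunit.rsc_memory_bound: "
            f"{working_set / MIB:.2f}MB > {rsc_memory_bound / MIB:.2f}MB"
        )
    return None


# The values from this thread: a 194.14MB working set against a
# bound of 100000000 bytes.
msg = check_memory_bound(194.14 * MIB, 100_000_000.0)
print(msg)
```

With the thread's numbers the sketch reproduces the abort text quoted above; a working set below the bound returns None and the task keeps running.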

- Jacob

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 36035 - Posted: 31 Mar 2014 | 20:54:46 UTC - in response to Message 36034.
Last modified: 31 Mar 2014 | 21:13:52 UTC

Hi Jacob, thanks for noting this in time. It affected the short WUs, and should be fixed for the forthcoming ones.

Jacob Klein
Message 36036 - Posted: 31 Mar 2014 | 20:56:38 UTC - in response to Message 36035.

Toni,

What do you mean by that? David thinks the change is correct, and will include it in a public release depending on how much trouble it causes.

In the meantime, I'd like to continue to do work for your project.
Would you mind fixing the input parameters for your work units?

MJH
Message 36037 - Posted: 31 Mar 2014 | 21:05:38 UTC - in response to Message 36034.

Ok, thanks for the heads-up, Jacob. I've amended the limit to 300MB but can't modify any of the WUs that are already in the system, alas.

Matt

Jacob Klein
Message 36038 - Posted: 31 Mar 2014 | 21:09:05 UTC - in response to Message 36037.

Thank you, Matt. GPUGrid isn't the only project that will have to change its WU parameters if we do decide to keep this change.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 36041 - Posted: 31 Mar 2014 | 23:16:06 UTC

http://boinc.berkeley.edu/dev/forum_thread.php?id=8378&postid=53444#53444

Jacob Klein
Message 36043 - Posted: 1 Apr 2014 | 2:05:27 UTC
Last modified: 1 Apr 2014 | 2:05:59 UTC

It looks like this change is being reverted for now, per David's email below.


> Date: Mon, 31 Mar 2014 18:53:33 -0700
> From: [email protected]
> To: [email protected]
> Subject: Re: [boinc_alpha] 7.3.14 - Heads up - Memory bound enforcement
>
> On further thought, I'm going to change things back to the way they were, namely
>
> 1) workunit.rsc_memory_bound is used only by the server;
>    it won't send a job if rsc_memory_bound > host's available RAM
> 2) the client aborts a job if working set size > host's available RAM
> 3) the client will run a set of jobs only if the sum of their WSSs
>    fits in available RAM
>    (i.e. if a job's WSS is close to all available RAM,
>    it would run that job and nothing else)
>
> The reason for not aborting jobs when WSS > rsc_memory_bound is that
> it requires projects to come up with very accurate estimates of RAM usage,
> which I don't think is feasible in general.
> Also, it will lead to lots of aborted jobs, which is bad for volunteer morale.
>
> -- David
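
The three rules in David's email can be sketched as below. This is a rough Python illustration under stated assumptions, not BOINC's implementation: all function names are hypothetical, sizes are in bytes, and the in-order greedy selection for rule 3 is an assumption, since the email says only that the sum of the working sets must fit in available RAM.

```python
# Illustrative sketch of the reverted behaviour, NOT BOINC source code.
from typing import List

MIB = 1024 * 1024


def server_should_send(rsc_memory_bound: float, available_ram: float) -> bool:
    """Rule 1: the server skips a job whose bound exceeds the host's available RAM."""
    return rsc_memory_bound <= available_ram


def client_should_abort(working_set: float, available_ram: float) -> bool:
    """Rule 2: the client aborts only if the working set exceeds available RAM."""
    return working_set > available_ram


def pick_runnable(working_sets: List[float], available_ram: float) -> List[int]:
    """Rule 3: choose jobs whose working sets together fit in available RAM.

    Assumption: jobs are considered in the order given (greedy).
    """
    chosen: List[int] = []
    used = 0.0
    for index, wss in enumerate(working_sets):
        if used + wss <= available_ram:
            chosen.append(index)
            used += wss
    return chosen


ram = 1024 * MIB
# A 194.14MB working set no longer aborts just because it exceeds a
# 100000000-byte rsc_memory_bound; only available RAM matters to the client.
print(client_should_abort(194.14 * MIB, ram))          # False
# A job using nearly all RAM runs alone.
print(pick_runnable([1000 * MIB, 200 * MIB], ram))     # [0]
```

Under these rules rsc_memory_bound only affects which hosts receive a job, so an underestimate such as the one in this thread no longer kills running tasks.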
