Message boards : News : New CPU Application for testing

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 37343 - Posted: 21 Jul 2014 | 21:07:26 UTC

Hi,

There's a new CPU app available for Linux clients. A few WUs are out now, with some more to come after I've received the first results back.

The app is multithreaded; I think the default behaviour of the BOINC client is to allocate all cores to it.

Please report any observations here.

Matt

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37345 - Posted: 22 Jul 2014 | 0:40:43 UTC

Hi Matt,

So far, all tasks are ending in error. Some stop immediately, some run for a while.

Here's my main question. Over at Milkyway@home, they are able to control the threads assigned to a multi-thread task using an app_config.xml file. I put together a similar app_config.xml file for the cpumd tasks and tried it on a few tasks.

My BOINC client has 10 threads assigned. I have an app_config.xml set up to limit the threads assigned to cpumd tasks to 6 threads. In BOINC Manager, the task shows that it is using 6 CPUs. However, the stderr.txt shows that it is using 10 OpenMP threads, and system usage indicates that it is using all 10 threads.

My app_config:

<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdbeta</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>android</name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app_version>
    <app_name>cpumd</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
  </app_version>
</app_config>


Can you see anything that needs to be adjusted in my app_config? BOINC Manager is not giving me any error messages when it reads the app_config file.

Sure would be nice if this would work here so we could leave some threads open to support GPU tasks.

Thanks for all the help.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,923,227,372
RAC: 18,784,623
Message 37346 - Posted: 22 Jul 2014 | 8:47:32 UTC - in response to Message 37345.

Refer to the Application configuration documentation.

For most multi-threaded applications you need

<cmdline>--nthreads 6</cmdline>

to control the behaviour of the application, in addition to the <avg_ncpus>6</avg_ncpus> (which only controls BOINC's scheduling, as you've found).

I've only ever used --nthreads under Windows: I'm not sure whether it's applicable under Linux. Perhaps you or Matt could find out for us.
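
As a rough illustration of the point above, here is a minimal Python sketch that reads an app_config and reports, for each <app_version>, whether <avg_ncpus> and the --nthreads value in <cmdline> agree. The function name check_thread_settings and the embedded example config are made up for this illustration; they are not part of BOINC or of the project's app.

```python
import re
import xml.etree.ElementTree as ET


def check_thread_settings(xml_text):
    """Return (app_name, avg_ncpus, nthreads) for every <app_version>.

    nthreads is None when <cmdline> is missing or has no --nthreads option.
    """
    root = ET.fromstring(xml_text)
    results = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name", default="").strip()
        avg_ncpus = float(version.findtext("avg_ncpus", default="0"))
        cmdline = version.findtext("cmdline", default="")
        match = re.search(r"--nthreads\s+(\d+)", cmdline)
        nthreads = int(match.group(1)) if match else None
        results.append((app_name, avg_ncpus, nthreads))
    return results


# Hypothetical example config, modelled on the one posted in this thread.
EXAMPLE = """
<app_config>
  <app_version>
    <app_name>cpumd</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
  </app_version>
</app_config>
"""

for name, ncpus, nthreads in check_thread_settings(EXAMPLE):
    if nthreads is None:
        print(f"{name}: avg_ncpus={ncpus:g}, but no --nthreads in cmdline")
    elif nthreads != ncpus:
        print(f"{name}: avg_ncpus={ncpus:g} does not match --nthreads {nthreads}")
    else:
        print(f"{name}: scheduler and application both set to {nthreads} threads")
```

The check only compares the two numbers; it does not tell you whether a given application actually honours --nthreads.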

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37349 - Posted: 22 Jul 2014 | 12:35:01 UTC - in response to Message 37346.


<cmdline>--nthreads 6</cmdline>


That's right - it's controlled by the command line option "--nthreads". It should default to using a single core if that's not specified. You'll be able to see in the stderr of the task's tombstone web page what arguments it received.

Matt

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37351 - Posted: 22 Jul 2014 | 12:56:36 UTC

Matt and Richard,

Thanks for the advice. The cmdline option seemed to do the trick. It is now running on 6 threads.

Thanks for the help,
captainjack

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37352 - Posted: 22 Jul 2014 | 13:14:10 UTC - in response to Message 37351.

I've just updated the app to correct for crashes on clients with venerable Core2 CPUs.

Matt

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37354 - Posted: 22 Jul 2014 | 14:17:01 UTC

Could someone please report on the success or otherwise of suspend/resume of WUs?

Matt

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 0
Message 37355 - Posted: 22 Jul 2014 | 14:51:42 UTC

Is there any real advantage in making the app multithreaded? It saves memory, but that's all that comes to mind. On the other hand I expect it to be less efficient than running several single-threaded tasks. Plus the BOINC scheduler seems to be bad at managing multithreaded apps.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37356 - Posted: 22 Jul 2014 | 14:58:28 UTC - in response to Message 37355.


Is there any real advantage in making the app multithreaded?


Yes. For the use we intend for it we need the results back in a timely manner. Running these WUs on a single core will work, but the results are likely to come back too late to be useful. This application scales linearly for small N - I'm estimating 4-8 cores on most machines.

Plus the BOINC scheduler seems to be bad at managing multithreaded apps.


Well that's another thing entirely, and our problem to solve. I'm more concerned that the client doesn't give the user the desired level of control.

Matt

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 0
Message 37360 - Posted: 22 Jul 2014 | 15:44:49 UTC - in response to Message 37356.

For the use we intend for it we need the results back in a timely manner. Running these WUs on a single core will work, but the results are likely to come back too late to be useful.


This can only work if you don't have to compete for resources. You'll lose your advantage at every task switch or when BOINC decides to delay the start of a cached task in favour of some other. A very short deadline could possibly avoid this but I'm not sure if I could tolerate such hijacking.

Plus the BOINC scheduler seems to be bad at managing multithreaded apps.


Well that's another thing entirely, and our problem to solve. I'm more concerned that the client doesn't give the user the desired level of control.


I was talking about the client's task scheduler actually. As you already mentioned, it will run multithreaded apps on all cores, effectively interrupting all other work.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37365 - Posted: 22 Jul 2014 | 22:42:23 UTC

Matt said,

Could someone please report on the success or otherwise of suspend/resume of WUs?

Matt


Just successfully finished an 8.43 task and it shows validated. It was running with an app_config.xml which allocated 6 threads to the task. After it was running for a few minutes, I suspended then resumed the task. Looks like it worked fine. Link to task http://www.gpugrid.net/result.php?resultid=12864432

I have another task running now. After it had been running for ~10 minutes, I shut down BOINC then restarted BOINC. It restarted from the beginning. Looks like it is running fine now. Will report back in later.

Let me know if you need more information.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37366 - Posted: 22 Jul 2014 | 23:22:38 UTC - in response to Message 37365.


Looks like it worked fine.


Super, thanks!

Matt

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37367 - Posted: 23 Jul 2014 | 1:20:42 UTC

Matt,

I had a second task that was running when I shut down BOINC and started it back up again. That task has finished and validated.

Ubuntu 14.04
BOINC 7.2.42 installed using the Berkeley installer

Let us know if you want us to run some other tests.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37422 - Posted: 26 Jul 2014 | 10:58:28 UTC

Hi Everyone,

The new test application is now available for Windows as well as Linux. Please do help us test it!

This application does the same type of simulations as ACEMD, our GPU application. The reason we are testing it now is that, in principle, modern CPUs are finally fast enough to process some of our WUs within an acceptable amount of time.

To get to that point though, it is essential to use multiple CPU cores simultaneously, so this is a multithreaded app. I'd encourage you to let the WUs run on all cores (which it will do by default). The performance scales linearly with core count.

The main objective of this first phase is to test application stability and measure achieved simulation rates and total throughput.

It's a Beta application - to get work for it, you'll need to have your profile set to allow CPU work, allow beta work and enable the application "Molecular Dynamics on CPU".

The app is largely feature-complete. The only outstanding issue I know of is that the % completion statistics are wrong. I'm sure you'll all find other issues - please post your experiences and observations here.

Matt

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37426 - Posted: 26 Jul 2014 | 13:38:49 UTC - in response to Message 37422.

Hi Matt,

My Haswell 4771 on win7 did one but with error. Five of my wing(wo)men had errors too. You can see it here: http://www.gpugrid.net/result.php?resultid=12877823

Off topic: my error page also still has one error on it from 31 Aug 2013 and one from 19 Nov 2013.
____________
Greetings from TJ

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37427 - Posted: 26 Jul 2014 | 15:40:12 UTC

Hi Matt,

Just ran one of the multithread CPU tasks on a Windows 7 machine with 16 threads.

Matt said:

I'd encourage you to let the WUs run on all cores (which it will do by default). The performance scales linearly with core count.


Per your request, I let it run on all 16 threads. CPU Utilization was pegged at 100% throughout the run. Task has uploaded and validated.

Just started another one. I will keep an eye on it and let you know if anything changes.

Let me know if you want me to try a different kind of test.

captainjack

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37428 - Posted: 26 Jul 2014 | 16:37:44 UTC - in response to Message 37426.


My Haswell 4771 on win7 did one but with error.


Those were with the previous version. 844 is current, and should already have fixed those problems.

Matt

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37429 - Posted: 26 Jul 2014 | 16:46:29 UTC - in response to Message 37428.


My Haswell 4771 on win7 did one but with error.


Those were with the previous version. 844 is current, and should already have fixed those problems.

Matt

Thanks, will wait for new ones.
____________
Greetings from TJ

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37432 - Posted: 26 Jul 2014 | 17:12:58 UTC - in response to Message 37429.


Thanks, will wait for new ones.


There are plenty of unsent tasks - if you're not receiving them, best check your project settings as below.

Matt

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 0
Message 37444 - Posted: 27 Jul 2014 | 14:17:28 UTC

Matt,

you already know about the incorrect progress display, together with the elapsed and remaining run times, but you haven't mentioned if you see a way to fix this. If not, I'd like to point out that all calculations based on those values are of course wrong too, like computing speed, estimated run times and credits, possibly affecting system operations and user acceptance.

For completeness, all my 8.43 WUs so far have finished and validated without further issues, including one that restarted after a system shutdown. WU size seems reasonable.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37445 - Posted: 27 Jul 2014 | 14:28:18 UTC - in response to Message 37444.

There'll be a new version out later today, after stumps.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37462 - Posted: 28 Jul 2014 | 20:34:58 UTC - in response to Message 37445.

As promised, version 845 should report progress correctly.

Matt

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37480 - Posted: 29 Jul 2014 | 12:14:54 UTC

Or not, because I don't know the difference between percentages and fractions. 846 out now, which should also correctly report to the client when a checkpoint was performed.

Matt

Dr Who Fan
Joined: 24 Mar 14
Posts: 4
Credit: 506,809
RAC: 0
Message 37488 - Posted: 29 Jul 2014 | 16:21:14 UTC

Just aborted a V845 that has been at 100% complete forever...
According to BOINC it was 4,706.000% DONE

Application Test application for CPU MD 8.45 (mt)
Workunit name 6_745-MJHARVEY_gpugrid10z4-0-1-RND3120
State Waiting to run
Received 28-07-2014 16:10
Report deadline 02-08-2014 16:08
Estimated app speed 3.82 GFLOPs/sec
Estimated task size 5,000,000 GFLOPs
Resources 2 CPUs
CPU time at last checkpoint 00:00:00
CPU time 21:05:58
Elapsed time 13:59:25
Estimated time remaining 00:00:00
Fraction done 4,706.000%
Virtual memory size 49.34 MB
Working set size 0.35 MB
Directory slots/5
Process ID 5716


____________

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37496 - Posted: 30 Jul 2014 | 18:41:55 UTC - in response to Message 37488.


According to BOINC it was 4,706.000% DONE


845 misreported by a factor of 100. Your job was 47% complete when you killed it.
This is fixed in 846.

MJH
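
The factor-of-100 error described above can be sketched in a few lines of Python. This assumes, as the fix in 846 suggests, that the client expects progress as a fraction between 0 and 1 and multiplies it by 100 for display; the function names below are illustrative only and are not the app's or BOINC's actual code.

```python
def displayed_percent(fraction_done):
    """What a client shows if it treats the reported value as a 0..1 fraction."""
    return fraction_done * 100.0


def to_fraction(percent_done):
    """Convert a percentage (0..100) to the 0..1 fraction a client would expect."""
    return min(max(percent_done / 100.0, 0.0), 1.0)


# Reporting a percentage where a fraction is expected inflates the display by 100x.
wrong = displayed_percent(47.06)
right = displayed_percent(to_fraction(47.06))
print(f"reported as percentage: {wrong:,.3f}%")
print(f"reported as fraction:   {right:.3f}%")
```

Under that assumption, a task that is really 47.06% done and reports 47.06 instead of 0.4706 would be displayed as 4,706%, which matches the figure quoted above.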

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Message 37497 - Posted: 30 Jul 2014 | 20:59:07 UTC
Last modified: 30 Jul 2014 | 21:10:05 UTC

Matt,
currently running a v 8.46 -
http://www.gpugrid.net/result.php?resultid=12898547

It has been running 1 hour and reports that it is 1% complete, with time to completion 5 days and 36 minutes?

EDIT - Also, there are no other BOINC tasks or non-BOINC apps running. BOINC reports CPU usage at 100%, but Win XP Task Manager reports only 50% CPU usage.

Profile Presrvd
Joined: 6 Jul 14
Posts: 5
Credit: 41,548,910
RAC: 0
Message 37498 - Posted: 30 Jul 2014 | 21:42:21 UTC - in response to Message 37496.

Mine ran for roughly 5 1/2 hours, and went from 8732% to complete instantly, and the part that is really bothering me is that I have the test applications turned off.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37499 - Posted: 30 Jul 2014 | 21:46:58 UTC

Hi Matt,

Just started up an 8.46 job in Windows 7. My app_config is listed below, but the app seems to ignore the app_config settings for number of CPUs. Can you see something that needs to be changed in my app_config? BOINC Manager is not showing any errors when it reads the app_config.

<app_config>
  <app>
    <name>android</name>
    <max_concurrent>8</max_concurrent>
  </app>
  <app_version>
    <app_name>cpumd</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
    <cmdline>--nthreads 8</cmdline>
  </app_version>
</app_config>


The Linux tasks seem to be running well and running within the number of CPUs specified in the app_config. As a reminder, I really like using the app_config so I can reserve a few CPU threads to support GPU processing.

Thanks,
captainjack

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37500 - Posted: 30 Jul 2014 | 23:35:09 UTC

The 8.46 task on Windows 7 just finished. It was running on 16 threads. Here are the estimates:

Elapsed minutes: 42
% complete: 1.75
Estimated to finish: 40:53:39

Elapsed minutes: 103
% complete: 4.2
Estimated to finish: 39:53:44

Elapsed minutes: 107
Finished

Hope that helps.
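
For what it's worth, the estimates above look like a straight extrapolation from the reported percentage. The sketch below is a simplification (the BOINC client's real estimator is more involved, and remaining_minutes is a name made up for this example), but it shows why a wrong percentage produces a wrong time-to-completion.

```python
def remaining_minutes(elapsed_minutes, fraction_done):
    """Linear extrapolation: total = elapsed / fraction, remaining = total - elapsed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done - elapsed_minutes


# (elapsed minutes, reported % complete) from the observations above.
observations = [(42, 1.75), (103, 4.2)]

for elapsed, percent in observations:
    remaining = remaining_minutes(elapsed, percent / 100.0)
    print(f"{elapsed} min at {percent}%: about {remaining / 60:.1f} h remaining")
```

Both observations extrapolate to roughly 39 hours remaining, close to the displayed estimates, while the task actually finished after 107 minutes. That is consistent with the progress figure, not the run time, being the thing that was off.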

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37505 - Posted: 31 Jul 2014 | 8:55:07 UTC - in response to Message 37499.

Captainjack,

That app config looks ok to me. Have a look in the stderr reported by the job, that will say at the top how many threads the program is using.

Matt

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 37506 - Posted: 31 Jul 2014 | 9:32:39 UTC - in response to Message 37505.

The app_config probably won't take effect immediately (even if you re-read the config files from BOINC) as work units occupy slots and cores are already allocated to started work. So it should kick in after a WU finishes. The trouble with this is that the WUs are long (just like the GPU work units).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile PDH
Joined: 15 Oct 10
Posts: 1
Credit: 342,714,168
RAC: 2,571
Message 37523 - Posted: 1 Aug 2014 | 12:46:49 UTC
Last modified: 1 Aug 2014 | 12:50:05 UTC

Excellent job. The multithreaded application works fine on my host, and V846 reports progress correctly. Now GPUGRID is like the Folding@Home project - with apps for GPUs and multi-core CPUs. Thx.

Jim Daniels (JD)
Joined: 20 Jan 13
Posts: 9
Credit: 206,731,892
RAC: 0
Message 37531 - Posted: 1 Aug 2014 | 22:28:11 UTC

I don't have time to check my BOINC client all that often so I am not sure how long this has been going on. I had "Test Applications" disabled but I still found these work units running on my system. It appears GPU tasks will not run with these WUs using all the cores. At least that is my assumption, possibly an incorrect one, since no GPU WU was running or even pending.

I disabled the "Molecular Dynamics on CPU" jobs, updated the BOINC client, aborted all the MT WUs and updated the client again. No GPU WUs were loading so I tried updating the client again with the same results. The BOINC client log says no short or long GPU tasks are available. I checked the server and there are thousands of short and long WUs that are unsent. I rebooted and I am still getting the same thing.

Any guesses on how long my GPU will remain idle?

Jim Daniels (JD)
Joined: 20 Jan 13
Posts: 9
Credit: 206,731,892
RAC: 0
Message 37532 - Posted: 1 Aug 2014 | 22:43:14 UTC - in response to Message 37531.
Last modified: 1 Aug 2014 | 22:44:13 UTC

The client log also says I have processed my daily quota of 16 tasks. :-(

I suppose that means I won't get any new tasks until tomorrow. I guess my GPU will get another day of vacation. :-)

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 37534 - Posted: 2 Aug 2014 | 13:30:05 UTC - in response to Message 37532.

Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?

Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,923,227,372
RAC: 18,784,623
Message 37535 - Posted: 2 Aug 2014 | 15:22:34 UTC - in response to Message 37534.

Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?

Matt

Only if GPUGrid is the only project the client is attached to. Look at this Event Log snippet from one of my GPUGrid attached machines.

02/08/2014 15:51:51 | boincsimap | [work_fetch] REC 0.000 prio -0.000000 can req work
02/08/2014 15:51:51 | LHC@home 1.0 | [work_fetch] REC 0.429 prio -0.000007 can req work
02/08/2014 15:51:51 | Einstein@Home | [work_fetch] REC 2309.624 prio -0.044407 can req work
02/08/2014 15:51:51 | NumberFields@home | [work_fetch] REC 5219.505 prio -0.088347 can req work
02/08/2014 15:51:51 | GPUGRID | [work_fetch] REC 232810.246 prio -1.957289 can req work
02/08/2014 15:51:51 | SETI@home | [work_fetch] REC 242781.786 prio -2.030169 can req work

<work_fetch_debug> lists projects in priority order for the next work fetch. SIMAP is highest priority because they've been off-stream for about the last month between batches. When SIMAP comes back on-stream on Thursday, work will be fetched preferentially from there to even up resource share and make up for that missing month.

LHC and NumberFields are my other two active CPU projects (Einstein is intel_gpu only on this machine, so let's leave it out for now). When all three are active and have work available, their REC and priority figures will all be jostling around the same levels, and work will be fetched turn-and-turn-about to maintain resource share.

But my two GPU projects - SETI and GPUGrid, they're allocated one GPU each - are in a class of their own. The REC (Recent Estimated Credit - bears no relationship to actual granted credit) from a GPU is so much higher than from a CPU that work fetch priority is driven extremely low - CPU work will only be fetched as a last resort when all other possible sources of supply have been exhausted.

Even if you force it to fetch work by blocking other projects, it still faces a similar priority hurdle before actually running. I've been trying to get a couple of test CPU tasks from SETI to run this afternoon, and I've had to tweak a lot of my normal settings to force them into action.

In short: if a client is running GPU work from GPUGrid, it will have a strong bias towards running "anything except GPUGrid" on its CPUs.
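
To make the ordering concrete, here is a small Python sketch that parses the [work_fetch] lines from the Event Log snippet above and sorts the projects by their prio value, highest priority (closest to zero) first. The regular expression and the function name parse_work_fetch are my own, written against the line format shown in this post only; other client versions may log differently.

```python
import re

LOG = """\
02/08/2014 15:51:51 | boincsimap | [work_fetch] REC 0.000 prio -0.000000 can req work
02/08/2014 15:51:51 | LHC@home 1.0 | [work_fetch] REC 0.429 prio -0.000007 can req work
02/08/2014 15:51:51 | Einstein@Home | [work_fetch] REC 2309.624 prio -0.044407 can req work
02/08/2014 15:51:51 | NumberFields@home | [work_fetch] REC 5219.505 prio -0.088347 can req work
02/08/2014 15:51:51 | GPUGRID | [work_fetch] REC 232810.246 prio -1.957289 can req work
02/08/2014 15:51:51 | SETI@home | [work_fetch] REC 242781.786 prio -2.030169 can req work
"""

LINE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*\[work_fetch\] "
    r"REC (?P<rec>[\d.]+) prio (?P<prio>-?[\d.]+)"
)


def parse_work_fetch(log_text):
    """Return (project, rec, prio) tuples sorted by prio, highest first."""
    rows = []
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match:
            rows.append(
                (match["project"], float(match["rec"]), float(match["prio"]))
            )
    return sorted(rows, key=lambda row: row[2], reverse=True)


for project, rec, prio in parse_work_fetch(LOG):
    print(f"{project:20s} REC {rec:12.3f}  prio {prio:9.6f}")
```

On this snippet the two projects with GPU work, GPUGRID and SETI@home, end up at the bottom of the list because their REC is orders of magnitude larger than that of the CPU-only projects.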

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 0
Message 37536 - Posted: 2 Aug 2014 | 15:25:14 UTC

I don't know about obtaining but I can confirm that Linux BOINC 7.2.47 does not suspend my Einstein GPU WUs when running cpumd on all cores. That must be a BOINC error however.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,653,598,843
RAC: 17,535,523
Message 37537 - Posted: 2 Aug 2014 | 21:00:52 UTC

Matt asked:

Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?


I just reconfigured my Ubuntu box to not have an app_config and test Matt's question. It is currently running two GPUGRID GPU tasks and one CPUMD task using all available CPUs.

The GPU tasks show that they are using 0.756 CPU each and the CPUMD task is using 11 CPUs (all that are allocated to that BOINC client). BOINC has over-committed the CPUs available, but it does seem to be working.

Let us know if you want us to perform a different test with a different configuration.

Hope that helps,
captainjack

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 37539 - Posted: 3 Aug 2014 | 19:18:39 UTC - in response to Message 37537.

Matt, the harsh reality is that GPU WUs use the CPU, and the more you use the CPU for other work, the more it impacts upon GPU work.
Your app is fine for systems without an NVidia GPU. Otherwise it's a bang-your-head-off-a-wall exercise!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38506 - Posted: 14 Oct 2014 | 16:07:29 UTC

The Linux version of CPUMD has been updated. Changes:

* Rebase from gmx 4.6 to 5.0
* There are now optimised builds for SSE2, SSE4 and AVX.
* BOINC progress reporting

Matt

Profile totoshi
Joined: 30 Aug 14
Posts: 2
Credit: 3,679,508
RAC: 0
Message 38514 - Posted: 14 Oct 2014 | 18:41:20 UTC

Hey there,

I received a very long WU (4377-MJHARVEY_CPUDHFR-0-1-RND8587_1) on my Windows machine which has a runtime of approx. 305 hours. Really? The deadline is in one week. ;)

After ~10 min my computer has crunched ~0.042%.

I guess I will not be able to send this WU back in time. ;)

If I remember correctly, one of my Linux machines has such a WU as well.

Should I abort these ones?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38516 - Posted: 14 Oct 2014 | 18:54:09 UTC - in response to Message 38514.

totoshi,

Ignore both the estimated runtime and also the progress monitor - both are wrong. The task ought to take around 24-50h, depending on your machine. Feel free to kill it if it's causing you inconvenience.

Matt

Profile totoshi
Joined: 30 Aug 14
Posts: 2
Credit: 3,679,508
RAC: 0
Message 38518 - Posted: 14 Oct 2014 | 19:32:52 UTC

Hi Matt,

Alrighty. I will ignore both. :)

Thx for the information.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 758,014,472
RAC: 319,467
Message 38519 - Posted: 14 Oct 2014 | 20:25:32 UTC

I got one of the test applications for CPUMD.

http://www.gpugrid.net/result.php?resultid=13209824
http://www.gpugrid.net/workunit.php?wuid=10166393

Now at 0.442% progress (occasionally increasing by about 0.002%), 06:43:9 elapsed, 2930:20:57 estimated remaining.

Running high priority. Using 3 CPU cores, the maximum I allow BOINC to use on that computer.

If the estimated time to completion is anything close to accurate, I probably won't allow any more of these to run on that computer.

Running under 64-bit Windows Vista.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38520 - Posted: 14 Oct 2014 | 20:34:58 UTC - in response to Message 38519.

Robert,

Both the runtime estimates and the completion progress are wrong. The task ought to take no more than 2 days, depending on the spec of your machine.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38598 - Posted: 21 Oct 2014 | 0:25:18 UTC
Last modified: 21 Oct 2014 | 0:27:29 UTC

New CPUMD version 900 for AVX. Major change to a 64bit build.

You'll need a dev version of BOINC to get the AVX app - 7.2.42 doesn't report CPU capabilities correctly.

This version of the app reports its progress correctly.

(NB - there's also a 900 SSE2 which is the same as the older 850)

Matt

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38602 - Posted: 21 Oct 2014 | 11:47:57 UTC - in response to Message 38598.

Haven't been able to receive any new AVX CPUMD tasks with BOINC version 7.4.21 even though 8000 CPUMD tasks are available.

14/10/21 04:21:30 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 04:34:59 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 07:42:57 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 07:49:08 | GPUGRID | No tasks are available for Test application for CPU MD



Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38603 - Posted: 21 Oct 2014 | 12:02:08 UTC
Last modified: 21 Oct 2014 | 12:02:25 UTC

I was also *unable* to get a new CPU task, on BOINC 7.4.22 x64.
Why?

10/21/2014 8:03:18 AM | GPUGRID | [work_fetch] set_request() for CPU: ninst 6 nused_total 2.00 nidle_now 6.00 fetch share 1.00 req_inst 6.00 req_secs 41472.00
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] Starting scheduler request
10/21/2014 8:03:18 AM | GPUGRID | [work_fetch] request: CPU (41472.00 sec, 6.00 inst) miner_asic (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst)
10/21/2014 8:03:18 AM | GPUGRID | Sending scheduler request: To fetch work.
10/21/2014 8:03:18 AM | GPUGRID | Requesting new tasks for CPU
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] CPU work request: 41472.00 seconds; 6.00 devices
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] miner_asic work request: 0.00 seconds; 0.00 devices
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
10/21/2014 8:03:19 AM | GPUGRID | Scheduler request completed: got 0 new tasks
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Server version 613
10/21/2014 8:03:19 AM | GPUGRID | No tasks sent
10/21/2014 8:03:19 AM | GPUGRID | Project requested delay of 31 seconds
10/21/2014 8:03:19 AM | GPUGRID | [work_fetch] backing off CPU 603 sec
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Deferring communication for 00:00:31
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Reason: requested by project

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38605 - Posted: 21 Oct 2014 | 12:30:02 UTC - in response to Message 38603.

Jacob, what's the host ID?

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38606 - Posted: 21 Oct 2014 | 12:31:40 UTC - in response to Message 38605.
Last modified: 21 Oct 2014 | 12:38:53 UTC

For my work fetch request which yielded 0 CPU tasks...
Computer ID was: 153764

It's an 8-logical-CPU machine, set up to run 75% of CPUs (since I'm running 2 CPU-intensive VM tasks outside of BOINC)... so I usually get MT tasks that run 6 CPUs on this machine. The machine also has 3 NVIDIA GPUs.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38607 - Posted: 21 Oct 2014 | 12:48:20 UTC - in response to Message 38606.

According to the logs, as of 50 minutes ago you should have been receiving work, specifically the mtsse4 app (which actually maps to the 845 sse2 binary).
The host features reported by your BOINC client are:

host [153764] client [70422] plan_class [mtsse4] effective_cpus [6] [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe]

As you can see, no AVX!
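For anyone who wants to check a feature list like the one above without reading it by eye, here is a minimal Python sketch. It assumes the flags are space-separated, as in the scheduler line quoted above; the names `REPORTED_FEATURES` and `has_feature` are made up for illustration and are not part of BOINC.

```python
# Feature string copied from the scheduler log line quoted above.
REPORTED_FEATURES = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 "
    "sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe"
)


def has_feature(feature_line: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole token in the feature list."""
    return flag.lower() in feature_line.lower().split()


# Report the instruction sets that matter for the mtsse4 / mtavx plan classes.
for flag in ("sse2", "sse4_1", "sse4_2", "avx"):
    print(flag, "yes" if has_feature(REPORTED_FEATURES, flag) else "no")
```

Matching whole tokens rather than substrings matters here: a plain substring search for "sse" would also hit "ssse3".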


I am pretty sure this represents a bug in the 7.4.22 BOINC app, since your CPU and OS are manifestly AVX-capable.

Matt

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38608 - Posted: 21 Oct 2014 | 12:55:48 UTC - in response to Message 38607.
Last modified: 21 Oct 2014 | 12:58:11 UTC

2 questions:

Are you sure my computer is avx capable? [This machine just celebrated its 5-year-birthday, with no upgrades to mobo nor CPU] */**
Are you sure the BOINC client is supposed to be able to detect that and relay that to the server scheduler?

If the answer is yes to both, then I'll pass the info to the BOINC Alpha mailing list to see if it's a bug.

* This website seems to indicate I might legitimately NOT have avx, based on dates:
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions
** Here's my CPU, I believe: http://ark.intel.com/products/37149/Intel-Core-i7-965-Processor-Extreme-Edition-8M-Cache-3_20-GHz-6_40-GTs-Intel-QPI

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38609 - Posted: 21 Oct 2014 | 12:58:23 UTC - in response to Message 38608.

Actually, I misread your CPU model - the i7 CPU 965 doesn't have AVX, only SSE4.
But are you saying that you are receiving no work at all?

Matt

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38610 - Posted: 21 Oct 2014 | 12:59:26 UTC - in response to Message 38609.
Last modified: 21 Oct 2014 | 12:59:42 UTC

I am not receiving any CPU tasks from GPUGrid, as evidenced by my work fetch request/response that I pasted a couple posts up.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38611 - Posted: 21 Oct 2014 | 12:59:32 UTC
Last modified: 21 Oct 2014 | 13:10:29 UTC

BOINC 7.4.21 properly reporting CPU feature set for Intel Ivy Bridge.

14/10/20 18:23:06 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrand syscall nx lm avx vmx tm2 pbe fsgsbase smep


14/10/21 09:05:00 | GPUGRID | No tasks are available for Test application for CPU MD

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38612 - Posted: 21 Oct 2014 | 13:03:45 UTC
Last modified: 21 Oct 2014 | 13:04:42 UTC

On this page:
http://www.gpugrid.net/apps.php
... it seems that the only x64 version offered is avx.

Or, is it supposed to send me the x86 (mtsse4) app, even though I'm using x64? Because, that did not happen :(

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38613 - Posted: 21 Oct 2014 | 13:15:36 UTC - in response to Message 38612.

Getting 901 now?

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38614 - Posted: 21 Oct 2014 | 13:26:48 UTC - in response to Message 38612.

Jacob,

As I understand it, you should get the 32b app, even if using a 64bit BOINC client. (Although, for maximum confusion, the 32b app is actually a 64b one)

Matt

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38615 - Posted: 21 Oct 2014 | 13:26:58 UTC - in response to Message 38613.
Last modified: 21 Oct 2014 | 13:28:06 UTC

Getting 901 now?

No :(

10/21/2014 9:28:58 AM | GPUGRID | [work_fetch] set_request() for CPU: ninst 6 nused_total 2.00 nidle_now 6.00 fetch share 1.00 req_inst 6.00 req_secs 41472.00
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] Starting scheduler request
10/21/2014 9:28:58 AM | GPUGRID | [work_fetch] request: CPU (41472.00 sec, 6.00 inst) miner_asic (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst)
10/21/2014 9:28:58 AM | GPUGRID | Sending scheduler request: To fetch work.
10/21/2014 9:28:58 AM | GPUGRID | Requesting new tasks for CPU
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] CPU work request: 41472.00 seconds; 6.00 devices
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] miner_asic work request: 0.00 seconds; 0.00 devices
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
10/21/2014 9:29:00 AM | GPUGRID | Scheduler request completed: got 0 new tasks
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Server version 613
10/21/2014 9:29:00 AM | GPUGRID | No tasks sent
10/21/2014 9:29:00 AM | GPUGRID | Project requested delay of 31 seconds
10/21/2014 9:29:00 AM | GPUGRID | [work_fetch] backing off CPU 658 sec
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Deferring communication for 00:00:31
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Reason: requested by project

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38616 - Posted: 21 Oct 2014 | 13:45:31 UTC - in response to Message 38615.

The server has arbitrarily decided that the new app versions are "not reliable". I wonder what that means?

Carlos Augusto Engel
Send message
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38618 - Posted: 21 Oct 2014 | 15:43:15 UTC

My first avx with error:
http://www.gpugrid.net/result.php?resultid=13227076

Name 86-MJHARVEY_TEST1001-0-1-RND5489_1
Workunit 10157681
Created 17 Oct 2014 | 11:36:40 UTC
Sent 21 Oct 2014 | 15:34:02 UTC
Received 21 Oct 2014 | 15:35:57 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 255 (0xff) Unknown error number
Computer ID 169357
Report deadline 26 Oct 2014 | 15:34:02 UTC
Run time 0.41
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version Test application for CPU MD v9.01 (mtavx)
Stderr output

<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
The extended attributes are inconsistent.
(0xff) - exit code 255 (0xff)
</message>
]]>
____________

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38619 - Posted: 21 Oct 2014 | 16:10:02 UTC
Last modified: 21 Oct 2014 | 16:12:20 UTC

Multiple Test application for CPU MD v9.01 (mtavx) failures- all with the same error message: The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)

All tasks have 0.00 CPU/run time.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38620 - Posted: 21 Oct 2014 | 16:49:22 UTC

My machine got work now, a 9.01 mtsse4 task. I will start it soon, and reply with the result.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38622 - Posted: 21 Oct 2014 | 18:27:00 UTC - in response to Message 38619.

Multiple Test application for CPU MD v9.01 (mtavx) failures- all with the same error message: The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)

All tasks have 0.00 CPU/run time.



Yes, my fault. The app build is bad.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38623 - Posted: 21 Oct 2014 | 20:46:15 UTC - in response to Message 38622.

902

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38624 - Posted: 21 Oct 2014 | 21:13:17 UTC
Last modified: 21 Oct 2014 | 21:24:30 UTC

9.02 is working: BOINC progress is 0.023 at fifteen minutes. The task has a 10000 hr estimated runtime, which pushes the 20mgx GPU task aside. The estimated runtime keeps going up as the task computes; it is currently at 12000 hr.

If BOINC is reporting task progress correctly, the CPU time to completion works out to about 10 hr for every 1% computed, so more than 1000 hr (about 44 days) total runtime?

14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] unfoldx476-NOELIA_UNFOLD-3-72-RND3785_0 sched state 2 next 2 task state 1
14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] 20mgx396-NOELIA_20MG2-2-50-RND9100_1 sched state 1 next 1 task state 0
14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 sched state 2 next 2 task state 1
14/10/21 17:10:45 | GPUGRID | [rr_sim_detail] 0.00: starting 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 (4.00 CPU) (514480914.05G/12.54G)
14/10/21 17:10:45 | | [rrsim_detail] rpbest: 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 (finish delay 40205433.52)
14/10/21 17:11:45 | GPUGRID | [rr_sim] 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 misses deadline by 40416714.74
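
The back-of-the-envelope estimate in this post can be reproduced with a short Python sketch. It assumes that the 0.023 figure is a percentage (which is what "10hr for every 1%" implies) and that progress is linear in time; `estimate_total_hours` is an illustrative helper, not anything BOINC provides.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate total runtime from progress so far."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours * 100.0 / percent_done


elapsed = 0.25    # fifteen minutes, in hours
progress = 0.023  # percent complete (assumed to be a percentage)

total = estimate_total_hours(elapsed, progress)
print(f"hours per 1%: {total / 100:.1f}")
print(f"total hours:  {total:.0f}")
print(f"total days:   {total / 24:.1f}")
```

Under those assumptions the extrapolation lands a little above 1000 hours, in line with the figure quoted in the post. It says nothing about whether the progress value itself is being reported correctly.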

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38626 - Posted: 21 Oct 2014 | 22:06:55 UTC - in response to Message 38624.

Ok, so the length reporting is still wrong.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,722,889,584
RAC: 1,601,101
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38627 - Posted: 22 Oct 2014 | 2:19:39 UTC

does it work for pricese puppy 5.7? I do not get any WUs. Thanks.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38641 - Posted: 23 Oct 2014 | 12:51:23 UTC

Just tried one of these WUs. My BOINC preference is set to use 50% of the processors, which is four of my eight. The WU grabbed four; Task Manager showed this task using 50%.

Thermal Radar normally shows my CPU temp at 41C. Within five minutes the temp had shot up to 65C and the red warning came on. I killed the WU.

Anyone else having CPU overheating problems with these WUs?

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38642 - Posted: 23 Oct 2014 | 13:51:17 UTC - in response to Message 38641.

65C is hardly an unreasonable operating temperature for a CPU.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38643 - Posted: 23 Oct 2014 | 13:52:04 UTC - in response to Message 38627.

does it work for pricese puppy 5.7? I do not get any WUs. Thanks.


Who is Princess Puppy?

Seriously, tell me the machine ID and I can take a look.

Carlos Augusto Engel
Send message
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38644 - Posted: 23 Oct 2014 | 13:58:41 UTC

I did. I changed "Use at most 40.00% of CPU time (0 means no restriction)".

____________

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38645 - Posted: 23 Oct 2014 | 14:33:59 UTC - in response to Message 38641.
Last modified: 23 Oct 2014 | 14:34:23 UTC

Just tried one of these WUs. My BOINC preference is set to use 50% of the processors, which is four of my eight. The WU grabbed four; Task Manager showed this task using 50%.

Thermal Radar normally shows my CPU temp at 41C. Within five minutes the temp had shot up to 65C and the red warning came on. I killed the WU.

Anyone else having CPU overheating problems with these WUs?


Note 1: On my 8-logical-CPU, Intel i7 965 XE, all 4 cores routinely run at 86*C and near-100% CPU usage, 24/7, for 5 years straight, unless I'm gaming. Core Temp shows that TjMax (thermal limiting temperature) is 100*C, and I've never hit that mark.

Note 2: MT tasks are allowed to overcommit the CPU, especially in cases where they are running alongside coprocessor (GPU/ASIC) tasks or other high-priority CPU tasks.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38646 - Posted: 23 Oct 2014 | 16:30:37 UTC - in response to Message 38642.

65C is hardly an unreasonable operating temperature for a CPU.

Thanks for your responses, Matt & Jacob.

OK. I downloaded another WU and have been running for 2+ hours, with Core Temp running for most of that. Before the WU started, CPU temp was 40C and CPU fan was 2700rpm. The fan is now at 3500rpm and you can see the CPU temps here:



A bit worried about that 75C max...

Am I OK to continue, with the Thermal Radar temp well into the red??

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38647 - Posted: 23 Oct 2014 | 18:49:10 UTC
Last modified: 23 Oct 2014 | 18:49:29 UTC

Who says red is bad? Maybe red is just "hi", and then ultraviolet neon green is "nuclear"?

:)

Basically, your TjMax is 90. You should feel comfortable going up to 80*C or maybe even 85*C, I'd think, before worrying about heat and stability.

Astiesan
Send message
Joined: 8 Jun 10
Posts: 3
Credit: 1,074,178,895
RAC: 3,390,509
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38648 - Posted: 23 Oct 2014 | 23:13:45 UTC

Is the time estimate for these work units incorrect? I have no slouch of a processor, a 4790K @ stock speeds, and it estimates 66 hours to completion with just shy of four hours of work done. If this is accurate, I will barely be able to complete the two I have downloaded before their turn in time on the 28th assuming I use my computer for a standard 8 hours a day.

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38649 - Posted: 24 Oct 2014 | 4:03:45 UTC
Last modified: 24 Oct 2014 | 4:27:00 UTC

I am getting "Download failed" when downloading the CPU app (avx version) on a Windows 7 x64.

EDIT: MD5 check fail.
EDIT2: Checking the "skip image file verification" under BOINC preferences fixed the error... but I'm pretty sure this is not how things should be done.
EDIT3: I re-enabled the image file verification and it managed to download it correctly, yet it only uses 1 core (although BOINC says 8), and the progress report seems to have gotten stuck with:

Log file opened on Fri Oct 24 01:12:46 2014
Host: unknown pid: 6588 rank ID: 0 number of ranks: 1
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\8\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt

Gromacs version: VERSION 5.0.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: AVX_256
FFT library: fftw3
RDTSCP usage: enabled
C++11 compilation: disabled
TNG support: enabled
Tracing support: disabled
Built on: Unknown date
Built by: Anonymous@unknown [CMAKE]
Build OS/arch: Windows-6.1 AMD64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
Build CPU family: 6 Model: 42 Stepping: 7
Build CPU features: apic clfsh cmov cx8 lahf_lm mmx msr pse rdtscp sse2 sse3 ssse3
C compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /MD /O2 /Ob2 /D NDEBUG
C++ compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C++ compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /GR /EHsc /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090 /MD /O2 /Ob2 /D NDEBUG
Boost version: 1.55.0 (internal)



++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing nstlist from 10 to 20, rlist from 0.9 to 0.928

Input Parameters:
integrator = md-vv
tinit = 0
dt = 0.002
nsteps = 5000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1993
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 25000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = FALSE
verlet-buffer-tolerance = 0.005
rlist = 0.928
rlistlong = 0.928
nstcalclr = 10
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-switch
rvdw-switch = 0.75
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.1
fourier-nx = 64
fourier-ny = 64
fourier-nz = 64
pme-order = 4
ewald-rtol = 1e-005
ewald-rtol-lj = 1e-005
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Nose-Hoover
nsttcouple = 10
nh-chain-length = 10
print-nose-hoover-chain-variables = FALSE
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p (3x3):
ref-p[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
refcoord-scaling = No
posres-com (3):
posres-com[0]=0.00000e+000
posres-com[1]=0.00000e+000
posres-com[2]=0.00000e+000
posres-comB (3):
posres-comB[0]=0.00000e+000
posres-comB[1]=0.00000e+000
posres-comB[2]=0.00000e+000
QMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Shake
continuation = FALSE
Shake-SOR = FALSE
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = no
rotation = FALSE
interactiveMD = FALSE
disre = No
disre-weighting = Conservative
disre-mixed = FALSE
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
simulated-tempering = FALSE
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
adress = FALSE
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 48414
ref-t: 300
tau-t: 0.8
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0


and that's all there is... I'll reset the project once the GPU WU is completed and see what happens then.

Also, my other AVX-capable machine only downloads SSE2 WUs, not AVX.
____________

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38653 - Posted: 24 Oct 2014 | 8:44:50 UTC

I'm currently running the mdrun-502-902-avx-64.exe program for Windows 64 bit. The program has been running for about 7 hours and is about to go to 6% complete. However, if I start the task manager it reports the program is only using 27-28% cpu, even though it should be running all 4 cores.

Does it take a while for the program to start using all 4 of the cores?

Here is the stderr.txt output:

BOINC wrapper for GROMACS.
Arg 0 [projects/www.gpugrid.net/mdrun-502-902-avx-64.exe]
Arg 1 [--nthreads]
Arg 2 [4]
BOINC running with [4] threads
BOINC resolving [traj.xtc] to [traj.xtc]
BOINC resolving [topol.tpr] to [topol.tpr]
BOINC resolving [progress.log] to [progress.log]
BOINC resolving [state.cpt] to [state.cpt]
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: C:\ProgramData\BOINC\slots\1\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 4 -nt 4 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt

Reading file topol.tpr, VERSION 4.6.1 (single precision)
Note: file tpx version 83, software tpx version 100
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928
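
To compare the thread counts that were requested with what Task Manager shows, the mdrun command line from a log like this one can be picked apart with a few lines of Python. This is only a sketch: `thread_options` is a made-up helper, and it reads the `-nt` and `-ntomp` values as written on the command line without telling you how many threads are actually running.

```python
import shlex


def thread_options(command_line: str) -> dict:
    """Extract the integer values of -nt and -ntomp from an mdrun command line."""
    tokens = shlex.split(command_line)
    wanted = {"-nt", "-ntomp"}
    found = {}
    # Each option is followed by its value, so look at adjacent token pairs.
    for option, value in zip(tokens, tokens[1:]):
        if option in wanted:
            found[option] = int(value)
    return found


# Command line copied from the log above.
cmd = ("mdrun-502-902-avx-64 -ntomp 4 -nt 4 -x traj.xtc -s topol.tpr "
       "-g progress.log -cpi state.cpt")
print(thread_options(cmd))
```

If the values printed here match the BOINC setting but the process still shows roughly one core of load, the problem is on the application side rather than in what BOINC asked for.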

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38655 - Posted: 24 Oct 2014 | 11:22:17 UTC - in response to Message 38648.
Last modified: 24 Oct 2014 | 11:37:30 UTC

Astiesan wrote:

Is the time estimate for these work units incorrect?

Yes, the estimates are incorrect.
With AVX and 8 threads, the task will take about 16 hr in total.

Chilean wrote:
Also, my other AVX-capable machine only downloads SSE2 WUs, not AVX.

Upgrade to BOINC dev 7.4.22 as you did with the 3610QM machine. Non-dev versions are incorrectly reading the CPU feature set for CPUMD tasks.

Boinc127 wrote:
Does it take a while for the program to start using all 4 of the cores?

No, the task should use the number of threads you've set in BOINC. If you have heavy background processes running concurrently with the task, they could eat away at cycles.

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38658 - Posted: 24 Oct 2014 | 15:04:10 UTC

I'm using the mdrun-502-902-avx-64.exe program and it should be running on 4 cores but it's only running on one core. Task Manager says it's only using 27% CPU, so about 1 core. I also tested multithreading with the only other MT app I can think of, Milkyway N-Body Simulation, and it runs about 80-85% of the total CPUs (all 4 CPUs). I've reset and removed and reattached the project, and it still uses only 1 core. There are no other programs taking heavy compute cycles in the background.

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38659 - Posted: 24 Oct 2014 | 16:59:53 UTC

So the mdrun-502-902-avx-64.exe program is tying up 4 cores but it is only using one core (about 27%). However, if I use 1 core (refreshing the GPUGrid account in BOINC Manager and telling it to use 1 core) it uses the same amount of CPU (27% according to Task Manager) but the time estimate quadruples and the progress is markedly slower. I don't get it. It's obvious that it isn't using 4 cores when it is supposed to... I would expect AVX running on 4 cores to run moderately warm or hot. Even Process Explorer reports only about 26 to 27% CPU usage.
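
The "27% is about one core" reasoning can be written out as a small Python sketch. It assumes Task Manager's percentage is measured against all logical CPUs combined; `busy_cores` is a hypothetical helper used only for illustration.

```python
def busy_cores(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert a whole-machine CPU percentage into an equivalent number of busy cores."""
    return total_cpu_percent / 100.0 * logical_cpus


# 27% reported on a 4-core machine, as described in the post above.
print(f"{busy_cores(27, 4):.2f} cores busy out of 4")
```

A fully used 4-thread task would show up as close to 100% on this machine, so a reading near 25% is consistent with a single working thread.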

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38660 - Posted: 24 Oct 2014 | 17:05:10 UTC - in response to Message 38659.
Last modified: 24 Oct 2014 | 17:05:53 UTC

Install Process Explorer. Find the process that is running the task. Double click it. Click the Threads tab. Sort that Threads tab by CPU descending.

If it's running 4-threaded, then the Threads tab should show 4 threads utilizing some CPU.

What do you see there?

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38661 - Posted: 24 Oct 2014 | 17:27:41 UTC

Install Process Explorer. Find the process that is running the task. Double click it. Click the Threads tab. Sort that Threads tab by CPU descending.

If it's running 4-threaded, then the Threads tab should show 4 threads utilizing some CPU.

What do you see there?


mdrun-502-902-avx-64.exe properties

3 Threads.

Thread 1 is 4664 using 24% cpu called mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x131ac

Thread 2 is 4432 using < 0.01% cpu called mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x12880

Thread 3 is 2124 called MSVCR100.dll!endthreadex+0x60

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38662 - Posted: 24 Oct 2014 | 17:34:15 UTC - in response to Message 38661.

Looks buggered. Would you kill it and start it again?

Matt

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38663 - Posted: 24 Oct 2014 | 17:43:48 UTC - in response to Message 38662.

Sorry I literally killed the process without thinking about it. However, I did download another one and it is doing the exact same thing (using 3 threads, 24.8% cpu). The thread IDs have changed but they are still using the same start addresses:

mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x131ac

mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x12880

MSVCR100.dll!endthreadex+0x60


tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38664 - Posted: 24 Oct 2014 | 18:00:44 UTC - in response to Message 38655.

Yes estimates are incorrect.
With AVX 8 threads- the task will take a total ~16hr.

I've been through this thread. I installed BOINC 7.4.22 and the estimated remaining time dropped dramatically.

I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?

How many hours for AVX 4 threads? In 24 hours I only did 20%.

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38665 - Posted: 24 Oct 2014 | 18:23:28 UTC - in response to Message 38664.

I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?


According to the AMD website, FX processors do have the AVX instruction set. All you need then is a compatible operating system. Windows 7 SP1 and Windows 8/8.1 do use and recognize the AVX instruction set so you should be set.

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38666 - Posted: 24 Oct 2014 | 19:38:49 UTC

After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:

Log file opened on Fri Oct 24 16:28:33 2014
Host: unknown pid: 5364 rank ID: 0 number of ranks: 1
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\0\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt

Gromacs version: VERSION 5.0.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: AVX_256
FFT library: fftw3
RDTSCP usage: enabled
C++11 compilation: disabled
TNG support: enabled
Tracing support: disabled
Built on: Unknown date
Built by: Anonymous@unknown [CMAKE]
Build OS/arch: Windows-6.1 AMD64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
Build CPU family: 6 Model: 42 Stepping: 7
Build CPU features: apic clfsh cmov cx8 lahf_lm mmx msr pse rdtscp sse2 sse3 ssse3
C compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /MD /O2 /Ob2 /D NDEBUG
C++ compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C++ compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /GR /EHsc /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090 /MD /O2 /Ob2 /D NDEBUG
Boost version: 1.55.0 (internal)



++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Changing nstlist from 10 to 20, rlist from 0.9 to 0.928

Input Parameters:
integrator = md-vv
tinit = 0
dt = 0.002
nsteps = 5000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1993
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 25000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = FALSE
verlet-buffer-tolerance = 0.005
rlist = 0.928
rlistlong = 0.928
nstcalclr = 10
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-switch
rvdw-switch = 0.75
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.1
fourier-nx = 64
fourier-ny = 64
fourier-nz = 64
pme-order = 4
ewald-rtol = 1e-005
ewald-rtol-lj = 1e-005
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Nose-Hoover
nsttcouple = 10
nh-chain-length = 10
print-nose-hoover-chain-variables = FALSE
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p (3x3):
ref-p[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
refcoord-scaling = No
posres-com (3):
posres-com[0]=0.00000e+000
posres-com[1]=0.00000e+000
posres-com[2]=0.00000e+000
posres-comB (3):
posres-comB[0]=0.00000e+000
posres-comB[1]=0.00000e+000
posres-comB[2]=0.00000e+000
QMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Shake
continuation = FALSE
Shake-SOR = FALSE
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = no
rotation = FALSE
interactiveMD = FALSE
disre = No
disre-weighting = Conservative
disre-mixed = FALSE
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
simulated-tempering = FALSE
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
adress = FALSE
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 48414
ref-t: 300
tau-t: 0.8
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0


And the stderr.txt gives this:

BOINC wrapper for GROMACS.
Arg 0 [projects/www.gpugrid.net/mdrun-502-902-avx-64.exe]
Arg 1 [--nthreads]
Arg 2 [8]
BOINC running with [8] threads
BOINC resolving [traj.xtc] to [traj.xtc]
BOINC resolving [topol.tpr] to [topol.tpr]
BOINC resolving [progress.log] to [progress.log]
BOINC resolving [state.cpt] to [state.cpt]
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2

GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\0\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt

Reading file topol.tpr, VERSION 4.6.1 (single precision)
Note: file tpx version 83, software tpx version 100
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928



So... it apparently gets stuck on 1 thread, doing nothing at all.
____________

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38667 - Posted: 24 Oct 2014 | 19:47:15 UTC - in response to Message 38666.
Last modified: 24 Oct 2014 | 19:51:16 UTC

After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:


Excellent point... I couldn't remember how often the progress.log file updates after the program starts. As with workunits for some other projects, I figured it had some preliminary work to do before it really started crunching with all threads. I thought it was a problem with my computer. It doesn't seem like the program is spinning up enough threads.

And reviewing the older MT tasks I've crunched, there is usually a note in stderr.txt or progress.log that mentions running 1 MPI thread and 4 OpenMP threads. I don't see the same note with the AVX program. Perhaps it's something very simple, like a missing command line argument?
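
For what it's worth, the command line in the log posted above already implies a thread layout, assuming I read the mdrun options correctly: -nt is the total thread count and -ntomp is the number of OpenMP threads per thread-MPI rank, so the rank count is their ratio. A small Python sketch of that arithmetic (the helper name thread_layout is made up for illustration):

```python
import shlex


def thread_layout(command_line):
    """Return (thread-MPI ranks, OpenMP threads per rank) for an mdrun command line.

    Assumes both -nt and -ntomp are present, as in the log above.
    """
    args = shlex.split(command_line)
    nt = int(args[args.index("-nt") + 1])        # total threads requested
    ntomp = int(args[args.index("-ntomp") + 1])  # OpenMP threads per rank
    if nt % ntomp != 0:
        raise ValueError("-nt should be a multiple of -ntomp")
    return nt // ntomp, ntomp


cmd = ("mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc "
       "-s topol.tpr -g progress.log -cpi state.cpt")
ranks, omp = thread_layout(cmd)
print(f"{ranks} thread-MPI rank(s) x {omp} OpenMP thread(s)")
```

If that reading is right, the arguments themselves ask for 1 rank with 8 OpenMP threads, so a missing argument looks less likely than the app stalling before it starts those threads.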

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38668 - Posted: 24 Oct 2014 | 20:42:35 UTC - in response to Message 38664.
Last modified: 24 Oct 2014 | 20:48:45 UTC

Yes estimates are incorrect.
With AVX 8 threads- the task will take a total ~16hr.

I've been through this thread. I installed BOINC 7.4.22 and the estimated remaining time dropped dramatically.

I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?

How many hours for AVX 4 threads? In 24 hours I only did 20%.



Note: These estimates are for Intel AVX. I'm unsure about AMD AVX CPUMD times.
AMD computes AVX instructions differently than Intel does. AMD has more integer execution ports than floating-point ones.
FX modules share an AVX-capable FP unit between threads: for every two integer cores there is one FP unit with 128-bit pipes. To complete a 256-bit AVX instruction, a second 128-bit operation is required, whereas Intel Sandy Bridge/Ivy Bridge/Haswell have 256-bit AVX FP units.

http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2
http://www.anandtech.com/show/6355/intels-haswell-architecture/8

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38669 - Posted: 24 Oct 2014 | 21:15:44 UTC - in response to Message 38667.

After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:


Excellent point... I couldn't remember how often the progress.log file updates after the program starts. As with workunits for some other projects, I figured it had some preliminary work to do before it really started crunching with all threads. I thought it was a problem with my computer. It doesn't seem like the program is spinning up enough threads.

And reviewing the older MT tasks I've crunched, there is usually a note in stderr.txt or progress.log that mentions running 1 MPI thread and 4 OpenMP threads. I don't see the same note with the AVX program. Perhaps it's something very simple, like a missing command line argument?


The SSE2 WUs that I ran on this exact same machine updated their progress file every 10 seconds or so, showing the step number they were on. This one, though, seems to get stuck, so I'm pretty sure it's some kind of bug. The stderr.txt file does show that the WU is asking for 8 threads, yet it only "uses" 1, so the bug seems to sit somewhere after the initialization of the WU.
____________

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38670 - Posted: 24 Oct 2014 | 23:30:33 UTC - in response to Message 38669.
Last modified: 24 Oct 2014 | 23:31:40 UTC

Are CPUMD SSE2/4 tasks being sent to AVX hosts?

boinc127
Send message
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Level
Ser
Scientific publications
watwatwatwatwatwatwat
Message 38671 - Posted: 25 Oct 2014 | 0:10:31 UTC - in response to Message 38670.

Are CPUMD SSE2/4 tasks being sent to AVX hosts?


I do not believe so. I have requested a few CPU test apps (over the past couple of days) and have always gotten the AVX ones. I suppose I could downgrade the BOINC client so it stops recognizing AVX, or set up a specific app_info.xml file, but I wouldn't have any idea what to put in it.

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38673 - Posted: 25 Oct 2014 | 6:34:35 UTC
Last modified: 25 Oct 2014 | 6:36:15 UTC

I keep getting CPU WUs (that don't work...) even though I unchecked "Use CPU?" on the settings page for GPUGRID (along with unchecking everything BUT ACEMD LONG RUNS). If neither the AVX WU problem nor the whole not-obeying-the-settings thing gets fixed, I'm going to be forced to detach from the project entirely (I don't want to play cop and manually abort the bugged CPU WUs; it defeats the whole set-it-and-forget-it purpose of BOINC...).
____________

Profile [VENETO] sabayonino
Send message
Joined: 4 Apr 10
Posts: 50
Credit: 645,641,596
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38676 - Posted: 25 Oct 2014 | 12:54:44 UTC - in response to Message 38643.
Last modified: 25 Oct 2014 | 12:57:20 UTC

does it work for pricese puppy 5.7? I do not get any WUs. Thanks.


Who is Princess Puppy?



I think Puppy Linux Precise 5.7.1
http://www.puppylinux.com/ and http://bkhome.org/blog2/?viewDetailed=00346 for precise

:D

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38678 - Posted: 25 Oct 2014 | 14:23:31 UTC
Last modified: 25 Oct 2014 | 14:29:37 UTC

Does anybody else have an unusually high task duration correction factor for GPUGRID since computing or downloading a CPUMD task?

Since I completed or downloaded a CPUMD task, the GPUGRID task duration correction factor (9.829525) has skyrocketed. Other projects have the correct factor, and their task estimates are within normal ranges for CPU/GPU completion times.
Currently there are no CPUMD tasks running or in cache, yet the factor has stayed the same. I just downloaded a new GPU task and the estimated time is 370 hr. (All CPU/GPU task estimates from GPUGRID have been abnormal for a week or so.) The 7.4.21 client isn't reverting the factor to the proper number, despite resetting, letting all tasks in cache finish, and completing 10 GPU tasks since.
What is causing the high correction factor to persist?

Also- Will future MD tasks have GPU support enabled?

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38679 - Posted: 25 Oct 2014 | 14:25:22 UTC - in response to Message 38678.


Also- Will MD task in future have GPU support enabled?


Not in the short-to-medium term - the point of this application is to use CPU.
In the long term, it might support AMD GPUs, but don't quote me on that.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38680 - Posted: 25 Oct 2014 | 14:26:33 UTC

The buggy Windows AVX app is gone now. Please abort any instances of it still running. It's replaced with the working SSE2 app.

Speedy
Send message
Joined: 19 Aug 07
Posts: 43
Credit: 40,991,082
RAC: 809,640
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 38685 - Posted: 26 Oct 2014 | 3:32:29 UTC
Last modified: 26 Oct 2014 | 4:13:13 UTC

I am running CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:


Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task (1981-MJHARVEY_CPUDHFR-0-1-RND0908_2) is currently 62.9% complete, with an estimated 16 hours to go.

Has anyone else noticed the wrong CPU being reported?
Also, could somebody please explain to me how the time in the progress file is worked out? E.g. after 3686000 steps it says 7372.00000 under time.
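
A likely answer, assuming the time column is simulated time in picoseconds (the usual GROMACS unit) and that this task uses the same dt = 0.002 shown in the Input Parameters posted earlier in this thread: the time is simply the step count multiplied by dt. A minimal Python sketch (the function name is hypothetical):

```python
DT_FS = 2  # dt = 0.002 ps from the posted Input Parameters, written in femtoseconds


def simulated_time_ps(steps, dt_fs=DT_FS):
    """Simulated time in picoseconds after `steps` integration steps."""
    return steps * dt_fs / 1000


print(simulated_time_ps(3_686_000))  # 7372.0, matching the progress file
print(simulated_time_ps(5_000_000))  # 10000.0 for a full nsteps = 5000000 run
```

So the number is simulated time, not wall-clock time, and under these assumptions a finished work unit would cover 10000 ps (10 ns).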

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38687 - Posted: 26 Oct 2014 | 5:56:31 UTC - in response to Message 38680.
Last modified: 26 Oct 2014 | 5:58:34 UTC

The buggy Windows AVX app is gone now. Please abort any instances of it still running. It's replaced with the working SSE2 app.

Working? Yes, but it stops one of my two GPU tasks:



I gave BOINC another CPU thread to play with and the waiting-to-run task restarted, but I immediately got another MJHARVEY, which is hardly likely to complete before its deadline...

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38688 - Posted: 26 Oct 2014 | 8:03:02 UTC

15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.

Why do MJHARVEYs run at high priority?

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 8,923,227,372
RAC: 18,784,623
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38689 - Posted: 26 Oct 2014 | 8:46:42 UTC - in response to Message 38688.

15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.

Why do MJHARVEYs run at high priority?

Different usage and meaning of the word 'priority'.

In the first case, when SETI first started (long before the BOINC platform was created and opened up for other projects), 'priority' referred to the thread/process priority of the task running on the CPU - and it was (and remains) low by comparison to the other primary tasks running on the computer - writing documents, surfing the web, reading emails etc. etc.

In the second case - where you are seeing it displayed in BOINC Manager - the word priority merely refers to the relative processing order of the BOINC tasks in the queue: there is some urgency to run that particular task because the time estimate is suggesting that it might not be completed before deadline.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38691 - Posted: 26 Oct 2014 | 10:29:05 UTC - in response to Message 38689.

15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.

Why do MJHARVEYs run at high priority?

Different usage and meaning of the word 'priority'.

In the first case, when SETI first started (long before the BOINC platform was created and opened up for other projects), 'priority' referred to the thread/process priority of the task running on the CPU - and it was (and remains) low by comparison to the other primary tasks running on the computer - writing documents, surfing the web, reading emails etc. etc.

In the second case - where you are seeing it displayed in BOINC Manager - the word priority merely refers to the relative processing order of the BOINC tasks in the queue: there is some urgency to run that particular task because the time estimate is suggesting that it might not be completed before deadline.


Many thanks for the clarification, Richard :)

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38693 - Posted: 26 Oct 2014 | 14:10:21 UTC

WU started 8.3 hours ago. It's done 5%. 8.3x20÷24=6.9 days to finish, two days after its deadline.
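
The same back-of-the-envelope estimate as a small Python sketch. It assumes progress is linear in time, which may well not hold for these tasks, and the function name is made up for illustration:

```python
def projected_days(hours_elapsed, fraction_done):
    """Linearly extrapolate the total run time of a task, in days."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return hours_elapsed / fraction_done / 24


total = projected_days(hours_elapsed=8.3, fraction_done=0.05)
print(f"{total:.1f} days")  # 6.9 days
```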

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38694 - Posted: 26 Oct 2014 | 14:42:48 UTC - in response to Message 38693.
Last modified: 26 Oct 2014 | 15:33:21 UTC

Tomba- runtime estimates for CPUMD are incorrect. For SSE2/SSE4/AVX tasks, your AMD CPU completes a task in under 24hr with 8 threads. For 4 threads: 48-72hr.

Unless something changed with app 9.03, a progress file exists showing how many steps have been computed. It is located in the slot directory allotted to the CPUMD task. Each work unit has 5 million steps in total. The step count is updated every ten minutes or so.

http://www.gpugrid.net/forum_thread.php?id=3898

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38695 - Posted: 26 Oct 2014 | 15:00:45 UTC - in response to Message 38694.

I've tweaked the estimated cost, but it'll take a while for the change to propagate.

Matt

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38699 - Posted: 26 Oct 2014 | 15:47:25 UTC - in response to Message 38694.

For 4 threads: ~48hr.

OK. I had aborted that WU and stopped any more, but because of the above I decided to have another go.

At that time BOINC use-at-most preference was at 62.5% = five CPU threads. Darn me if the next MJHARVEY grabbed six! That is not gentlemanly!

I aborted that WU, set the preference to 50%, and the next one grabbed four, but stopped one of my GPU tasks. I gave BOINC another 12.5% and the stopped GPU task resumed, but, lo and behold, I immediately got another MJHARVEY!! (See my post below).

More thought needed, methinks...






Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38700 - Posted: 26 Oct 2014 | 16:31:27 UTC - in response to Message 38699.



At that time BOINC use-at-most preference was at 62.5% = five CPU threads. Darn me if the next MJHARVEY grabbed six! That is not gentlemanly!



That's a real number to integer rounding problem. Might be able to fix that, depending on where the conversion's made.

Mjh

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38701 - Posted: 26 Oct 2014 | 17:23:17 UTC

Holy Moses! Just finished my dinner and checked. One of my GPU tasks had stopped!

Suspended the active MJHARVEY and the other one, which should never have been downloaded, started.

I've had enough nurse-maiding. I'm out of here. Sorry...

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38702 - Posted: 26 Oct 2014 | 18:04:30 UTC
Last modified: 26 Oct 2014 | 18:05:03 UTC

Tomba:

Please understand what is happening with the task scheduling, by reading this post:
http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505

And if you feel the need to go into the web prefs to temporarily disable the GPUGrid MT CPU app, then by all means, feel free. The app obviously has some time estimation issues that are erroneously making its tasks run in "high-priority" (earliest deadline first) mode, scheduled ahead of GPU jobs, which could interfere with your normal scheduling.

Kind regards,
Jacob

Speedy
Send message
Joined: 19 Aug 07
Posts: 43
Credit: 40,991,082
RAC: 809,640
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 38711 - Posted: 26 Oct 2014 | 23:05:33 UTC - in response to Message 38685.

I am running CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:

Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task (1981-MJHARVEY_CPUDHFR-0-1-RND0908_2) is currently 62.9% complete, with an estimated 16 hours to go.

For those interested, the above task finished much sooner than I expected: 16.4 hours of run time and 193.15 hours of CPU time.
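
Dividing CPU time by run time gives a rough idea of how many threads the app kept busy. A quick Python sketch of that ratio (the function name is hypothetical; the i7 980X exposes 12 hardware threads with Hyper-Threading on):

```python
def average_busy_threads(cpu_hours, wall_hours):
    """Average number of threads kept busy over the whole run."""
    return cpu_hours / wall_hours


busy = average_busy_threads(cpu_hours=193.15, wall_hours=16.4)
print(f"about {busy:.1f} threads busy on average")  # about 11.8
```

That is close to all 12 threads, so the task appears to have been running fully multithreaded on this machine.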

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38712 - Posted: 26 Oct 2014 | 23:25:47 UTC - in response to Message 38685.

I am running CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:

Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task (1981-MJHARVEY_CPUDHFR-0-1-RND0908_2) is currently 62.9% complete, with an estimated 16 hours to go.

Has anyone else noticed the wrong CPU being reported?
Also, could somebody please explain to me how the time in the progress file is worked out? E.g. after 3686000 steps it says 7372.00000 under time.


I think that CPU is the CPU used to compile the software...
In other words it might be MJH's CPU.
____________

Speedy
Send message
Joined: 19 Aug 07
Posts: 43
Credit: 40,991,082
RAC: 809,640
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 38713 - Posted: 27 Oct 2014 | 0:35:26 UTC - in response to Message 38712.

I am running CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:

Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task (1981-MJHARVEY_CPUDHFR-0-1-RND0908_2) is currently 62.9% complete, with an estimated 16 hours to go.

Has anyone else noticed the wrong CPU being reported?
Also, could somebody please explain to me how the time in the progress file is worked out? E.g. after 3686000 steps it says 7372.00000 under time.


I think that CPU is the CPU used to compile the software...
In other words it might be MJH's CPU.

Thank you for the explanation; that would make sense. Interesting that the application doesn't get this information directly from BOINC. However, I can understand how the above works.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38726 - Posted: 28 Oct 2014 | 17:33:18 UTC - in response to Message 38702.
Last modified: 28 Oct 2014 | 17:34:30 UTC

Tomba:

Please understand what is happening with the task scheduling, by reading this post:
http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505

Thanks for the post, Jacob. I checked that thread but must confess most of it went right over my head!

Yesterday I reactivated the new CPU WUs. There were none, but overnight I got an MJHARVEY. I was happy to see that, with BOINC given 50% of my CPUs, no GPU task had been suspended.

The MJHARVEY just completed in a little over 16 hours, having used four CPUs.

I was not a little disappointed to get a mere 922 credits for my PC's efforts vs. 12k-19k for my GPUs. Perhaps the difference is a measure of the relative importance to the project of these new CPU tasks?

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38727 - Posted: 28 Oct 2014 | 17:56:20 UTC
Last modified: 28 Oct 2014 | 18:00:12 UTC

I encourage you to try to read through the thread one more time. Basically, when a task has a risk of not being able to meet a deadline unless it is given priority, BOINC will run it in "high-priority" mode.

And, if you read that post, paying special attention to the ordering, you will see that "high-priority CPU tasks" get scheduled BEFORE any "regular-priority GPU tasks".

So, if an MT task happens to go high-priority, then you can expect BOINC to only schedule up-to-1-CPU-worth of regular-priority GPU tasks. And, if *2* MT tasks go high-priority, then you can expect BOINC to not run any GPU tasks. I suspect that is the behavior that you saw.
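
To make that ordering concrete, here is a toy Python model. It is only a sketch of the behavior described in this post, NOT BOINC's actual scheduler code. The rules are my assumptions: high-priority tasks always run, regular GPU tasks may overcommit the CPU budget by up to 1 CPU, and regular CPU tasks must fit within the budget. All names are made up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float                  # CPUs the task claims
    uses_gpu: bool = False
    high_priority: bool = False  # i.e. at risk of missing its deadline


def pick_tasks(tasks, ncpus):
    """Toy model only: choose which tasks run, in the order described above."""
    def rank(task):
        if task.high_priority:
            return 0             # high-priority tasks are considered first
        return 1 if task.uses_gpu else 2

    running, used = [], 0.0
    for task in sorted(tasks, key=rank):   # sorted() keeps ties in input order
        # Assumption: GPU tasks may overcommit the CPU budget by up to 1 CPU.
        limit = ncpus + 1 if task.uses_gpu else ncpus
        if task.high_priority or used + task.cpus <= limit:
            running.append(task.name)
            used += task.cpus
    return running


gpu = [Task("gpu-1", 1, uses_gpu=True), Task("gpu-2", 1, uses_gpu=True)]
one_mt = [Task("mt-1", 4, high_priority=True)]
two_mt = one_mt + [Task("mt-2", 4, high_priority=True)]

print(pick_tasks(gpu + one_mt, ncpus=4))  # ['mt-1', 'gpu-1']
print(pick_tasks(gpu + two_mt, ncpus=4))  # ['mt-1', 'mt-2']
```

With one high-priority MT task only one of the two GPU tasks still fits, and with two there is no room for any GPU task.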

Hope that helps,
Jacob

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38728 - Posted: 28 Oct 2014 | 18:33:07 UTC - in response to Message 38727.

I encourage you to try to read through the thread one more time. Basically, when a task has a risk of not being able to meet a deadline unless it is given priority, BOINC will run it in "high-priority" mode.

I guess priority is a dead duck for these WUs. The task I got here did not say "high priority", and the deadline was Jan 5!!

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38729 - Posted: 28 Oct 2014 | 18:54:23 UTC - in response to Message 38728.
Last modified: 28 Oct 2014 | 19:02:36 UTC

Runtime is excellent with 4 threads, as is performance:
7.436 ns/day
3.228 hour/ns
As for whether this is an AVX or SSE2 task, the stderr text states: projects/www.gpugrid.net/mdrun-463-901-sse-32.exe
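
The two performance figures in this post are two views of the same measurement: hour/ns is simply 24 divided by ns/day. A minimal Python sketch of that conversion, using the number quoted in the post (the function name is only an illustrative choice, not anything taken from the app):

```python
HOURS_PER_DAY = 24.0


def hours_per_ns(ns_per_day: float) -> float:
    """Convert simulation throughput (ns/day) into wall-clock cost (hour/ns)."""
    if ns_per_day <= 0:
        raise ValueError("ns/day must be positive")
    return HOURS_PER_DAY / ns_per_day


if __name__ == "__main__":
    # Throughput quoted in the post: 7.436 ns/day
    print(f"{hours_per_ns(7.436):.3f} hour/ns")
```

For 7.436 ns/day this works out to roughly 3.228 hour/ns, which agrees with the second figure reported.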

If task priority is a "dead duck" then a system with two or more GPUs won't need "nurse-maiding"!

Are more January 5 deadline MJH tasks available for testing?

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 38730 - Posted: 28 Oct 2014 | 19:38:48 UTC - in response to Message 38729.

Runtime is excellent with 4 threads.

That's my trusty AMD FX-8350!

If task priority is a "dead duck" then a system with two or more GPUs won't need "nurse-maiding"!

In fact, it looks like Matt has fixed the issue he described as "That's a real number to integer rounding problem. Might be able to fix that, depending on where the conversion's made."

RobertKazan
Send message
Joined: 12 Feb 12
Posts: 7
Credit: 33,370,034
RAC: 0
Level
Val
Message 38778 - Posted: 1 Nov 2014 | 4:16:15 UTC

Why does a running CPU MD task not show the percentage of the job completed?

Is there an AVX version of this app for new CPUs?

RobertKazan
Send message
Joined: 12 Feb 12
Posts: 7
Credit: 33,370,034
RAC: 0
Level
Val
Message 38779 - Posted: 1 Nov 2014 | 4:16:24 UTC

Why does a running CPU MD task not show the percentage of the job completed?

Is there an AVX version of this app for new CPUs?

RobertKazan
Send message
Joined: 12 Feb 12
Posts: 7
Credit: 33,370,034
RAC: 0
Level
Val
Message 38780 - Posted: 1 Nov 2014 | 4:18:25 UTC

Excuse me! Some problems with Mozilla Browser.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 38802 - Posted: 3 Nov 2014 | 18:32:37 UTC - in response to Message 38646.

65C is hardly an unreasonable operating temperature for a CPU.

Thanks for your responses, Matt & Jacob.

OK. I downloaded another WU and have been running for 2+ hours, with Core Temp running for most of that. Before the WU started, CPU temp was 40C and CPU fan was 2700rpm. The fan is now at 3500rpm and you can see the CPU temps here:



A bit worried about that 75C max...

Am I OK to continue, with the Thermal Radar temp well into the red??

Earlier today I noticed that, with four cores allocated to MJHARVEYs, my Thermal Radar temp had dropped from a steady 68C to 55C. Quite a surprise!!

I gave BOINC two more cores to play with (= six) and my temp is a steady 65C.

I did nothing to bring about this change....

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Message 38818 - Posted: 5 Nov 2014 | 15:46:50 UTC

PLEASE fix the problem that allows GPUGrid to send CPU WUs even though I explicitly stated in the settings to not do so.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 38819 - Posted: 5 Nov 2014 | 15:51:05 UTC - in response to Message 38818.

What are your exact settings?

Astiesan
Send message
Joined: 8 Jun 10
Posts: 3
Credit: 1,074,178,895
RAC: 3,390,509
Level
Met
Message 38841 - Posted: 6 Nov 2014 | 4:28:33 UTC

mdrun-463-901-sse-32 occasionally causes a soft system freeze when going from the active state into the sleeping state, i.e. screensaver off to on.

By soft system freeze, I mean that all parts of the start bar/menu are locked (I do use Start8, but it's confirmed to occur without it active as well). Windows-R can bring up the Run menu, and I can use cmd and taskkill mdrun, after which the start menu itself returns to normalcy; however, the bar continues to be unresponsive. Killing explorer.exe to reset the start bar results in a hard freeze requiring a reboot. During the soft freeze, alt-tab and other windows are VERY slow to respond until mdrun is killed. Afterwards all other windows work fine, but the start bar is unusable and forces a reboot of the system.
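
The workaround described above (killing mdrun from a command prompt) could be scripted roughly as follows. This is only a sketch: it assumes the process image name is mdrun-463-901-sse-32.exe, as shown in the stderr path quoted earlier in the thread, and the helper names are hypothetical. It relies on the standard Windows taskkill command with /IM (image name) and /F (force).

```python
import subprocess
import sys
from typing import List

# Assumed image name, taken from the stderr path quoted in this thread.
MDRUN_IMAGE = "mdrun-463-901-sse-32.exe"


def build_taskkill_command(image_name: str = MDRUN_IMAGE) -> List[str]:
    """Return the argument list that force-kills every process with this image name."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_mdrun(image_name: str = MDRUN_IMAGE) -> int:
    """Run taskkill on Windows and return its exit code (hypothetical helper)."""
    if sys.platform != "win32":
        raise RuntimeError("taskkill is only available on Windows")
    result = subprocess.run(build_taskkill_command(image_name), check=False)
    return result.returncode
```

Calling kill_mdrun() does the same thing as typing the taskkill command by hand. It only automates the workaround and does not address whatever causes the freeze.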

I've found through further testing that boincmgr is also wholly unresponsive during this. There doesn't seem to be a consistent cause: I've had it go a few days between occurrences, and other times it happens literally every 3-5 minutes (my screensaver is set for 3 minutes of idle).

There is nothing in the error logs at all, they are 0KB.

Any assistance or ideas in resolving this would be appreciated.

My system:
Windows 8.1 64-bit
i7 4790K @ stock
ASRock Z97-Extreme4
EVGA GTX 970 SC ACX @ stock 344.60 drivers
2x8GB HyperX Fury DDR3-1866 @ stock

I posted this in the other CPU WU thread, but it seems to have not been seen.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 38844 - Posted: 6 Nov 2014 | 6:51:47 UTC - in response to Message 38818.

PLEASE fix the problem that allows GPUGrid to send CPU WUs even though I explicitly stated in the settings to not do so.

I fixed it by setting my PC's location to "School" and setting "Molecular Dynamics on CPU: no" in the school preferences.
