Message boards : News : New CPU work units

Profile MJH
Project administrator
Project developer
Project scientist
Message 38419 - Posted: 12 Oct 2014 | 11:53:16 UTC

I'm starting testing again on the "cpumd" multi-threaded CPU app. Please post observations here.

Matt

eXaPower
Message 38420 - Posted: 12 Oct 2014 | 12:22:54 UTC - in response to Message 38419.
Last modified: 12 Oct 2014 | 12:26:31 UTC

14/10/12 08:21:16 | GPUGRID | No tasks are available for Test application for CPU MD

14/10/12 08:22:31 | GPUGRID | No tasks are available for Test application for CPU MD

14/10/12 08:23:55 | GPUGRID | No tasks are available for Test application for CPU MD

14/10/12 08:28:16 | GPUGRID | No tasks are available for the applications you have selected.

Are CPUMD tasks available now?

Profile MJH
Message 38421 - Posted: 12 Oct 2014 | 12:36:15 UTC - in response to Message 38420.

Yes, there are a few hundred there.

eXaPower
Message 38422 - Posted: 12 Oct 2014 | 12:59:54 UTC - in response to Message 38421.
Last modified: 12 Oct 2014 | 13:00:37 UTC

All set; received 8 tasks.
The CPUMD batch a few months back included a file that gave information about individual atoms (I can't remember the file's name). Now the progress file only shows the energies (Bond, Angle, Proper Dih., Improper Dih., Coulomb-14, LJ (SR), Coulomb (SR), Coul. recip., Potential, Kinetic En., Total Energy, Conserved En., Temperature, Pressure (bar)) every thousand steps, along with the input parameters.

Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: SSE2

Is AVX available? Or is there no speedup with AVX?

Profile MJH
Message 38423 - Posted: 12 Oct 2014 | 13:05:49 UTC - in response to Message 38422.

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt

captainjack
Message 38424 - Posted: 12 Oct 2014 | 13:12:16 UTC
Last modified: 12 Oct 2014 | 13:16:20 UTC

Matt,

I was able to get a task on my Windows box which only runs CPU tasks. That task is currently under way.

On my Linux box, I run GPU tasks most of the time. When I tried to pull a CPU task, the event log said that the computer had reached a limit on tasks in progress, even though there were no CPU tasks downloaded. I'm guessing it has reached the limit on GPU tasks in progress, so it won't pull anything else down.

If you need more information, please let me know.

Edit:

I just changed my device profile to only pull CPU tasks, and now it says that there are no CPU tasks available. Maybe the project was out of them earlier and gave a misleading message.

TJ
Message 38425 - Posted: 12 Oct 2014 | 13:17:02 UTC

Hi Matt,

My rigs are ready to accept the beta app and CPU tasks, but nothing is coming in yet.

Message:
10/12/2014 3:12:37 PM | GPUGRID | Sending scheduler request: To fetch work.
10/12/2014 3:12:37 PM | GPUGRID | Requesting new tasks for CPU
10/12/2014 3:12:39 PM | GPUGRID | Scheduler request completed: got 0 new tasks
10/12/2014 3:12:39 PM | GPUGRID | No tasks sent
10/12/2014 3:12:39 PM | GPUGRID | No tasks are available for ACEMD beta version
10/12/2014 3:12:39 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
10/12/2014 3:12:39 PM | GPUGRID | No tasks are available for CPU only app
10/12/2014 3:12:39 PM | GPUGRID | No tasks are available for Test application for CPU MD
10/12/2014 3:12:57 PM | GPUGRID | work fetch suspended by user

However, there are some tasks available according to the server page. Could be a BOINC thing, though.

____________
Greetings from TJ

Profile MJH
Message 38428 - Posted: 12 Oct 2014 | 15:15:48 UTC - in response to Message 38424.
Last modified: 12 Oct 2014 | 15:16:22 UTC

exapower - is it using all CPU cores?

Matt

TJ
Message 38432 - Posted: 12 Oct 2014 | 16:08:27 UTC
Last modified: 12 Oct 2014 | 16:10:45 UTC

Hi Matt,

Win7 x64, BOINC 7.2.42. I got one and it runs on 5 CPUs. I have set BOINC to use at most 70% of the CPUs (I have 8 threads), so this is correct, as two threads are working on GPUGRID LRs.
So it works! 1.5% done in 3.20 minutes on an i7-4774 @ 3.50GHz.

However, CPU usage according to Task Manager is 100%. mdrun.846.exe is using 60 or 61% CPU according to Task Manager.

If you need/want more information please let me know and I will provide it.
____________
Greetings from TJ

eXaPower
Message 38436 - Posted: 12 Oct 2014 | 16:36:38 UTC - in response to Message 38428.
Last modified: 12 Oct 2014 | 16:44:33 UTC

exapower - is it using all CPU cores?

Matt


Yes, it's working flawlessly. CPUMD is running at a consistent 92-95% in Task Manager (I have an audio program running, along with a total of 17 background processes and 19 Windows processes). In HWiNFO64, the 2 physical and 2 logical cores are always at 96-98% utilization. The SIV64X program reads BOINC usage for each core (thread), showing 98% for CPUMD.

After 4 hours of processing, BOINC is at 56% task progress.

eXaPower
Message 38440 - Posted: 12 Oct 2014 | 20:17:36 UTC

Observation about steps. This is from the stderr file:
Reading file topol.tpr, VERSION 4.6.1 (single precision)
Using 1 MPI thread
Using 4 OpenMP threads
starting mdrun 'Protein in water'
5000000 steps, 10000.0 ps.

And in the progress file, where the input parameters are listed: nsteps = 5000000

Yet the progress file shows 740000 steps done so far, with BOINC task progress at 79.260%. Is the total number of steps 1 million instead of 5 million?

Killersocke
Message 38442 - Posted: 12 Oct 2014 | 22:44:21 UTC - in response to Message 38419.

Task: 1315-MJHARVEY_CPUDHFR-0-1-RND2531_0

This test application was announced with a runtime of roughly 2.5 hrs.

Now it has been running for over 5 hrs and the WU is at 80%.

http://www.gpugrid.net/result.php?resultid=13198539

eXaPower
Message 38446 - Posted: 13 Oct 2014 | 6:50:40 UTC
Last modified: 13 Oct 2014 | 7:39:27 UTC

http://www.gpugrid.net/result.php?resultid=13196328

17 hr runtime so far, BOINC @ 97% (it has been at this percentage for the last few hours without moving). The progress file shows the current step count at 1.5 million. If the task is 5 million steps in total, the total runtime for one work unit will be ~56 hr. Is there a way to increase the deadline? I have 8 tasks downloaded. Or should I just boot them when they are close to expiring?

First results are in: runtimes of 170-400 hr, roughly 20 hr per core (8 threads = 160 hr, 16 threads = 320 hr). Will CPUMD tasks be in the performance tab? There are certainly enough of them to compare.

Profile MJH
Message 38449 - Posted: 13 Oct 2014 | 8:38:02 UTC - in response to Message 38446.

Hi,

That's a shame, those run times are rather longer than I'd anticipated. Probably have to dial down the length, though they are already close to the usable minimum.

Yes, they'll appear in the performance tab in due course.

Matt

eXaPower
Message 38451 - Posted: 13 Oct 2014 | 9:55:52 UTC
Last modified: 13 Oct 2014 | 10:04:21 UTC

Long runtimes don't bother me; BOINC is not reliable at estimating runtimes. If the step count is already the bare minimum, is there a way to split tasks in half, so that different users each receive a half?

BTW, HFR is a great research choice; each chain is involved in many biological processes: positive regulation of T cell mediated cytotoxicity, antigen processing and presentation of peptide antigen via MHC class I, and regulation of defense response to virus by virus, along with many others.

What is specifically simulated in the work units? (How I miss the old file with the atom/bond description.) Stderr has "starting mdrun 'Protein in water'".
A few websites to understand the type of research being done here, not all specific to this work unit:
http://www.ebi.ac.uk/pdbe-srv/view/entry/1a6z/summary

http://www.ebi.ac.uk/pdbe-srv/view/entry/1de4/summary

http://www.rcsb.org/pdb/gene/B2M

http://amigo.geneontology.org/amigo/term/GO:0019882

Profile [VENETO] sabayonino
Message 38452 - Posted: 13 Oct 2014 | 10:23:13 UTC - in response to Message 38446.
Last modified: 13 Oct 2014 | 10:23:35 UTC

http://www.gpugrid.net/result.php?resultid=13196328

17hr runtime so far- Boinc @ 97% (been at this percentage for last few hours without moving) Progress file shows steps currently @ 1.5million. If task is 5million total steps- total runtime time for one work unit will be 56~hr. Is there way to increase deadline? I have 8 tasks downloaded. Or should I just boot them when close to being expired?

First results are in - runtime 170-400Hr. ~20hr* each core. 8threads=160hr 16threads=320hr. Will CPUMD tasks be in performance tab? There certainly enough of them to compare.



I confirm this

http://www.gpugrid.net/results.php?userid=58967&offset=0&show_names=0&state=0&appid=27

After 14-17 hours they stop at 98% without moving on...

I aborted all WUs.

TJ
Message 38459 - Posted: 13 Oct 2014 | 14:06:45 UTC

I fired up an extra rig yesterday evening, with no GPUs usable here but an older i7.
What I now see on three rigs doing the CPU tasks (two use 5 threads, one uses 4) is that the first 99% gets done in about 17 hours and the last 1% takes way more. The older i7 has now been running for 20 hours (0.300% to go), and the others have been running for 22 hours and are not yet finished. I will let them run, of course, but wouldn't it be better for a GPU to handle this?
____________
Greetings from TJ

eXaPower
Message 38460 - Posted: 13 Oct 2014 | 14:33:20 UTC - in response to Message 38459.
Last modified: 13 Oct 2014 | 14:34:33 UTC

I fired up an extra rig yesterday evening, with no usable GPU's for here but an older i7.
What I now see on three rigs doing the CPU, (two use 5 threads, one 4 threads) is that the first 99% gets done in about 17 hours and the last 1% takes a way more. At the older i7 it is now 20 hours running (0.300% to go) and the others are 22 hours running and not yet finished. I will let them run off course, but wouldn't it be better for a GPU to handle this?


TJ, in the progress file, how many steps so far? I've been at 98% for the last 7 hours running 4 threads, with about 2.5 million steps left out of 5 million total.

TJ
Message 38462 - Posted: 13 Oct 2014 | 15:04:23 UTC - in response to Message 38460.

I fired up an extra rig yesterday evening, with no usable GPU's for here but an older i7.
What I now see on three rigs doing the CPU, (two use 5 threads, one 4 threads) is that the first 99% gets done in about 17 hours and the last 1% takes a way more. At the older i7 it is now 20 hours running (0.300% to go) and the others are 22 hours running and not yet finished. I will let them run off course, but wouldn't it be better for a GPU to handle this?


TJ- in the progress file- how many steps so far? I've been at 98% for last 7 hours running 4 threads with about 2.5million steps left out 5mil total.

eXaPower, there is a lot of information in the progress file, but I don't see it. Or rather, I don't know where to look. Can you give me a hint?
____________
Greetings from TJ

eXaPower
Message 38464 - Posted: 13 Oct 2014 | 15:58:04 UTC - in response to Message 38462.

Scroll down to the very end of the file; that's where the most current step is. A checkpoint is created every ten minutes.

Every thousand steps it should show something like:
Step Time Lambda
1680000 3360.00000 0.00000

Energies (kJ/mol)
Bond Angle Proper Dih. Improper Dih. LJ-14
1.98361e+003 5.27723e+003 6.64863e+003 3.42704e+002 2.23267e+003
Coulomb-14 LJ (SR) Coulomb (SR) Coul. recip. Potential
2.89505e+004 3.73032e+004 -3.83508e+005 3.47532e+003 -2.97294e+005
Kinetic En. Total Energy Conserved En. Temperature Pressure (bar)
6.02571e+004 -2.37037e+005 -2.60930e+005 2.99386e+002 -3.39559e+002
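
If you don't want to scroll by hand, something like this pulls the latest step number out of the progress file. This is just a rough sketch of mine, assuming the file is saved locally as "progress.log" (substitute the real path) and uses the GROMACS layout shown above:

# Rough sketch: report the most recent step number found in a GROMACS progress log.
# "progress.log" is a placeholder path; the "Step  Time  Lambda" layout is assumed.
def latest_step(path="progress.log"):
    with open(path, errors="ignore") as log:
        lines = log.readlines()
    step = None
    for i, line in enumerate(lines[:-1]):
        if line.split()[:3] == ["Step", "Time", "Lambda"]:
            fields = lines[i + 1].split()
            if fields and fields[0].isdigit():
                step = int(fields[0])   # keep the last occurrence found
    return step

print(latest_step())   # e.g. 1680000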

TJ
Message 38466 - Posted: 13 Oct 2014 | 17:56:19 UTC
Last modified: 13 Oct 2014 | 17:57:41 UTC

Found it. I have some data:

Step Time Lambda
4891000 9782.00000 0.00000
(99.895% done, 5CPU, 24:23:20h)

Step Time Lambda
3856000 7712.00000 0.00000
(99.887% done, 5CPU, 23:53:20h)

Step Time Lambda
2938000 5876.00000 0.00000
(97.075% done, 4CPU, 18:51:13h)

So it will take more time to finish. On one rig I got 4 more; they will not meet the deadline of 17 October.
____________
Greetings from TJ

TJ
Message 38467 - Posted: 13 Oct 2014 | 18:56:44 UTC

Finally got the first one finished in 24h55m!
____________
Greetings from TJ

Profile MJH
Message 38472 - Posted: 13 Oct 2014 | 20:29:02 UTC - in response to Message 38451.

exapower,

Actually it's dihydrofolate reductase http://en.wikipedia.org/wiki/Dihydrofolate_reductase

Matt

Profile MJH
Message 38473 - Posted: 13 Oct 2014 | 20:31:04 UTC - in response to Message 38467.

TJ - odd that it ran with 5 threads -- do you have a CPU limit set ?

Matt

Profile MJH
Message 38474 - Posted: 13 Oct 2014 | 20:56:11 UTC

For the WUs going on tonight I've increased the expected compute cost by 10x, and the deadline to 7 days.

Matt

eXaPower
Message 38477 - Posted: 13 Oct 2014 | 22:23:32 UTC - in response to Message 38472.
Last modified: 13 Oct 2014 | 23:23:17 UTC

Thanks for the correction; it gives me another enzyme to learn about. These CPUMD tasks look to be ~48 hr runtime on an Ivy Bridge with 4 threads. My Westmere generation Pentium's runtime is about ~90 hr. (It's really a dual core Xeon L3403 with a 30 W TDP.) The reason: a 4.8 GT/s QPI internal link and an external DMI link. No Westmere Pentium has QPI, only DMI links. Intel has been rebadging Xeons for a while now.

I just received a new task with the 10x compute cost: a 1084 hour estimated runtime! The old tasks estimated a 5 hr runtime while taking ~48 hr to finish. Going to be very interesting.
http://www.gpugrid.net/workunit.php?wuid=10165559

Edit: Task http://www.gpugrid.net/workunit.php?wuid=10157944 is now running in high priority mode, kicking one of my two GPU tasks out to "waiting to run". All BOINC settings have no limitations. Manually restarting the task and shutting off SLI has no effect. This started to happen when the task went into high priority mode, after I downloaded a new CPUMD task with the high compute cost.

I aborted http://www.gpugrid.net/workunit.php?wuid=10165559 and the GPU task that was waiting to run started back up. A new task, http://www.gpugrid.net/workunit.php?wuid=10165641, downloaded and stopped a GPU task again, with BOINC showing the "waiting to run" state. I aborted that task and the second GPU started its task again.
What has changed from the prior batch of CPUMD?
Having a new 10x compute cost task in the cache suspends one of the two running GPU tasks. If no 10x compute tasks are in the cache, both GPU tasks run normally.
All this is happening while the older CPUMD task is running in high priority mode, with two SDOERR GPU tasks and one 10x compute CPUMD task in the cache.

TJ
Message 38478 - Posted: 13 Oct 2014 | 22:25:22 UTC - in response to Message 38473.

TJ - odd that it ran with 5 threads -- do you have a CPU limit set ?

Matt

Yes, it is at 70% so that 2 threads are free to feed the GPUs and 1 for the system to stay responsive, so I can use this site and do some other things.

I have another rig set to 100%, so the next one should run at 100% when the current one has finished.
____________
Greetings from TJ

Profile Chilean
Message 38481 - Posted: 14 Oct 2014 | 0:57:49 UTC
Last modified: 14 Oct 2014 | 1:08:25 UTC

Running one @ 8 threads (4 cores + HT).

EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can run as well)?
____________

Jacob Klein
Message 38482 - Posted: 14 Oct 2014 | 2:18:51 UTC
Last modified: 14 Oct 2014 | 2:23:16 UTC

Hmm... I got a task, which will correctly run at 6 of my 8 CPUs on my rig (since I'm using "Use at most 75% CPUs" to presently accommodate 2 VM tasks outside of BOINC):
http://www.gpugrid.net/workunit.php?wuid=10166037

BUT...

It immediately started running in "High Priority" "Earliest Deadline First" mode. Could this mean that the 1-week-deadline is too short? The estimated runtime is 1130+ hours, which is, uhh, 6.7 weeks? Does that sound right?

This configuration must be incorrect. The deadline should never be less than the expected initial runtime, right?

Profile MJH
Message 38484 - Posted: 14 Oct 2014 | 7:08:38 UTC - in response to Message 38482.

Yeah, no idea what went wrong with the runtime estimates. The last set got an estimate of 5h, but when I increased the cost by 10x the estimate went up 200x.

Matt

eXaPower
Message 38487 - Posted: 14 Oct 2014 | 9:41:02 UTC
Last modified: 14 Oct 2014 | 9:41:47 UTC

Task http://www.gpugrid.net/workunit.php?wuid=10157944 has been at 99.978% complete, with an estimated 3 minutes left, for the last 12 hr, and there are currently 2 million steps to go before it finishes. This task has been past 98% for about 25 of the 32 hr it has been running. The newer CPUMD task sitting in the cache kills one of the GPU tasks, keeping it in "waiting to run" mode. Once the 10x compute cost task is booted, all is normal again.

TJ
Message 38489 - Posted: 14 Oct 2014 | 12:38:52 UTC - in response to Message 38481.

Running one @ 8 threads (4 cores + HT).

EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can ran as well...)

Yes, that can be achieved, but not for the one currently running.
Eight cores, so 12.5% per core. You want to use 6 cores, thus 12.5 x 6 = 75.
So in BOINC Manager go to Tools, Computing preferences, and at the bottom set "On multiprocessor systems, use at most" to 75%.

This works only for new WUs that have not been started. Once your current WU has finished, the next one will use only 75%, thus 6 threads.
____________
Greetings from TJ

Profile MJH
Message 38490 - Posted: 14 Oct 2014 | 12:47:54 UTC - in response to Message 38489.

I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?
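
To make that concrete, the thread count would be picked something like this (just a sketch, not the actual app or wrapper code; the GPU count would come from the client):

# Sketch of the proposed default: one thread per logical CPU, minus one per GPU, never below 1.
import multiprocessing

def cpumd_threads(num_gpus):
    ncpus = multiprocessing.cpu_count()
    return max(1, ncpus - num_gpus)

print(cpumd_threads(2))   # e.g. an 8-thread host with 2 GPUs would run the MT task on 6 threads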

Matt

captainjack
Message 38493 - Posted: 14 Oct 2014 | 12:56:51 UTC

Running one @ 8 threads (4 cores + HT).

EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can ran as well...)


Another way to accomplish this is to set up an app_config.xml file like the following:

<app_config>
  <app>
    <name>cpumd</name>
    <max_concurrent>8</max_concurrent>
  </app>
  <app_version>
    <app_name>cpumd</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
  </app_version>
</app_config>


The "avg_ncpus" parameter sets the number of threads reserved in the BOINC scheduler and the "__nthreads" parameter set the number of threads you want a task to use when the task runs.

The app_config.xml file goes into the /Boinc/projects/www.gpugrid.net folder.

And just as a reminder: if you are using Windows, use Notepad to edit the app_config.xml file. Do not use Word, as it puts in extra formatting characters that confuse the XML interpreter. If you are using Ubuntu, use Gedit to edit the app_config.xml file.

That will leave 2 threads free for BOINC to schedule other tasks.

Sometimes you can make this effective by opening BOINC Manager and clicking on "Advanced" then "Read config files". In the message log, you should get a message that says something like app_config.xml found for www.gpugrid.net. You might need to shut down BOINC and start it back up again to make the app_config.xml effective. Then the parameters will apply to any new work that is downloaded after the parameters are effective.

Hope that helps.

Jacob Klein
Message 38494 - Posted: 14 Oct 2014 | 12:57:59 UTC
Last modified: 14 Oct 2014 | 13:10:15 UTC

MJH:

Yeah, no idea what went wrong with the runtime estimates. The last set got an estimate of 5h, but when I increased the cost by 10x the estimate went up 200x.

Did you increase the <rsc_fpops_bound> value appropriately? I believe it's used for task size, and hence task runtime estimation.

I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?

I personally don't care either way, but I do care that you make absolutely sure that the number of cores used (via the command line) matches the number of cores budgeted (via ncpus), so BOINC doesn't overcommit or undercommit the CPU. I hope that makes sense.

eXaPower
Message 38495 - Posted: 14 Oct 2014 | 13:03:10 UTC - in response to Message 38491.

I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?

Matt


Doesn't hurt to try. Only for NVIDIA GPUs? How much would a GPU help with task time? If the runtime is lowered by half or more, then I'd say this is workable. Would it be an option to run a task with half of a GPU's cores, so maybe two MD tasks could run at a time? Or is the whole GPU required?

Jacob Klein
Message 38496 - Posted: 14 Oct 2014 | 13:13:37 UTC

Side note:

Is it possible that progress % is not hooked up correctly?

For instance, progress.log says:

nsteps = 5000000

and if I scroll to the bottom:
Step Time Lambda
1254000 2508.00000 0.00000


... yet, the BOINC UI only says "1.777% done"
Shouldn't it say 25% done?
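
(That 25% is just the ratio of steps quoted above, nothing BOINC-specific:)

# Back-of-the-envelope check of the numbers above.
step, nsteps = 1_254_000, 5_000_000
print(f"{100 * step / nsteps:.1f}% of steps completed")   # ~25.1%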

Profile MJH
Message 38497 - Posted: 14 Oct 2014 | 13:20:16 UTC - in response to Message 38496.

The app isn't reporting its progress to the client yet. It's just being estimated from the flops.

Matt

Jacob Klein
Message 38498 - Posted: 14 Oct 2014 | 13:35:34 UTC

Is it possible to change it so that the app can report a better progress %?

Profile MJH
Message 38499 - Posted: 14 Oct 2014 | 13:41:13 UTC - in response to Message 38498.
Last modified: 14 Oct 2014 | 13:41:44 UTC

It's on the TODO list, yes. It'll be appearing in the Linux version first, as that's the easier to develop.

Matt

TJ
Message 38500 - Posted: 14 Oct 2014 | 14:00:58 UTC - in response to Message 38491.
Last modified: 14 Oct 2014 | 14:07:00 UTC

I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?

Matt

For GPUGRID-only crunchers this is great, but I think that those who run other projects on the CPU will not be that happy.
For me, you can do it.

Edit: there are also projects that use the iGPU; that also takes a thread on the CPU.
____________
Greetings from TJ

TJ
Message 38501 - Posted: 14 Oct 2014 | 14:04:18 UTC

I have set it to use 100% of the CPUs, and only the CPU app for GPUGRID is in the queue, but strangely only 5 CPUs are used. It should be 8.
I noticed this in the progress file:

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
Family: 6 Model: 26 Stepping: 5
Features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
Acceleration most likely to fit this hardware: SSE4.1
Acceleration selected at GROMACS compile time: SSE2


Binary not matching hardware - you might be losing performance.
Acceleration most likely to fit this hardware: SSE4.1
Acceleration selected at GROMACS compile time: SSE2

Also, I think because the run time estimation is not correct yet, the first 99% goes rather quickly and then the last 1% takes between 20-28 hours to finish. But Matt knows this already.

____________
Greetings from TJ

eXaPower
Message 38502 - Posted: 14 Oct 2014 | 14:09:58 UTC - in response to Message 38423.
Last modified: 14 Oct 2014 | 14:15:32 UTC

TJ- from a post earlier.

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


TJ or Jacob, on your SLI systems, have you noticed any task being kicked out while running an older CPUMD task with a new 10x compute cost task in the cache? For me, with an old CPUMD task running in high priority mode and a new CPUMD task in the cache, one of the two running GPU tasks goes into "waiting to run" mode.

TJ
Message 38503 - Posted: 14 Oct 2014 | 14:28:19 UTC - in response to Message 38502.

TJ- from a post earlier.

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


Aha, thanks now I understand.

TJ or Jacob- for you're SLI system- have you noticed any task being kicked out while running an older CPUMD task with a new x10 compute cost in cache? For me- with an old CPUMD running in high priority mode and new CPUMD in cache- one of two GPU tasks running go's into "waiting to run" mode.


I have only "old" ones running and in queue, will let them finish first. However none is yet running at high priority.
____________
Greetings from TJ

eXaPower
Message 38504 - Posted: 14 Oct 2014 | 14:37:06 UTC - in response to Message 38503.
Last modified: 14 Oct 2014 | 14:38:06 UTC

TJ- from a post earlier.

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


Aha, thanks now I understand.

TJ or Jacob- for you're SLI system- have you noticed any task being kicked out while running an older CPUMD task with a new x10 compute cost in cache? For me- with an old CPUMD running in high priority mode and new CPUMD in cache- one of two GPU tasks running go's into "waiting to run" mode.


I have only "old" ones running and in queue, will let them finish first. However none is yet running at high priority.


Currently, the CPU task from the first batch is NOT running in high priority, but when I download a "new" CPUMD 10x compute task, the "old" task goes into high priority and kicks out one of the two GPU tasks that are computing.

Jacob Klein
Message 38505 - Posted: 14 Oct 2014 | 16:03:30 UTC

Ok, let's get a thing straight here. Client scheduling.

The order, I believe, goes something like this:
1) "High Priority" coprocessor (GPU/ASIC) tasks
2) "High Priority" CPU tasks (up to ncpus + 1) (MT tasks allowed to overcommit)
3) "Regular" coprocessor (GPU/ASIC) tasks (up to ncpus + 1)
4) "Regular" CPU tasks (up to ncpus + 1) (MT tasks allowed to overcommit)

So...
When one of the new GPUGrid MT CPU tasks comes in, if it is set to use all of the CPUs and it runs high priority, it gets scheduled in "order 2", which is above the GPU tasks, which come in at "order 3".

And then, it will additionally schedule as many "order 3" GPU tasks as it can, but only up to the point that it budgets 1 additional CPU. (So, if your GPU tasks are set to use 0.667 CPUs like I have scheduled mine via app_config, then it will run 1 GPU task, but not 2).
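
To see how that budget check plays out, here's a toy model of the ordering above. This is my own sketch, NOT the actual BOINC client code, and it simplifies the MT overcommit rule:

# Toy model: tasks are (name, uses_gpu, high_priority, cpus_budgeted).
def schedule(tasks, ncpus):
    def tier(task):
        _, uses_gpu, high_prio, _ = task
        if high_prio:
            return 1 if uses_gpu else 2    # orders 1 and 2
        return 3 if uses_gpu else 4        # orders 3 and 4

    scheduled, cpu_budget = [], 0.0
    for name, uses_gpu, high_prio, cpus in sorted(tasks, key=tier):
        if cpu_budget + cpus > ncpus + 1:  # skip once a task would exceed the ncpus + 1 budget
            continue                       # (the real client lets MT tasks overcommit further)
        scheduled.append(name)
        cpu_budget += cpus
    return scheduled

# A high-priority MT task using all 8 CPUs plus two GPU tasks budgeted at 0.667 CPUs each:
tasks = [("GPU task A", True, False, 0.667),
         ("GPU task B", True, False, 0.667),
         ("CPUMD MT", False, True, 8.0)]
print(schedule(tasks, ncpus=8))   # ['CPUMD MT', 'GPU task A'] -- the second GPU task waits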

This is NOT a problem of "oh wow, GPUGrid MT tasks are scheduling too many CPUs."

This IS a problem of "oh wow, GPUGrid MT tasks go high-priority immediately. That throws off all of the scheduling on the client."

Hopefully that helps clarify.

PS: Here is some dated info that is a useful read:
http://boinc.berkeley.edu/trac/wiki/ClientSched
http://boinc.berkeley.edu/trac/wiki/ClientSchedOctTen

eXaPower
Message 38508 - Posted: 14 Oct 2014 | 16:55:05 UTC
Last modified: 14 Oct 2014 | 16:57:45 UTC

Jacob, thank you for the information about client scheduling.

Matt, I see you released a CPUMD app for Linux with support for SSE4/AVX. Will Windows also see an upgrade? Do you have an idea of the speedup of the SSE4/AVX app compared to the standard SSE2 app?

Profile MJH
Message 38509 - Posted: 14 Oct 2014 | 17:20:49 UTC - in response to Message 38508.

Will windows also see an upgrade?


Probably within the week.


Do you have idea what the speed up with SSE4/ AVX app will be compared to standard SSE2 app?


10-30% for AVX on Intel, I think.

=Lupus=
Message 38522 - Posted: 14 Oct 2014 | 22:08:35 UTC

ohmyohmy...

http://www.gpugrid.net/result.php?resultid=13195959

running on 3 out of 4 cpu cores, nsteps=5000000

at 57 hours:

Writing checkpoint, step 3770270 at Tue Oct 14 23:57:52 2014

seems it will finish... in 24 more hours.

Seems something went really weird with the estimated runtime. Question: should I abort the 6 other work units?

Profile Chilean
Message 38524 - Posted: 14 Oct 2014 | 23:50:27 UTC

Step = 1 744 000

After 9 hrs 20 min running on the full 8 threads. This might be the most computationally expensive WU I've run since I started DC'ing.

____________

boinc127
Message 38525 - Posted: 15 Oct 2014 | 4:08:20 UTC

I noticed this bit of info off of the task I ran, using 3 out of the 4 available cores on my computer:

http://www.gpugrid.net/result.php?resultid=13201370


Using 1 MPI thread
Using 3 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cores
and the -pin option is set to auto: will not pin thread to cores.
This can lead to significant performance degradation.
Consider using -pin on (and -pinoffset in case you run multiple jobs).

Can this become an issue for computers that aren't running the task with all cores?

TJ
Message 38526 - Posted: 15 Oct 2014 | 14:46:03 UTC

This is the last bit of the stderr file:

starting mdrun 'Protein in water'
5000000 steps, 10000.0 ps (continuing from step 3283250, 6566.5 ps).

Writing final coordinates.

Core t (s) Wall t (s) (%)
Time: 32457.503 32458.000 100.0
9h00:58
(ns/day) (hour/ns)
Performance: 9.140 2.626

gcq#0: Thanx for Using GROMACS - Have a Nice Day

16:39:45 (4332): called boinc_finish(0)

It ran on 5 CPUs (8 were allowed). Am I right in seeing that it took 9 hours to finish?
It took a bit more, see this:

1345-MJHARVEY_CPUDHFR-0-1-RND9787_0 10159887 153309 12 Oct 2014 | 17:58:49 UTC 15 Oct 2014 | 14:38:34 UTC Completed and validated 94,864.92 567,905.50 2,773.48 Test application for CPU MD v8.46 (mt)
____________
Greetings from TJ

Profile MJH
Message 38529 - Posted: 15 Oct 2014 | 19:52:17 UTC - in response to Message 38526.


It ran on 5 CPU's (8 where allowed). Am I right seeing that it took 9 hours to finish?


No - it took just over a day. The performance was ~9ns/day, the sim was 10ns in length.

Matt

Profile Chilean
Message 38530 - Posted: 15 Oct 2014 | 21:25:42 UTC

Had 3 errors on one of my PCs:

http://www.gpugrid.net/results.php?hostid=185425

All errored out with:

"Program projects/www.gpugrid.net/mdrun.846, VERSION 4.6.3
Source code file: ..\..\..\gromacs-4.6.3\src\gmxlib\checkpoint.c, line: 1562

File input/output error:
Cannot read/write checkpoint; corrupt file, or maybe you are out of disk space?
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors"

This computer has no problems running other projects... including vLHC@Home, Rosetta, etc.
____________

eXaPower
Message 38534 - Posted: 16 Oct 2014 | 10:17:32 UTC

A few observations about CPUMD tasks:
- a dual core CPU needs about 4 days (~96 hr) to complete a task
- a dual core with HT (4 threads) requires about 2 days (~48 hr)
- a quad core without HT (4 threads) takes ~16-36 hr
- a quad core with HT (8 threads) completes a task in ~24 hr
- a 6 core (12 threads) finishes a task in ~8-16 hr
- a 16 thread CPU manages CPUMD tasks in under ~12 hr
Some CPUs finish faster from being overclocked and having 1833 MHz or higher RAM clocks. Disk usage is low for CPUMD; note that when running GPU tasks, disk usage can be higher for certain tasks (unfold_Noelia).

CPU temps are low with the SSE2 app; when the AVX CPUMD app is released, temps will be higher. For people running an Intel CPU with AVX, there is a possible 10-30% speedup coming when the AVX app is released.

Some info: http://en.wikipedia.org/wiki/Dihydrofolate_reductase

Jacob Klein
Message 38535 - Posted: 16 Oct 2014 | 11:26:37 UTC
Last modified: 16 Oct 2014 | 11:33:56 UTC

I completed my first CPU task on my main rig on Windows 10 Technical Preview x64.
http://www.gpugrid.net/result.php?resultid=13206356

Observations:
- It used app: Test application for CPU MD v8.46 (mtsse2)
- It had horrible estimates, along with an inability to report progress correctly, and had a 1-week deadline, so it ran as high-priority the entire time, interfering with the BOINC client's scheduling of my GPUGrid GPU tasks. I will not be running this type of task again unless the estimation is fixed.
- It did not report progress correctly.
- It ran using 6 (of my 8) logical CPUs, as I had BOINC set to use 75% CPUs, since I am running 2 RNA World VM tasks outside of BOINC
- It took 162,768.17s (45.2 hours) of wall time
- It consumed 721,583.90s (200.4 hours) of CPU time
- It did checkpoint every so often, which I was happy to see. It appeared to resume from checkpoints just fine.
- It completed successfully, with the output text below
- It validated successfully, and granted credit.
- It seems weird that the time values in the output do not match either the wall time or CPU time values that BOINC reported. Bug?

Core t (s) Wall t (s) (%)
Time: 18736.176 18736.000 100.0
5h12:16
(ns/day) (hour/ns)
Performance: 5.491 4.371


Let us know when the estimation and progress problems are fixed, and then maybe I'll run another one for you!

Thanks,
Jacob

eXaPower
Message 38537 - Posted: 16 Oct 2014 | 11:37:27 UTC - in response to Message 38535.
Last modified: 16 Oct 2014 | 11:54:54 UTC

Deleted post

Jacob Klein
Message 38539 - Posted: 16 Oct 2014 | 11:52:02 UTC - in response to Message 38537.
Last modified: 16 Oct 2014 | 12:03:43 UTC

eXaPower: Your questions about Windows 10 should have been a PM. I'll send you a PM response.

eXaPower
Message 38544 - Posted: 16 Oct 2014 | 13:59:49 UTC

For the last couple of days I've had two GPU tasks and one CPUMD task running in high priority; up until now all ran with no issues. Just now, seemingly at random, BOINC decided to kill one of the GPU tasks, sending it to "waiting to run" mode. If I suspend the CPUMD task, both GPU tasks will run. Allowing the CPUMD task to run shuts down a GPU task.

Jacob Klein
Message 38545 - Posted: 16 Oct 2014 | 14:13:50 UTC - in response to Message 38544.
Last modified: 16 Oct 2014 | 14:17:59 UTC

For last couple days- I've had two GPU tasks and one CPUMD tasks running in high priority- up until now all ran with no issues. Just now and randomly BOINC has decided to kill one of GPU tasks- sending it to "waiting to run" mode. If I suspend CPUMD task both GPU tasks will run. Allowing CPUMD task to run will shut a GPU task.



Read here: http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505

It's not random.

When your GPU tasks switched out of "high priority" (deadline panic) mode, they also became lower on the food chain of client task scheduling. Instead of order 1 (where they were scheduled before the MT task) they became order 3 (scheduled after the MT task). And then, since the scheduler will only schedule up to ncpus+1, that is why only 1 GPU task is presently scheduled, instead of both (assuming each of your GPU tasks is budgeted to use 0.5 or more CPU also).

Not random at all. Working as designed, correctly...
... given the circumstances of the GPUGrid MT task estimates being completely broken.

eXaPower
Message 38547 - Posted: 16 Oct 2014 | 15:15:09 UTC - in response to Message 38545.
Last modified: 16 Oct 2014 | 15:23:12 UTC

For last couple days- I've had two GPU tasks and one CPUMD tasks running in high priority- up until now all ran with no issues. Just now and randomly BOINC has decided to kill one of GPU tasks- sending it to "waiting to run" mode. If I suspend CPUMD task both GPU tasks will run. Allowing CPUMD task to run will shut a GPU task.



Read here: http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505

It's not random.

When your GPU tasks switched out of "high priority" (deadline panic) mode, they also became lower on the food chain of client task scheduling. Instead of order 1 (where they were scheduled before the MT task) they became order 3 (scheduled after the MT task). And then, since the scheduler will only schedule up to ncpus+1, that is why only 1 GPU task is presently scheduled, instead of both (assuming each of your GPU tasks is budgeted to use 0.5 or more CPU also).

Not random at all. Working as designed, correctly...
... given the circumstances of the GPUGrid MT task estimates being completely broken.


Jacob, one GPU task has been running for 37 hr straight in high priority mode, one GPU task for 22 hr straight in high priority, and one CPUMD task for 24 straight hours in high priority mode. During this time I haven't added any task to the cache. If all three tasks were already running in high priority (order 1 or 3; is there a way to find out which?), why did BOINC kick one out after all this time? Since the very beginning these three tasks have been in high priority, and I haven't changed any BOINC scheduler settings or the allowed CPU usage. I had a similar issue when a CPUMD task was in the cache, so I've stopped letting any task sit in the cache, keeping only tasks that can compute on an available GPU/CPU.

If I suspend the CPUMD task, both GPU tasks will run, with one in high priority and the other not. If I suspend the CPUMD task, the GPU task that was in high priority changes to non-high priority. When the CPUMD task is running alongside one GPU task, and the task that's in "waiting to run" is suspended, the running GPU task leaves high priority mode.

Jacob Klein
Message 38548 - Posted: 16 Oct 2014 | 15:27:44 UTC - in response to Message 38547.
Last modified: 16 Oct 2014 | 15:34:59 UTC

"High priority mode" for a task means that "Presently, if tasks were scheduled in a FIFO order in the round-robin scheduler, the given task will not make deadline. We need to prioritize it to be ran NOW." It should show you, in the UI, if the task is in "High Priority" mode, on that Tasks tab, in the Status column.

A task can move out of "High priority mode" when the round-robin simulation indicates that it WOULD make deadline. When tasks are suspended/resumed/downloaded, when progress percentages get updated, when running estimates get adjusted (as tasks progress), when the computers on_frac and active_frac and gpu_active_frac values change ... the client re-evaluates all tasks to determine which ones need to be "High priority" or not.

Did you read the information in the links that were in my post? They're useful. After reading that information, do you still think the client scheduler is somehow broken?

Also, you can turn on some cc_config flags to see extra output in Event Log... specifically, you could investigate rr_simulation, rrsim_detail, cpu_sched, cpu_sched_debug, or coproc_debug. I won't be able to explain the output, but you could probably infer the meaning of some of it.

eXaPower
Message 38549 - Posted: 16 Oct 2014 | 17:49:30 UTC - in response to Message 38548.
Last modified: 16 Oct 2014 | 18:10:23 UTC

Some cc_config flags information. BOINC thinks I'm going to miss the deadline for the CPUMD task:
(1138 hr remaining estimate; 14/10/16 13:34:52 | GPUGRID | [cpu_sched_debug] 5146-MJHARVEY_CPUDHFR-0-1-RND3131_0 sched state 2 next 2 task state 1). BOINC says CPUMD is 20% complete after 24 hr; the progress file is at 3.5 million steps.

BOINC will run the unfold Noelia task (97% complete, 18 hr estimated remaining; 14/10/16 13:33:52 | GPUGRID | [cpu_sched_debug] unfoldx5-NOELIA_UNFOLD-19-72-RND4631_0 sched state 2 next 2 task state 1) in high priority when the CPUMD task is running, while booting the task BOINC thinks will miss a deadline, the 63% complete SDOERR task (174 hr remaining estimate): 14/10/16 13:33:52 | GPUGRID | [cpu_sched_debug] I1R119-SDOERR_BARNA5-38-100-RND1580_0 sched state 1 next 1 task state 0

Here are some newer task states that have changed:
14/10/16 13:43:13 | GPUGRID | [cpu_sched_debug] 5146-MJHARVEY_CPUDHFR-0-1-RND3131_0 sched state 1 next 1 task state 0

14/10/16 13:47:13 | GPUGRID | [cpu_sched_debug] unfoldx5-NOELIA_UNFOLD-19-72-RND4631_0 sched state 2 next 2 task state 1

14/10/16 13:47:13 | GPUGRID | [cpu_sched_debug] I1R119-SDOERR_BARNA5-38-100-RND1580_0 sched state 2 next 2 task state 1

14/10/16 13:56:05 | GPUGRID | [rr_sim] 24011.34: unfoldx5-NOELIA_UNFOLD-19-72-RND4631_0 finishes (0.90 CPU + 1.00 NVIDIA GPU) (721404.58G/30.04G)

14/10/16 14:00:07 | GPUGRID | [rr_sim] 4404370.74: 5146-MJHARVEY_CPUDHFR-0-1-RND3131_0 finishes (4.00 CPU) (54297244.54G/12.33G)

14/10/16 13:56:05 | GPUGRID | [rr_sim] 658381.65: I1R119-SDOERR_BARNA5-38-100-RND1580_0 finishes (0.90 CPU + 1.00 NVIDIA GPU) (19780638.18G/30.04G)
14/10/16 13:56:05 | GPUGRID | [rr_sim] I1R119-SDOERR_BARNA5-38-100-RND1580_0 misses deadline by 348785.46
14/10/16 13:58:05 | GPUGRID | [cpu_sched_debug] skipping GPU job I1R119-SDOERR_BARNA5-38-100-RND1580_0; CPU committed

14/10/16 13:59:05 | GPUGRID | [cpu_sched_debug] unfoldx5-NOELIA_UNFOLD-19-72-RND4631_0 sched state 2 next 2 task state 1

14/10/16 13:59:05 | GPUGRID | [cpu_sched_debug] I1R119-SDOERR_BARNA5-38-100-RND1580_0 sched state 1 next 1 task state 0

14/10/16 13:59:05 | GPUGRID | [cpu_sched_debug] 5146-MJHARVEY_CPUDHFR-0-1-RND3131_0 sched state 2 next 2 task state 1

Now the three tasks are all running with new task states after being rescheduled (I downloaded a new Long task):
14/10/16 14:10:40 | GPUGRID | [cpu_sched_debug] unfoldx5-NOELIA_UNFOLD-19-72-RND4631_0 sched state 2 next 2 task state 1

14/10/16 14:10:40 | GPUGRID | [cpu_sched_debug] I1R119-SDOERR_BARNA5-38-100-RND1580_0 sched state 2 next 2 task state 1

14/10/16 14:10:40 | GPUGRID | [cpu_sched_debug] 5146-MJHARVEY_CPUDHFR-0-1-RND3131_0 sched state 2 next 2 task state 1

eXaPower
Message 38585 - Posted: 20 Oct 2014 | 12:14:34 UTC
Last modified: 20 Oct 2014 | 12:16:48 UTC

CPUMD tasks completed past the deadline: credit is still awarded.

http://www.gpugrid.net/workunit.php?wuid=10159833

http://www.gpugrid.net/workunit.php?wuid=10158842

d_a_dempsey
Message 38632 - Posted: 22 Oct 2014 | 13:03:58 UTC

I have a problem with the Test application for CPU MD work units. This is obviously a test setup, according to both application name and this discussion thread, and the work units are being pushed to my machines even though my profile is set to not receive WUs from test applications.

I'm happy to do GPU computing for you guys, but I'm not willing to let you take over complete machines for days. Please control your app to respect the "Run test applications?" setting in our profiles.

Thank you,

David

Profile MJH
Message 38634 - Posted: 22 Oct 2014 | 13:44:00 UTC - in response to Message 38632.

Hm, sorry about that. Should only be going to machines opted in to test WUs.
I should point out the app is close to production - the main remaining problem with it is the ridiculous runtime estimates the client is inexplicably generating.

Matt

eXaPower
Message 38724 - Posted: 28 Oct 2014 | 11:57:57 UTC - in response to Message 38634.

Are the working SSE2 CPUMD tasks on vacation? Were the returned results incomplete or invalid? 10,000 tasks disappeared.
From the look of the BOINC stats and GPUGRID graphs, a decent number of new users' CPU-only machines were added, with credit awarded.

sis651
Message 38731 - Posted: 28 Oct 2014 | 22:22:33 UTC

I got some CPU work units to test, but I had a problem with them. Currently I'm crunching some AVX units, and I crunched non-AVX/SSE2 units before.
My problem is that when I paused the units and restarted BOINC, none of the CPU tasks resumed crunching from their last progress. They start crunching from the beginning. In an area with short but frequent blackouts it's not possible to run these CPU units.

boinc127
Message 38732 - Posted: 28 Oct 2014 | 23:59:14 UTC - in response to Message 38731.

I believe the project admins dumped the AVX mt program because of some flaws in it. When I ran the AVX program I also noticed the program never checkpointed.

from MJH on another post:

The buggy Windows AVX app is gone now. Please abort any instances of it still running. It's replaced with the working SSE2 app.


http://www.gpugrid.net/forum_thread.php?id=3812&nowrap=true#38680

For now at least, there are no other CPU beta workunits to test. I guess the project admins will revise and replace the workunits when they are ready and able to.

Profile MJH
Message 38737 - Posted: 29 Oct 2014 | 7:52:00 UTC - in response to Message 38731.

I got some CPU works to test but I had a problem with them. Currently I'm crunching some AVX units and crunched non AVX/SSE2 units before.


Make sure that the application executable that you are running has "sse2" in its name, not "avx". Manually delete the old AVX app binary from the project directory if necessary.

MJH

eXaPower
Message 38741 - Posted: 29 Oct 2014 | 12:57:50 UTC - in response to Message 38737.
Last modified: 29 Oct 2014 | 13:00:03 UTC

Received 5 abandoned 9.03 "AVX" tasks. All are computing with SSE2 even with the AVX app binary in the directory. Checkpoints are working; BOINC client progress reporting is still off (at 70% with 3.7 million steps left to compute). The progress file is reporting computed steps properly.

John C MacAlister
Message 38768 - Posted: 30 Oct 2014 | 14:30:22 UTC

Hello, friends in Barcelona!

No CPU tasks received: are there any available?

Thanks!

John

Astiesan
Message 38777 - Posted: 1 Nov 2014 | 0:38:29 UTC
Last modified: 1 Nov 2014 | 0:42:26 UTC

mdrun-463-901-sse-32 occasionally causes a soft system freeze when exiting the active state into the sleeping state, i.e. screensaver off to on.

By soft system freeze, I mean that all parts of the start bar/menu are locked (I do use Start8, but it's confirmed to occur without it active as well). Windows-R can bring up the Run menu, and I can use cmd to taskkill mdrun; the start menu itself will then return to normal, however the bar will continue to be unresponsive. Killing explorer.exe to reset the start bar results in a hard freeze requiring a reboot. During the soft freeze, alt-tab and other windows are VERY slow to respond until mdrun is killed; afterwards all other windows work fine, but the start bar is unusable and forces a reboot of the system.

There is nothing in the error logs.

Any assistance or ideas in resolving this would be appreciated.

My system:
Windows 8.1 64-bit
i7 4790K @ stock
ASRock Z97-Extreme4
EVGA GTX 970 SC ACX @ stock
2x8GB HyperX Fury DDR3-1866 @ stock

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38781 - Posted: 1 Nov 2014 | 10:20:52 UTC

I gave four cores of my AMD FX-8350 to the app. I've done four WUs, all of which completed in a remarkably consistent time of just over 16 hours, with a somewhat stingy 920 credits each.

I just checked the server status page [screenshot omitted] and was a little surprised to see my 16 hours well under the listed minimum run time of 19.16 hours.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38782 - Posted: 1 Nov 2014 | 11:12:59 UTC - in response to Message 38781.
Last modified: 1 Nov 2014 | 11:26:59 UTC

The current CPUMD tasks are 2.5 million steps, not 5 million like the prior tasks. Maybe that's why the credit awarded is lower? All four of the tasks you completed were 2.5 million steps.

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38784 - Posted: 1 Nov 2014 | 12:09:50 UTC - in response to Message 38782.

The current CPUMD tasks are 2.5 million steps, not 5 million like the prior tasks. Maybe that's why the credit awarded is lower? All four of the tasks you completed were 2.5 million steps.

I did complete this 5M-step WU on 24 October and got 3342 credits...

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38785 - Posted: 1 Nov 2014 | 12:17:08 UTC - in response to Message 38781.

Yes, the credit allocation is wrong - need to work out how to fix that.

Matt

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38786 - Posted: 1 Nov 2014 | 12:24:51 UTC - in response to Message 38785.

Yes, the credit allocation is wrong - need to work out how to fix that.

Matt

A fixed 2.5M per completion would be a nice 'n' easy solution ;)

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38787 - Posted: 1 Nov 2014 | 12:51:29 UTC
Last modified: 1 Nov 2014 | 12:58:05 UTC

I have completed 2 of the new (I think?) tasks, of application type "Test application for CPU MD v9.01 (mtsse2)", on my host (id: 153764), running 8 logical CPUs (4 cores hyperthreaded).

When I first got the tasks, I think the estimated run time was something like 4.5 hours. But then, after it completed the first task (which took way longer: 15.75 hours of run time), BOINC realized the estimate was wrong and adjusted the estimated run times for the other tasks to ~16 hours.

For each of the 2 completed tasks:
- Task size: 2.5 million steps
- Run Time: ~16.4 hours
- CPU Time: ~104 hours (My CPUs were slightly overcommitted by my own doing)
- Credit granted: ~3700

I will continue to occasionally run these, to help you test, especially when new versions come out.

Regards,
Jacob

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 13,920,977,393
RAC: 7,489,952
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38788 - Posted: 1 Nov 2014 | 14:02:57 UTC

I'm crunching some of these units on my dual-processor 32- and 48-thread machines. They are Sandy Bridge (32 threads) and Ivy Bridge (48 threads) Xeon-based machines.

On the 32-thread machine it has been quite straightforward: it finished the first unit in 3h5m running CPU MD v9.02 (mtavx), with the CPUs kicking in at turbo speed (3.3 GHz). No other BOINC project was running.

On the 48-thread machine it has been a little more entertaining :). The first units all crashed right at the beginning; reading the stderr I learnt that the GROMACS application cannot work well with more than 32 threads but will try anyway, so launching with 46 threads available (the other two reserved for two GPUGRID GPU units) ended in error (11 units in a row).

So, while I investigated how to set up an app_config.xml file for MT units, I reduced the percentage of available processors until it reached 32 and started another MT unit, which this time executed properly and finished in a little under 3h.

Then I copied the app_config.xml file into the GPUGRID folder, enabled 46 threads again and crossed my fingers. It worked fine: 1 MT task using 32 threads, the rest of the threads running Rosetta units, plus 2 GPUGRID GPU tasks. This time it needed about 3h10m, which I think is down to the overall load on the machine. (A sketch of such an app_config.xml is below.)

I'll run some more units and report further findings if anything notable turns up.
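
For anyone wanting to try the same, an app_config.xml along these lines should work. This is only a sketch, not my exact file: check client_state.xml for the project's short application name (I've assumed "cpumd" here) and for the plan class of the version your host actually receives (mtavx or mtsse2).

<app_config>
   <app>
      <!-- short name of the CPU MD app; verify it in client_state.xml -->
      <name>cpumd</name>
      <!-- run only one CPU MD task at a time -->
      <max_concurrent>1</max_concurrent>
   </app>
   <app_version>
      <app_name>cpumd</app_name>
      <!-- use mtsse2 instead if that's the version you get -->
      <plan_class>mtavx</plan_class>
      <!-- CPUs BOINC budgets per task; for MT apps this is normally also the thread count passed to the app -->
      <avg_ncpus>32</avg_ncpus>
   </app_version>
</app_config>

Save it in the GPUGRID project folder and tell the client to re-read config files (or restart BOINC) for it to take effect.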




TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38789 - Posted: 1 Nov 2014 | 15:17:58 UTC

I powered on my old workstation with two Xeons and two slow GTX 660s.
I have allowed BOINC to use all 8 cores, requested new work from GPUGRID, and got 2 GPU WUs (SR) and one CPU WU. The CPU WU says it runs on 4 cores, but in Task Manager it actually uses 92% of the CPU. I don't mind, since I allowed it to use all cores, but I would have expected it to use 6 cores, as 6 cores are free: two are taken by the GPU WUs, so 8-2=6.

Am I thinking wrong here?
____________
Greetings from TJ

captainjack
Send message
Joined: 9 May 13
Posts: 171
Credit: 3,594,819,747
RAC: 18,439,373
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38894 - Posted: 13 Nov 2014 | 0:54:04 UTC

Matt,

Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-threaded tasks. Each task is scheduled to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task.

My suggestion is that the number of tasks downloaded at one time be limited to 2 or 3; it should not be equal to the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine, and that may need to be changed for multi-threaded tasks.

Thanks for all the effort you put in.
captainjack

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38895 - Posted: 13 Nov 2014 | 3:26:21 UTC - in response to Message 38894.
Last modified: 13 Nov 2014 | 3:28:27 UTC

Matt,

Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-threaded tasks. Each task is scheduled to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task.

My suggestion is that the number of tasks downloaded at one time be limited to 2 or 3; it should not be equal to the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine, and that may need to be changed for multi-threaded tasks.

Thanks for all the effort you put in.
captainjack


I am actually very familiar with BOINC Work Fetch.

Essentially, what it does is: You have 2 preferences, the "Min Buffer" and the "Additional Buffer".
- When BOINC doesn't have enough work to keep all devices busy for "Min Buffer", or has an idle device presently, it will ask projects for work.
- When it asks, it asks for enough work to fill the idle devices, plus enough work to saturate the devices for [Min Buffer + Additional Buffer] time, properly taking into account that some tasks are MT and some aren't. Asking for that full amount is correct because it minimizes the RPC calls to the projects.

When BOINC contacted GPUGrid, it most likely worked correctly to satisfy your cache settings. If you think otherwise, turn on <work_fetch_debug>, abort all of the unstarted tasks, let work fetch run, and then copy the Event Log data here to show us what happened.

Feel free to turn on the <work_fetch_debug> flag to see what BOINC is doing during work fetch. http://boinc.berkeley.edu/wiki/Client_configuration
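
For reference, <work_fetch_debug> is one of the log flags in cc_config.xml, which lives in the BOINC data directory. A minimal cc_config.xml enabling just this flag looks like the sketch below.

<cc_config>
   <log_flags>
      <!-- log each project's work-fetch priority and request size in the Event Log -->
      <work_fetch_debug>1</work_fetch_debug>
   </log_flags>
</cc_config>

After saving it, tell the client to re-read config files (or restart BOINC) and the work-fetch details will show up in the Event Log.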

Regards,
Jacob

captainjack
Send message
Joined: 9 May 13
Posts: 171
Credit: 3,594,819,747
RAC: 18,439,373
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38903 - Posted: 13 Nov 2014 | 22:06:28 UTC

Jacob,

Per your suggestion, I aborted all tasks, disabled the app_config file, turned on the work_fetch_debug option, started BOINC, and allowed new GPUGRID tasks. It downloaded one task.

Then I aborted that task, enabled the app_config file, restarted BOINC and allowed new tasks. It downloaded one task.

Then I turned off the work_fetch_debug option, aborted the task, restarted BOINC, and allowed new tasks. It downloaded one task.

No idea why it downloaded 16 tasks at one time yesterday. Must have been sunspots or something like that. Anyway, it seems to be working today.

Thanks for the suggestion.
captainjack

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38904 - Posted: 13 Nov 2014 | 22:12:28 UTC - in response to Message 38903.
Last modified: 13 Nov 2014 | 22:13:45 UTC

Strange.
The only things I can think of, offhand, would be:
- maybe your local cache of work-on-hand had been much lower during the "16-task-work-fetch", as compared to the "1-task-work-fetch"
- maybe your cache settings ("Min buffer" and "Max additional buffer") were different between the fetches.

Anyway, I'm glad to hear it's working for you!

If you have any questions/problems related to work fetch, grab some <work_fetch_debug> Event Log data, and feel free to PM me. I am a work fetch guru -- I helped David A (the main BOINC designer) make sure work fetch works well across projects, resources (cpus, gpus, asics), task types (st single threaded, mt multi threaded), etc. The current BOINC 7.4.27 release does include a handful of work fetch fixes compared to the prior release. You should make sure all your devices are using 7.4.27.

Regards,
Jacob

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 186,180
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39012 - Posted: 23 Nov 2014 | 19:49:32 UTC

This application appears to have problems restarting from a checkpoint -
I suspended it for a few days, then when I told it to resume, it gave a
computation error less than a second later.

Test application for CPU MD v9.01 (mtsse2)
http://www.gpugrid.net/result.php?resultid=13426589
http://www.gpugrid.net/workunit.php?wuid=10302711

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39018 - Posted: 24 Nov 2014 | 21:59:37 UTC - in response to Message 39012.

Robert, was LAIM on?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 13,920,977,393
RAC: 7,489,952
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39081 - Posted: 5 Dec 2014 | 20:43:58 UTC
Last modified: 5 Dec 2014 | 20:44:23 UTC

So, is it still a Test application? Not ready for science production yet?

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 186,180
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39084 - Posted: 5 Dec 2014 | 23:51:06 UTC - in response to Message 39018.

Robert, was LAIM on?


What's LAIM? How do I tell if it's on?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,214,765,968
RAC: 1,002,217
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39085 - Posted: 6 Dec 2014 | 0:03:53 UTC - in response to Message 39084.

Robert, was LAIM on?

What's LAIM? How do I tell if it's on?

LAIM

ExpeditionHope
Send message
Joined: 1 Jun 14
Posts: 1
Credit: 12,837,497
RAC: 0
Level
Pro
Scientific publications
watwatwat
Message 39086 - Posted: 6 Dec 2014 | 3:15:25 UTC - in response to Message 39085.

LAIM stands for the "Leave application in memory" (while suspended) setting in the BOINC client, under the disk and memory usage tab.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 186,180
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39093 - Posted: 8 Dec 2014 | 0:51:59 UTC - in response to Message 39086.

LAIM stands for "Leave application in memory"(while suspended) setting in boinc client under disk and memory usage tab.


It's on. However, I may have installed some updates and rebooted while the workunit was suspended.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39309 - Posted: 26 Dec 2014 | 2:08:34 UTC

The CPU work units apparently use GROMACS 4.6, which has provisions for GPU acceleration also. Is that being planned?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39325 - Posted: 28 Dec 2014 | 17:03:00 UTC

It looks like the work units have now gone from 4 cores to 6 cores. It is possible that the difference is due to an increase in the number of cores I allowed in BOINC, but I think it is more likely to be a change in the work units themselves.

GPUGRID 9.03 Test application for CPU MD (mtavx) 73801-MJHARVEY_CPUDHFR2-0-1-RND3693_0 - (-) 6C

That is perfectly OK with me, and I am glad to find a project that uses AVX.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39328 - Posted: 28 Dec 2014 | 19:01:04 UTC - in response to Message 39325.

To answer my own question, it looks like it is due to the changes that I made in BOINC. It is now up to 8 cores with the latest WU download, though it seems to me that it was limited by something else when I first started, but that was a while ago.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39329 - Posted: 28 Dec 2014 | 21:01:36 UTC

I'm pretty sure that the thread count of the task is set either at time-of-download, or time-of-task-start... And it's based on the "Use at most X% of CPUs" setting.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 39555 - Posted: 21 Jan 2015 | 15:56:06 UTC - in response to Message 39081.

So, is it still a Test application? Not ready for science production yet?

MJH:
Is the CPUMD MJHARVEY_CPUDHFR2 batch finished? Will a batch of new (test) CPUMD tasks be available, or is CPUMD transitioning to production?

Jonathan Figdor
Send message
Joined: 8 Sep 08
Posts: 14
Credit: 425,295,955
RAC: 0
Level
Gln
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39890 - Posted: 30 Jan 2015 | 4:59:47 UTC - in response to Message 39555.

Bump. Are there new CPU workunits coming? Hope all is well with the project.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 39897 - Posted: 30 Jan 2015 | 9:55:39 UTC - in response to Message 39890.

The CPU work is temporarily in abeyance while we prepare a new application.
Check back later or, if you have an AMD GPU, please participate in testing the new app.

Matt

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39905 - Posted: 30 Jan 2015 | 13:15:17 UTC - in response to Message 39897.
Last modified: 30 Jan 2015 | 13:15:46 UTC

Fancy word!

a·bey·ance
/əˈbāəns/
noun: abeyance
a state of temporary disuse or suspension.

https://www.google.com/search?q=define%3Aabeyance

PS: Please use simpler words :-p

Message boards : News : New CPU work units