BOINC 6.4.5 released for Windows, Windows x64, Linux and Linux x64

Blackbird74
Joined: 20 Nov 08
Posts: 3
Credit: 362,118
RAC: 0
Message 4235 - Posted: 10 Dec 2008 | 13:04:04 UTC

Didn't see a post about this so thought I should put one up.

Change Log:

- client: tweak CPU scheduling policy. When there's a coproc job:
Windows: don't saturate CPUs
Unix: saturate CPUs

- client: in round-robin simulation, remove code that sets CPU shortfall for projects with no active results.

This is now wrong because coproc apps might have pending results. Also remove nidle_cpus > 0 conditional that increments CPU shortfall; I think this is vestigial code.

- client: include deviceOverlap and multiProcessorCount in XML for CUDA devices. They were mistakenly omitted.

- client: in round-robin simulation, don't count a project in total resource share if it has coproc jobs and no CPU jobs.

- MGR: fix the terms of use wizard page.

Original Post:
http://boinc.berkeley.edu/dev/forum_thread.php?id=2518&nowrap=true#21694

Download area:
http://boinc.berkeley.edu/download_all.php

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4236 - Posted: 10 Dec 2008 | 13:29:05 UTC - in response to Message 4235.

I didn't make one because this one still has problems.

It appears that a change in the client is causing the DCF (duration correction factor) to max out at 100; this started with 6.4.3. As a result, the client thinks that every GPUGRID task is going to take far longer than it actually does. To the client, the 4-day deadline then looks too short, so it runs the task in high priority and does not fetch more work either.

I can track back through my backups on this host to 6.4.2: going back through the upgraded versions, the DCF was 100, 100, 100 and then 1.317483.
My other two hosts, still running 6.4.2, have DCFs of 1.107852 and 1.23629.
This pretty much eliminates the application and points to the client version.
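
The effect of a maxed-out DCF can be sketched with a little arithmetic. This is a minimal sketch, assuming the client's estimate is simply the uncorrected estimate multiplied by the DCF, and that a task looks late when that estimate exceeds the time left before the deadline. The function names and the 11-hour base estimate are illustrative assumptions, not BOINC's actual code.

```python
def corrected_estimate_hours(raw_estimate_hours: float, dcf: float) -> float:
    """Scale an uncorrected runtime estimate by the duration correction factor."""
    return raw_estimate_hours * dcf


def looks_late(raw_estimate_hours: float, dcf: float, deadline_hours: float) -> bool:
    """True if the corrected estimate no longer fits before the deadline."""
    return corrected_estimate_hours(raw_estimate_hours, dcf) > deadline_hours


DEADLINE_HOURS = 4 * 24      # the 4-day deadline mentioned above
BASE_ESTIMATE_HOURS = 11.0   # assumed uncorrected estimate, for illustration only

for dcf in (1.317483, 100.0):
    est = corrected_estimate_hours(BASE_ESTIMATE_HOURS, dcf)
    late = looks_late(BASE_ESTIMATE_HOURS, dcf, DEADLINE_HOURS)
    print(f"DCF {dcf}: estimate {est:.1f} h, looks late: {late}")
```

Under these assumptions, a DCF near 1 leaves the task well inside the deadline, while a DCF of 100 pushes the same task far past it.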

It did seem to be running the maximum number of tasks again, and (on Windows) I had to set my processor percentage back to 50% so as to have one dedicated CPU for GPUGRID; otherwise the CPU usage drops and the GPU elapsed time goes up.

@GDF
You need to set the CPU usage in the different applications for this. Set Windows to CPU=1.0 and set Linux to some low number such as CPU=0.02, since that is what users say Linux uses. This way Linux users can run max CPUs + 1 GPUGRID task without penalty and without having to use ncpus+1, and Windows users can have a dedicated CPU for GPUGRID without having to set processors to 1 less; if GPUGRID runs out of work, they can use the processor for a CPU task instead of leaving it idle. I would think you can use separate templates for each OS version to account for this; I'm guessing that is where you adjust that factor. If not, contact David and ask how to do it. He is aware that this is how it should be: adjusted on the project and not in the client, so one client can run both ways.
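
One way to picture the suggestion is a toy packing rule. This is only a sketch under an assumed rule (start the GPU task first, then keep adding single-CPU tasks while the committed CPU usage is below the number of CPUs); it is not the real BOINC scheduler, and `tasks_started` is a hypothetical name.

```python
def tasks_started(ncpus: int, gpu_task_cpu_usage: float) -> tuple:
    """Return (cpu_tasks, gpu_tasks) under a simple greedy packing rule.

    Assumed rule: the GPU task is started first and commits its declared
    CPU usage; single-CPU tasks are then added while usage is below ncpus.
    """
    used = gpu_task_cpu_usage
    cpu_tasks = 0
    while used < ncpus:
        used += 1.0
        cpu_tasks += 1
    return cpu_tasks, 1


# Quad-core host, using the two CPU usage values suggested above.
print("Windows (CPU=1.0): ", tasks_started(4, 1.0))   # 3 CPU tasks + 1 GPU task
print("Linux   (CPU=0.02):", tasks_started(4, 0.02))  # 4 CPU tasks + 1 GPU task
```

With the project declaring the CPU usage per platform, the same client would end up at 3+1 on Windows and 4+1 on Linux in this model.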

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4237 - Posted: 10 Dec 2008 | 15:06:34 UTC - in response to Message 4236.

We are working on improving the Windows speed with Nvidia.
They have just sent some code to test.
g

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4238 - Posted: 10 Dec 2008 | 15:39:04 UTC - in response to Message 4237.

That just happens to be my Christmas wish this year.

[AF>HFR>RR] Jim PROFIT
Joined: 3 Jun 07
Posts: 107
Credit: 31,331,137
RAC: 0
Message 4239 - Posted: 10 Dec 2008 | 15:46:37 UTC - in response to Message 4237.

Maybe the DCF problem will be solved.

Jim PROFIT

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4241 - Posted: 10 Dec 2008 | 17:57:04 UTC - in response to Message 4239.

The DCF is part of BOINC, not NVIDIA.

Vid Vidmar*
Joined: 27 Aug 08
Posts: 18
Credit: 1,146,374
RAC: 0
Message 4246 - Posted: 11 Dec 2008 | 10:17:27 UTC - in response to Message 4237.

What about those pesky x86_64 (64-bit) app memory leaks? I have tried just about everything. The only solution so far is to monitor memory usage and reboot before memory fills up.
BR,

____________

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4249 - Posted: 11 Dec 2008 | 13:10:44 UTC

I found out about the DCF. This is because the client was changed to use FLOPS counting for GPU tasks. The reason it is off is that the FLOPS estimate in the work unit is too low. The old client version works because it does not use that value. GDF will correct this in new work units.

I do believe I have had a steadier flow of work in 6.4.5, but only 1 task at a time so far, as the DCF is too high. Once the correct FLOPS value is used, it should correct back to near 1 (over several tasks) and resume normal operation.
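
Why a too-low FLOPS estimate drives the DCF to its ceiling can be sketched as a ratio. This is a simplified sketch, assuming the DCF tracks actual runtime divided by the uncorrected estimate and is capped at the value of 100 reported in this thread; `implied_dcf` and the sample numbers are hypothetical, not taken from the BOINC source.

```python
DCF_MAX = 100.0  # ceiling reported in this thread


def implied_dcf(actual_seconds: float, raw_estimate_seconds: float) -> float:
    """Ratio of actual to estimated runtime, capped at DCF_MAX."""
    return min(actual_seconds / raw_estimate_seconds, DCF_MAX)


ACTUAL = 11 * 3600.0  # an 11-hour task, in seconds (illustrative)

# A correct estimate gives a ratio of 1; an estimate 1000x too low hits the cap.
print(implied_dcf(ACTUAL, ACTUAL))            # 1.0
print(implied_dcf(ACTUAL, ACTUAL / 1000.0))   # 100.0
```

In this model, once the work units carry a realistic FLOPS estimate, each finished task reports a ratio near 1 again.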

I did get more work automatically after one finished (well, 49 minutes after), but at least that is better than having no work and two tasks just waiting to report, needing manual intervention, as in 6.4.4.

12/10/2008 10:00:30 PM|GPUGRID|Finished upload of JZa1465-GPUTEST5-15-20-acemd_0_2
...
12/10/2008 10:49:08 PM|GPUGRID|Sending scheduler request: To fetch work. Requesting 16354 seconds of work, reporting 1 completed tasks
12/10/2008 10:49:13 PM|GPUGRID|Scheduler request completed: got 1 new tasks
...
12/10/2008 10:50:11 PM|GPUGRID|Finished download of no10932-GPUTEST5-14-grama.ionized.psf
12/10/2008 10:50:12 PM|GPUGRID|Starting no10932-GPUTEST5-14-20-acemd_0
12/10/2008 10:50:12 PM|GPUGRID|Starting task no10932-GPUTEST5-14-20-acemd_0 using acemd version 653
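
The gap between the upload and the next work request can be read straight from the log lines above. A minimal sketch, assuming the `timestamp|project|message` layout and the US-style date format shown in this particular log (other hosts in this thread log dates differently); `parse_line` is a hypothetical helper.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed from the lines quoted above


def parse_line(line: str):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, LOG_TIME_FORMAT), project, message


upload = parse_line(
    "12/10/2008 10:00:30 PM|GPUGRID|Finished upload of JZa1465-GPUTEST5-15-20-acemd_0_2"
)
request = parse_line(
    "12/10/2008 10:49:08 PM|GPUGRID|Sending scheduler request: To fetch work. "
    "Requesting 16354 seconds of work, reporting 1 completed tasks"
)

gap = request[0] - upload[0]
print(gap)  # 0:48:38, i.e. the roughly 49 minutes mentioned above
```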

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4253 - Posted: 11 Dec 2008 | 16:44:31 UTC - in response to Message 4249.

I should have fixed the estimated FLOPS for new workunits.
This returns a correct timing only on 6.4.5 clients.


gdf

The Gas Giant
Joined: 20 Sep 08
Posts: 54
Credit: 607,157
RAC: 0
Message 4256 - Posted: 11 Dec 2008 | 19:34:22 UTC

Just upgraded to 6.4.5. There is 1 GPU task running and none in the cache. I have the work buffer set to 1.0 days and connect every 0.1 days. BOINC used to cache up to 4 WUs, but now I get the following messages on the work request.

12/12/2008 6:26:57 AM|GPUGRID|Sending scheduler request: To fetch work. Requesting 99540 seconds of work, reporting 0 completed tasks
12/12/2008 6:27:12 AM|GPUGRID|Scheduler request completed: got 0 new tasks
12/12/2008 6:27:12 AM|GPUGRID|Message from server: No work sent
12/12/2008 6:27:12 AM|GPUGRID|Message from server: (won't finish in time) BOINC runs 99.9% of time, computation enabled 100.0% of that

Nightlord
Joined: 22 Jul 08
Posts: 61
Credit: 5,461,041
RAC: 0
Message 4259 - Posted: 11 Dec 2008 | 21:55:42 UTC

I haven't touched my installations since 6.3.21, but I get the same message too now.
____________

Sherman H.
Joined: 28 Sep 08
Posts: 27
Credit: 6,201,632,872
RAC: 2
Message 4260 - Posted: 12 Dec 2008 | 2:56:44 UTC

I got the same message on 2 machines running 6.3.19.

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 395,402,681
RAC: 1,594,520
Message 4261 - Posted: 12 Dec 2008 | 3:47:54 UTC - in response to Message 4236.

It appears that a change in the client is causing the DCF (duration correction factor) to max out at 100; this started with 6.4.3. As a result, the client thinks that every GPUGRID task is going to take far longer than it actually does. To the client, the 4-day deadline then looks too short, so it runs the task in high priority and does not fetch more work either.


I upgraded to 6.4.5 given the note that it was the preferred client. After noticing this issue, I downgraded back to my previous 6.3.19 (which has worked best for me in the past). However, it still has the GPU tasks running High Priority with estimated completion times ~ 10474:34:16. Previously, these were about 11 hours (though not accurate, low by about 4 hours), and thus did not run High Priority.

It sounds as though this will self-correct eventually, though I was hoping it would revert back to previous functionality after doing a fresh uninstall/install of the BOINC 6.3.19 client. :-(

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 3,441,551,487
RAC: 53,438,251
Message 4282 - Posted: 13 Dec 2008 | 10:06:46 UTC - in response to Message 4261.

It sounds as though this will self-correct eventually, though I was hoping it would revert back to previous functionality after doing a fresh uninstall/install of the BOINC 6.3.19 client. :-(


I wouldn't count on that at the moment; it seems the more things change, the worse things get. I've had 2 boxes with just 1 WU on them for 2-3 days now & am still getting crazy To Completion times as high as 27,000+ hours on those boxes.

I've also noticed a few other boxes have dropped to only 2 or 3 WUs, so I suppose they will be down to just 1 WU too eventually ... 0_o

JAMC
Joined: 16 Nov 08
Posts: 28
Credit: 12,688,454
RAC: 0
Message 4283 - Posted: 13 Dec 2008 | 10:20:01 UTC
Last modified: 13 Dec 2008 | 10:21:47 UTC

I am getting WUs with 282- and 538-hour to-completion times; everything is running high priority :(
6.4.5, XP Home

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4284 - Posted: 13 Dec 2008 | 10:32:46 UTC

I've changed the DCF on my machines to a more realistic value, and now my boxes (all quads) are running with a nearly correct estimated time: 8 hours instead of 6:30 h. Now I have the next problem. I'm also running the projects CPDN, PrimeGrid, WCG and MilkyWay. To get a new WU I have to stop 3 projects, especially CPDN (work for over 800 hours) and WCG (work for 48 hours). My work cache is set to 2 days. When I have downloaded a second WU this way, the timer shows 24 hours until the next call. But my GTX280 needs only 13 hours for these 2 WUs. So if I'm absent and can't make a call manually, my PC will be without new work for 11 hours. This should be corrected.
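
The 11 idle hours follow directly from the two figures in the post. A tiny sketch of that arithmetic; `idle_hours` is a hypothetical helper, and it assumes the GPU works through its queue without interruption.

```python
def idle_hours(next_contact_hours: float, queued_work_hours: float) -> float:
    """Hours the GPU sits idle before the client next asks for work."""
    return max(0.0, next_contact_hours - queued_work_hours)


# 24 hours until the next scheduler call, 13 hours of queued GPU work.
print(idle_hours(24, 13))  # 11
```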
____________

JAMC
Joined: 16 Nov 08
Posts: 28
Credit: 12,688,454
RAC: 0
Message 4285 - Posted: 13 Dec 2008 | 10:40:13 UTC - in response to Message 4284.

I saw this too when the cache was set to 1 day or more, so I have just reduced it to 0.5 days and have not seen the 23-hour-plus time for the next connect... I am often left with just the CUDA WU being crunched and no others in line, and have to manually suspend the other projects to prime the pump for more... 1 of 5 boxes is not running the GPU WU in high priority at the moment...

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,387,167,350
RAC: 1,237,696
Message 4314 - Posted: 14 Dec 2008 | 1:31:59 UTC
Last modified: 14 Dec 2008 | 1:37:05 UTC

Every GPU (9800 GTX+) task is running at high priority and there is no need. Each task finishes in under 11 hours regardless of the priority, and the deadlines are at least 4 days away.

On a quad system, this causes one CPU to be dedicated to the GPUGRID task. There is no need for this, as I was getting 11-hour GPU completion with 5 tasks running and it is no different with 4 tasks running. This has reduced my overall credit production.

I assume going back to 6.4.1 might fix this???


With 6.4.1, I was using about 800 seconds of CPU time to process an 11-hour ET job. Now it is taking 22,000 seconds to do the same job. There seems to be no way to disable the high priority for the GPU task. They (BOINC) are not calculating the coprocessor efficiency and utilization correctly.
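
To put those two CPU-time figures in perspective, they can be turned into a fraction of one core. A small sketch, assuming "ET" means elapsed (wall-clock) time and that the job really takes the full 11 hours; `cpu_fraction` is a hypothetical helper.

```python
def cpu_fraction(cpu_seconds: float, elapsed_hours: float) -> float:
    """Share of one CPU core used over the elapsed time of the job."""
    return cpu_seconds / (elapsed_hours * 3600.0)


for label, cpu_seconds in (("6.4.1", 800), ("6.4.5", 22000)):
    print(f"{label}: {cpu_fraction(cpu_seconds, 11):.1%} of one core")
```

Under these assumptions the job went from roughly 2% of a core to more than half a core.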

Jack Shaftoe
Joined: 26 Nov 08
Posts: 27
Credit: 1,813,606
RAC: 0
Message 4320 - Posted: 14 Dec 2008 | 15:13:09 UTC
Last modified: 14 Dec 2008 | 15:24:38 UTC

Using 6.4.5 yesterday, I got 2 blue screens:

Error code 100000ea, parameter1 8855c2c8, parameter2 89966940, parameter3 bacfbcbc, parameter4 00000001.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


One of my teammates had the same problem on his box. If I roll back to 6.3.x - what was the last recommended version? 6.3.19?

JAMC
Joined: 16 Nov 08
Posts: 28
Credit: 12,688,454
RAC: 0
Message 4321 - Posted: 14 Dec 2008 | 15:19:30 UTC

So is 6.4.5 still the suggested version? Should we hope for some quick fixes, or roll back to version 'x'?

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4322 - Posted: 14 Dec 2008 | 15:43:09 UTC - in response to Message 4320.

Which driver are you using? Try the newest driver from here:

CUDA-Driver


____________

Jack Shaftoe
Joined: 26 Nov 08
Posts: 27
Credit: 1,813,606
RAC: 0
Message 4323 - Posted: 14 Dec 2008 | 15:47:12 UTC - in response to Message 4322.
Last modified: 14 Dec 2008 | 16:09:39 UTC

178.28

Should I be using 180.60? I didn't think we were on CUDA 2.1 yet.

EDIT: Just put 6.3.19 on. Hope it goes back to being stable.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 4328 - Posted: 14 Dec 2008 | 16:52:36 UTC - in response to Message 4314.
Last modified: 14 Dec 2008 | 17:04:34 UTC

This is what I'm seeing after "upgrading" from 6.4.1 to 6.4.5. Now my quad will run only 4 tasks instead of 5. I'm going to try going back to 6.4.1.

Edit: Just moved back to 6.4.1 and the quad has returned to running 5 tasks. I'd avoid the new 6.4.5.

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4330 - Posted: 14 Dec 2008 | 17:17:50 UTC - in response to Message 4328.

My 8800GT and my GTX260² are as fast with 4+1 as with 3+1; only my GTX280 needs the 3+1 mode, otherwise I lose a lot of crunching time. The cards differ a lot in how much CPU they use.
____________

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4331 - Posted: 14 Dec 2008 | 17:20:05 UTC - in response to Message 4323.

Should be OK, I still use the 178.24 WHQL.

____________

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,387,167,350
RAC: 1,237,696
Message 4349 - Posted: 14 Dec 2008 | 22:23:41 UTC - in response to Message 4328.

Didn't work for me: 6.4.1 picked up with the same 4 tasks and high priority for GPUGRID. I suspect there is some adaptive algorithm, learning and/or a resource file that needs to be undone. If you are back at 5 tasks, I suspect that very shortly they may go to high priority on you.

Did the 5 tasks come up immediately, or did you have to wait for a current GPUGRID job to finish? When I saw there was no change I re-installed 6.4.5. At least 6.4.5 got the 11 hours correct; 6.4.1 was showing 90 days to complete.

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 3,441,551,487
RAC: 53,438,251
Message 4359 - Posted: 15 Dec 2008 | 16:22:00 UTC

I switched all my boxes back to 6.3.21; the 6.4.5 client was just too messed up for me. I couldn't keep a consistent number of WUs running. I prefer just 3 & 1, but depending on which projects were running the number would go from 3 & 1 to 4 & 1 to 2 & 1 & back to 3 & 1 again.

With v6.3.21 I don't have any problems holding 3 & 1 like I prefer to run, no matter which projects are running.

Jayargh
Joined: 21 Dec 07
Posts: 47
Credit: 5,252,135
RAC: 0
Message 4361 - Posted: 15 Dec 2008 | 17:03:48 UTC - in response to Message 4359.

Funny thing is, the Linux flavor of 6.4.5 runs 4+1 consistently with no problems, as I wish, since Linux uses only about 2% of the CPU.

The DCF problems don't seem to be caused by the BOINC client, as reverting to 6.3.21 still left the DCFs way out of whack, showing 100s of hours to completion. All of my BOINC clients that are capable of running GPUs are doing this now, so I point my finger at the project and not the client.

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 4362 - Posted: 15 Dec 2008 | 18:10:07 UTC - in response to Message 4361.

The DCF problems don't seem to be caused by the BOINC client, as reverting to 6.3.21 still left the DCFs way out of whack, showing 100s of hours to completion


I never switched from 6.3.21, and my DCF has remained nicely just above 1. Thus, the newer clients have to be part of the equation. Did you reset the project after reverting to the older client?

Jayargh
Joined: 21 Dec 07
Posts: 47
Credit: 5,252,135
RAC: 0
Message 4363 - Posted: 15 Dec 2008 | 18:34:02 UTC - in response to Message 4362.
Last modified: 15 Dec 2008 | 18:43:13 UTC

No, Scott, I did not, as I had work in the queue and in progress. However, I did manually change the DCF values before switching and still got new work with way-off estimated times.... I switched from 6.3.21 to 6.4.5 because all of a sudden the client started running in high priority when GDF made the server-side changes and I couldn't get new work.... but you may be right ;)

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 4365 - Posted: 15 Dec 2008 | 18:56:35 UTC - in response to Message 4363.

I think the new clients were the culprit in jumping the DCF, but I do have the high priority issue for the GPU app even with 6.3.21 never being upgraded. I am guessing that I never had the issue of running low on work as many reported since my calc times (around 20 hours on a 9600 GSO) were fairly similar to the 24-hour thing that GDF changed on the project server (noted in another thread...can't find the post just now).

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4366 - Posted: 15 Dec 2008 | 20:06:12 UTC - in response to Message 4365.

What we do not understand is this: a project reset should solve the problem.

gdf

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 3,441,551,487
RAC: 53,438,251
Message 4367 - Posted: 15 Dec 2008 | 20:44:18 UTC - in response to Message 4366.
Last modified: 15 Dec 2008 | 20:44:43 UTC

I did that & it didn't solve the problem; the To Completion times were still messed up ...

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4368 - Posted: 15 Dec 2008 | 20:53:00 UTC - in response to Message 4366.

Reset didn't help me; I checked this on every PC that had no new WU. I edit the DCF to 1 and then I get new work. If the DCF gets messed up again, I edit it again and get new work. When I get a WU other than one of the GPUTEST5 or GPUTEST6 ones, I know my DCF will be messed up again.

I think the problem is not the application; the problem is in some WUs. I have tasks 165605 and 165269 in the queue. Task 165605 shows a time of 1:41:25 and task 165269 a time of 2:26, both currently with the same DCF of 1.217038. The problem must be in the estimated time of some WUs. The real running times do not differ as much as the estimated times: some WUs run 7 1/2 hours, others only 5 3/4 hours. That doesn't account for the big difference in estimated time between the two WUs I have in the queue at the moment.

____________

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4370 - Posted: 15 Dec 2008 | 21:24:10 UTC - in response to Message 4368.

These workunits request different numbers of FLOPS in an attempt to reduce the estimated time.
The real problem is not short estimated times like yours but long ones, because then the client refuses the WU.


gdf

UBT - Ben
Joined: 12 Aug 08
Posts: 8
Credit: 137,219
RAC: 0
Message 4371 - Posted: 15 Dec 2008 | 21:42:37 UTC - in response to Message 4246.

You could try using a scheduler to automatically restart Windows every so many hours or days. That may help, but you would need to install BOINC as a service or set your PC to log in automatically, which is the slight downside to the plan.
____________

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4372 - Posted: 15 Dec 2008 | 22:25:30 UTC - in response to Message 4371.

We will distribute a WIN64 application. Working on it.
gdf

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4374 - Posted: 15 Dec 2008 | 23:23:14 UTC - in response to Message 4370.

ACK, but after running such a WU with a short estimated time, the DCF jumps to 100 and I can't get a new WU without manipulating the DCF manually.

____________

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4375 - Posted: 15 Dec 2008 | 23:38:04 UTC - in response to Message 4374.

They just said that there is a bug in the server code.

We will try it tomorrow.

gdf

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 4376 - Posted: 16 Dec 2008 | 0:42:47 UTC - in response to Message 4375.

This bug:
- scheduler: estimate job durations based on the FLOPS estimate for the selected APP_VERSION, rather than on the CPU benchmarks. Otherwise estimates are wrong for GPU or multi-thread apps.
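
The difference that changelog entry describes can be sketched numerically. This is a rough sketch, assuming a duration estimate is simply the job's floating-point operation count divided by a speed in FLOPS; the operation count and both speeds below are made-up illustrative values, not GPUGRID's or BOINC's real figures.

```python
def estimated_hours(fpops_est: float, flops: float) -> float:
    """Estimated runtime in hours for a job of fpops_est operations."""
    return fpops_est / flops / 3600.0


FPOPS_EST = 4.0e15            # hypothetical job size
CPU_BENCHMARK_FLOPS = 2.0e9   # hypothetical CPU benchmark result
GPU_APP_FLOPS = 1.0e11        # hypothetical speed of the GPU app version

by_cpu = estimated_hours(FPOPS_EST, CPU_BENCHMARK_FLOPS)
by_gpu = estimated_hours(FPOPS_EST, GPU_APP_FLOPS)
print(f"based on CPU benchmark:  {by_cpu:.1f} h")
print(f"based on GPU app version: {by_gpu:.1f} h")
```

With these assumed numbers, the CPU-benchmark estimate is 50 times longer than the app-version estimate, long enough to blow past a 4-day deadline for a job the GPU would finish in about half a day.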

Jack Shaftoe
Joined: 26 Nov 08
Posts: 27
Credit: 1,813,606
RAC: 0
Message 4387 - Posted: 16 Dec 2008 | 13:20:10 UTC - in response to Message 4320.
Last modified: 16 Dec 2008 | 13:41:51 UTC

After 6.3.19 rollback, everything is great and stable (except GPUGrid won't pick up new work when high-priority tasks are running on my CPU, but that's a different issue.)

Stay away from 6.4.5.

Milford
Joined: 17 Jul 07
Posts: 14
Credit: 9,618,510
RAC: 0
Message 4388 - Posted: 16 Dec 2008 | 13:25:51 UTC

I also rolled back to 6.3.21; almost everything is fine now.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 4407 - Posted: 17 Dec 2008 | 4:22:09 UTC - in response to Message 4349.

On a quad system, this causes one cpu to be dedicated to the gpugrid task. There is no need for this as I was getting 11 hours gpu completion with 5 tasks running and it is no different with 4 tasks running. This has dropped my overall credit production down.

I assume going back to 6.4.1 might fix this???


With 6.4.1, I was using about 800 seconds of CPU time to process an 11 hour ET job. Now it is taking 22,000 seconds to do the same job. There seems no way to disable the high priority for the gputask. They (BOINC) are not calculating the coprocessor efficiency and utilization correctly.

This is what I'm seeing after "upgrading" from 6.4.1 to 6.4.5. Now my quad will run only 4 tasks instead of 5. I'm going to try going back to 6.4.1.

Edit: Just moved back to 6.4.1 and the quad has returned to running 5 tasks. I'd avoid the new 6.4.5.


Didn't work for me - 6.4.1 picked up with the same 4 tasks and high priority for gpugrid. I suspect there is some adaptive algorithm, learning and/or a resource file that needs to be undone. If you are back at 5 tasks I suspect that very shortly they may go to high priority on you.

Did the 5 tasks come up immediately or did you have to wait for a current gpugrid job to get done? When I saw there was no change I re-installed 6.4.5. At least 6.4.5 got the 11 hours correct, as 6.4.1 was showing 90 days to complete.

The 5 tasks started right away. After using 6.4.5 my estimated times are way off though. They were pretty close before. Something changed by installing 6.4.5.

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4408 - Posted: 17 Dec 2008 | 9:01:06 UTC

How can it be that 6.4.5 is the "Recommended version" here and on the Berkeley server? Everything runs fine with 6.3.21; with 6.4.2 and 6.4.3 there is only a little problem with the DCF (HighPrioMode), but with 6.4.5 you can't run GPUGrid without babysitting the boxes all the time.

Is the failure only on the GPUGrid server side, and does it come from the calculated estimated time sent with the WUs to our boxes? If so, please stop these WUs with the wrong calculated time.
____________

The Grinch
Joined: 11 Dec 08
Posts: 1
Credit: 78,451
RAC: 0
Level

Message 4453 - Posted: 18 Dec 2008 | 8:53:57 UTC

On my Windows x64 with BOINC Manager x64 I got this message:
18.12.2008 09:50:19|GPUGRID|Message from server: No work sent
18.12.2008 09:50:19|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
18.12.2008 09:50:19|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
18.12.2008 09:50:19|GPUGRID|Message from server: Full-atom molecular dynamics is not available for your type of computer.

Is there no client for it?

Profile Wassertropfen
Joined: 14 Aug 08
Posts: 15
Credit: 13,774,919
RAC: 0
Level
Pro
Message 4455 - Posted: 18 Dec 2008 | 9:12:23 UTC - in response to Message 4408.

with 6.4.2 and 6.4.3 there is only a little problem with the DCF (HighPrioMode), but with 6.4.5 you can't run GPUGrid without babysitting the boxes all the time.

6.4.3 works fine. No problems, and 2 new WUs last night, even while a normal task was running in high-priority mode. :)
____________
Constant dripping wears away the stone. :)

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4501 - Posted: 18 Dec 2008 | 18:27:30 UTC
Last modified: 18 Dec 2008 | 18:40:10 UTC

Well, I just built a new computer with an Nvidia 9800 GT with 1 GB of VRAM and got three tasks right off the bat. I could not log into the web site so I went with NNW until that was resolved.

I checked this morning and Voila, my account allowed me to log on (Praise something)...

But, when I tried to fetch work... I got the message that there was no work for my Cell processor type (Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.)

I did a project reset and it looks like it is downloading a task ...

And an immediate failure with computation error.

So, two successes and two computation errors. Is this common?

I am running video driver 6.14.11.8048 and BOINC 6.4.5

Hmmm, it is downloading another task ...

Computation error ...

Another computation error ...

{edit}Down-leveled BOINC to 6.2.19 but have the 0 seconds asked for ... so will have to wait to see if I get more work later{/edit}

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4502 - Posted: 18 Dec 2008 | 18:30:36 UTC - in response to Message 4501.

As far as I see you have received it.
uW10579-SH2_US-3-40-SH2_US200000

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Level
Val
Message 4524 - Posted: 19 Dec 2008 | 2:38:58 UTC
Last modified: 19 Dec 2008 | 3:04:26 UTC

Just upgraded to 6.4.5 and did a reset. Got two WUs for my two 280s, and when it tried to download more I got this:

12/18/2008 9:32:47 PM|GPUGRID|Sending scheduler request: Requested by user. Requesting 518336 seconds of work, reporting 0 completed tasks
12/18/2008 9:32:52 PM|GPUGRID|Scheduler request completed: got 0 new tasks
12/18/2008 9:32:52 PM|GPUGRID|Message from server: No work sent
12/18/2008 9:32:52 PM|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/18/2008 9:32:52 PM|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
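
For anyone comparing logs like the one above, this is a rough Python sketch that splits each line into its three `|`-separated fields and pulls out the server messages and the amount of work requested. The class, field names and helper functions are hypothetical; the only thing taken from the log is the `timestamp|project|message` layout seen in these lines.

```python
import re
from typing import List, NamedTuple, Optional

class LogLine(NamedTuple):
    timestamp: str
    project: str
    message: str

SERVER_PREFIX = "Message from server: "

def parse_line(line: str) -> LogLine:
    # Split on the first two pipes only, so a "|" inside the message survives.
    timestamp, project, message = line.split("|", 2)
    return LogLine(timestamp.strip(), project.strip(), message.strip())

def server_messages(lines: List[str]) -> List[str]:
    # Keep only the text that follows "Message from server: ".
    out = []
    for raw in lines:
        entry = parse_line(raw)
        if entry.message.startswith(SERVER_PREFIX):
            out.append(entry.message[len(SERVER_PREFIX):])
    return out

def requested_days(lines: List[str]) -> Optional[float]:
    # Convert the first "Requesting N seconds of work" figure into days.
    for raw in lines:
        match = re.search(r"Requesting (\d+) seconds of work", parse_line(raw).message)
        if match:
            return int(match.group(1)) / 86400
    return None

sample = [
    "12/18/2008 9:32:47 PM|GPUGRID|Sending scheduler request: Requested by user. "
    "Requesting 518336 seconds of work, reporting 0 completed tasks",
    "12/18/2008 9:32:52 PM|GPUGRID|Scheduler request completed: got 0 new tasks",
    "12/18/2008 9:32:52 PM|GPUGRID|Message from server: No work sent",
]

print(server_messages(sample))
print(requested_days(sample))
```

For the request in this log, 518336 seconds works out to just under 6 days of work.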

My estimated completion time started low and is going up rather than starting high and going down. However they are not running in high priority and show only 0.03 CPUs. That would be a huge improvement over 0.9 CPUs in 6.3.21.

Now all I need to do is sleep in front of my rig with the alarm set every 6 hrs to download new WUs.

Glad tomorrow is the last day of work for 2 wks.

Pat

Update: Changed the number of days of work to 4 and was able to download two more WU. We'll see in the morning if I don't have to babysit.

pelpolaris
Joined: 10 Nov 08
Posts: 8
Credit: 876,616,559
RAC: 0
Level
Glu
Message 4694 - Posted: 21 Dec 2008 | 22:16:29 UTC

No more "babysitting" with 6.4.5 ;-)

I run one GPU on Vista-32 and another on Vista-64 with BOINC 6.4.5, both on similar hardware. After some days of trouble and newbie trial and error, I can report that I solved my issues on the 64-bit system by running only 64-bit BOINC applications alongside the GPU apps. Yesterday I tried again to extend the range of apps running on the 64-bit system with some 32-bit apps, and it didn't take many hours before I had a system restart to deal with.
The solution was to detach again from all apps that do not run on the 64-bit architecture and to synchronize right after the restart. I didn't investigate which of those 32-bit apps caused the mess, and I don't intend to either... Merry Christmas!

pelpolaris
Joined: 10 Nov 08
Posts: 8
Credit: 876,616,559
RAC: 0
Level
Glu
Message 4695 - Posted: 21 Dec 2008 | 22:21:06 UTC - in response to Message 4694.

I meant CUDA apps on the GPU... not only GPU!

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Level
Thr
Message 4918 - Posted: 27 Dec 2008 | 2:42:41 UTC

It doesn't even try to get new WUs for my GPU. And even if I force it to get WUs, only 2 tasks run: the gpugrid task and one CPU task. What's wrong?

It seems that all of us are using different 6.4.5 BOINC managers.....

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4922 - Posted: 27 Dec 2008 | 6:48:38 UTC - in response to Message 4918.

It doesn't even try to get new WUs for my GPU. And even if I force it to get WUs, only 2 tasks run: the gpugrid task and one CPU task. What's wrong?

It seems that all of us are using different 6.4.5 BOINC managers.....

Well, I am running 6.5.0 ...

As far as I know, the problem mentioned by Dr. Anderson, which has this issue (won't fetch work) as a symptom, affects both versions 6.4.5 and 6.5.0 ... we can brute-force it to make it go. I am using a higher share and larger queue and that has been working for me to this point. I just got another task automagically when I turned in the last one processed. So, I have one in flight and one pending on the i7 machine.

On the slower machine I have yet to run off the first task and have one pending ... but it looks like that card is so slow that it will take nearly two days to run one task for this project. (8800 class card I think; nope, 8500 GT, and 40 hours to get to 41% done ... I may just run these two tasks and then stop; the card is way too slow to be worth the risk of blowing the task.)
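
As a rough check on the 8500 GT numbers, here is a small Python sketch that extrapolates total run time from elapsed time and percent done. It assumes progress is linear over the whole task, which may not hold for these work units, and `estimate_total_hours` is a hypothetical helper, not anything from BOINC.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Extrapolate total run time, assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done

# Figures from the post: 40 hours elapsed at 41% done.
total = estimate_total_hours(40, 0.41)
remaining = total - 40

print(f"estimated total: {total:.1f} h ({total / 24:.1f} days)")
print(f"estimated remaining: {remaining:.1f} h")
```

On that assumption the task would need about 98 hours in total, closer to four days than two.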

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Level
Thr
Message 4994 - Posted: 28 Dec 2008 | 12:42:05 UTC

Using 6.5.0, running 3 tasks at last. acemd takes 10 to 15% CPU time.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 5014 - Posted: 28 Dec 2008 | 15:47:02 UTC - in response to Message 4994.

Using 6.5.0, running 3 tasks at last. acemd takes 10 to 15% CPU time.


It was much lower with the dot 56 version of the Science Application, which was pulled for some other buglet ... well, after the new year maybe we will see a revised version that will drop the CPU usage.

I am not sure, but it looks to me like I have a 33 some hour task which is twice as long as normal ... my par is about 17 hours (historically) on the 9800 GT card.

I had a task on the 8800 and I am not sure I am going to make the deadline on it on the 29th ... the second task I had on that machine I had to kill as there was no way I was going to get the first one done and then the second by the deadline ...
