Author |
Message |
|
Just as bad as tasks that die are tasks that are not available ...
Very depressing ...
Out of work ... |
|
|
naja002
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0
|
Went through that earlier today on a 2x gpu rig. Struggled with update for a bit and finally got some more work. Right now update isn't being cooperative and no WUs in cache on that rig......
The other 3 seem to be doing ok work-wise...
All are running 6.6.28 |
|
|
naja002
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0
|
Now a different 1x gpu rig is out of work.
No work sent
Blah, blah, blah not available for your type of computer
I checked the long-term debt and it's zero.....
|
|
|
|
Well, at the time I posted the opener there was literally no work in the queue. Now there are 157, get 'em while they are hot ... |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
None of my machines are able to DL work today. The message log reads:
GPUGRID 5/18/2009 1:31:06 PM Sending scheduler request: To fetch work.
GPUGRID 5/18/2009 1:31:06 PM Requesting new tasks
GPUGRID 5/18/2009 1:31:12 PM Scheduler request completed: got 0 new tasks
GPUGRID 5/18/2009 1:31:12 PM Message from server: No work sent
GPUGRID 5/18/2009 1:31:12 PM Message from server: CUDA app exists for Full-atom molecular dynamics but no CUDA work requested
GPUGRID 5/18/2009 1:31:12 PM Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
Using BOINC v6.6.24 & v6.6.28, 2 9600 GSO cards and 1 GTX 260. |
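A minimal sketch, in Python, of how one might spot this symptom in a saved copy of the message log. The function name and the idea of matching on the literal message text are assumptions for illustration; only the message strings themselves come from the log above.

```python
def looks_like_work_fetch_bug(log_lines):
    """Return True if the log shows zero tasks received AND the server
    reporting that no CUDA work was requested by the client."""
    got_zero = any("got 0 new tasks" in line for line in log_lines)
    no_cuda_request = any("no CUDA work requested" in line for line in log_lines)
    return got_zero and no_cuda_request


# Lines copied from the message log quoted in the post above.
log = [
    "GPUGRID 5/18/2009 1:31:12 PM Scheduler request completed: got 0 new tasks",
    "GPUGRID 5/18/2009 1:31:12 PM Message from server: No work sent",
    "GPUGRID 5/18/2009 1:31:12 PM Message from server: CUDA app exists for "
    "Full-atom molecular dynamics but no CUDA work requested",
]
print(looks_like_work_fetch_bug(log))
```

A match only suggests that the client did not ask for GPU work; it does not say why.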
|
|
|
You got hit with the work fetch bug. Two choices: reset the project or reset debts.
Resetting the project may or may not be enough to pull you out. The problem is that the LTD values are out of whack, so you are not asking for CUDA work from GPUGRID. |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
You got hit with the work fetch bug. Two choices, reset the project or reset debts.
Thanks Paul! How do I reset debts in v6.6.28?
Edit: I stopped the client and saw that the CUDA_debt in client_state.xml was at a huge value (long term debt was already at 0). Set CUDA_debt to 0.000000 and the client immediately requested & received WUs upon restart. Fixed for now, but v6.6.28 needs work as you've been saying.
Thanks again! |
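The manual edit described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not an official BOINC tool: the tag is assumed to be spelled `<cuda_debt>` (as it appears in the client_state.xml excerpts later in this thread), the helper name is made up, and the sample value is invented. The sketch zeroes every occurrence in the text, and any such edit should only be made while the client is stopped.

```python
import re


def zero_cuda_debt(state_text):
    """Return state_text with every <cuda_debt> value replaced by 0.000000."""
    return re.sub(
        r"<cuda_debt>[^<]*</cuda_debt>",
        "<cuda_debt>0.000000</cuda_debt>",
        state_text,
    )


# Invented sample fragment; a real client_state.xml holds much more.
sample = (
    "<long_term_debt>0.000000</long_term_debt>\n"
    "<cuda_debt>123456.789000</cuda_debt>\n"
)
print(zero_cuda_debt(sample))
```

Working on a copy of the file first and keeping a backup is the safer route.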
|
|
|
You got hit with the work fetch bug. Two choices, reset the project or reset debts.
3rd choice: use 6.5.0 ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
You got hit with the work fetch bug. Two choices, reset the project or reset debts.
3rd choice: use 6.5.0 ;)
Sheesh, it has its own problems. I much prefer v6.6.23-v6.6.28.
|
|
|
|
You got hit with the work fetch bug. Two choices, reset the project or reset debts.
Thanks Paul! How do I reset debts in v6.6.28?
Edit: I stopped the client and saw that the CUDA_debt in client_state.xml was at a huge value (long term debt was already at 0). Set CUDA_debt to 0.000000 and the client immediately requested & received WUs upon restart. Fixed for now, but v6.6.28 needs work as you've been saying.
Thanks again!
Use the flag in the cc_config file. Stop the attached client, exit BOINC Manager, restart with the flag set to "1" (one), then change the flag back to 0 (zero) ... you cannot just read the config file and have it work. (Sadly)
<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <zero_debts>1</zero_debts>
  </options>
</cc_config>
You may want to, for safety's sake, leave the use-all-GPUs flag set too ... just in case you add GPUs ...
With all KNOWN versions of 6.6.x you are going to get hit with the LTD bug. People tell me they are not affected ... until they are ... The faster and wider the system, the faster the bug hits. I generally see it on my i7 with the 2 GTX295 cards about every 2-3 days ...
On my NEW systems I am not sure what is going on, but on one of them I cannot get it to queue work to save my life ... and it is nearly identical to the other one I just built, which does have work queued ... sigh ...
Not that it has been doing any good, but I have been documenting a number of issues on the dev/alpha boards. Some of them are long-standing bugs that have kinda been hiding, and though I had my suspicions for a long time, it was not until last week that I was able to prove that one project's application ***CAN*** cause another project's tasks to fail ... a problem I predicted back in BOINC Beta, when I was also told that it would never happen ... in a way I hate being right all the time .... :)
Ok, one side effect is that using the debt reset does fardle up all the shares, so your long-running balances will probably be out of kilter ... of course I have demonstrated to my satisfaction that the work fetch is hosed too ... but I cannot prove it with numbers yet. Sorry, guys, I am still looking at some death-type issues and am hoping I may be able to get a lead on the long-running task issue ... which was real bad in 6.6.20, but I am not sure it is fully gone ...
Rats, got wordy again ... |
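For anyone who would rather generate the cc_config.xml from the post above than type it by hand, here is a rough Python sketch. The element names are taken from that post; the function name and the choice to always set use_all_gpus to 1 are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(zero_debts):
    """Build the cc_config XML shown in the post, with zero_debts set to 1 or 0."""
    root = ET.Element("cc_config")
    ET.SubElement(root, "log_flags")  # left empty, as in the post
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1"
    ET.SubElement(options, "zero_debts").text = "1" if zero_debts else "0"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(zero_debts=True)
print(xml_text)
```

Calling it once with zero_debts=True for the restart and once with zero_debts=False afterwards mirrors the "set to 1, then back to 0" procedure described in the post.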
|
|
|
"you cannot jsut read the config file and have it work. (Sadly)"
Yes you can, fortunately, in the 6.4.x version. Because in that version the tag zero_debts does not work. I ran across this issue tonight and manually editing the client_state.xml solved it (for now). Thanks for that suggestion !
____________
Join team Bletchley Park, the innovators. |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
More work uploaded.
gdf |
|
|
Andrew
Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0
|
What do the debts actually mean?
In my client_state config file I have:
<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-177104.643607</long_term_debt>
<cpu_backoff_interval>0.000000</cpu_backoff_interval>
<cpu_backoff_time>0.000000</cpu_backoff_time>
<cuda_debt>-54.294054</cuda_debt>
<cuda_backoff_interval>86400.000000</cuda_backoff_interval>
<cuda_backoff_time>1244256480.527597</cuda_backoff_time> |
|
|
|
What they tell me is that your numbers are messed up ...
The last one is way out of line... do you have any work at all? |
|
|
Andrew
Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0
|
Hi Paul,
Yes, my GPU is always occupied (I have 2 CPU cores, 1 GPU), but the reason I posted was that at the time I read the thread, I only had work from GPUGrid, climateprediction and boincsimap, and hadn't seen any from rosetta, docking, spinhenge, uFluids for a while, despite their web pages reporting work. Now the former had higher resource share, but something wasn't right.
I now have work from all active projects... I notice that my long-term debt has gone down significantly, I now have short-term debt and the cuda debt has gone to 0.
<short_term_debt>-5148.575270</short_term_debt>
<long_term_debt>-27925.193119</long_term_debt>
<cpu_backoff_interval>0.000000</cpu_backoff_interval>
<cpu_backoff_time>0.000000</cpu_backoff_time>
<cuda_debt>0.000000</cuda_debt>
<cuda_backoff_interval>61440.000000</cuda_backoff_interval>
<cuda_backoff_time>1244409908.184854</cuda_backoff_time>
Do you know what these numbers actually mean Paul? And should I manually correct?
Thanks for your help ;)
p.s. you're right about the huge numbers; the CUDA backoff time is about a fortnight! |
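To make the backoff numbers easier to read, here is a small Python sketch. It assumes, and this is an assumption rather than something stated in the thread, that cuda_backoff_interval is a duration in seconds and cuda_backoff_time is a Unix timestamp (seconds since 1970, UTC). The function name is made up; the input values are the ones from the second excerpt above.

```python
from datetime import datetime, timezone


def describe_backoff(interval_seconds, backoff_time):
    """Convert the interval to hours and the backoff time to a UTC datetime.

    Assumes interval_seconds is a duration in seconds and backoff_time
    is a Unix timestamp; fractional seconds are dropped.
    """
    hours = interval_seconds / 3600.0
    until = datetime.fromtimestamp(int(backoff_time), tz=timezone.utc)
    return hours, until


hours, until = describe_backoff(61440.000000, 1244409908.184854)
print(f"interval: {hours:.1f} h, backoff until: {until.isoformat()}")
```

Comparing the resulting date with the current time shows how long the client intends to wait before asking for CUDA work again.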
|
|
|
If you are not fetching work I would stop BOINC and the client (use the advanced menu and make sure the science applications are stopped), then change the back-off time to 0 ... the rest, for the moment, I would not fiddle with if you don't have to ...
If you still have problems, then set all to zero ...
But, again, only do this if you have troubles getting work ...
As for the exact meaning, well, the best I can do is this, in that the powers that be change things without much thought of what it means and how it impacts the universe ... sadly, this page is out of date, but I cannot tell you for sure what exactly is wrong or missing ...
Short term debt is meant to decide what to run next ... LTD, what to fetch next.
Sadly, the approach being tested is to let the system freewheel, and to this point the developers are resisting all notions that this might be a "bad thing" ... specifically, allowing GPUGRID to accumulate CPU values when there is no CPU work and not likely to be any for some time in the future ... just as, if you look at Rosetta (if attached), you are likely to see it running up numbers for the GPU side ...
Personally, I think that is a bad design decision that is only going to get worse as we add computing types to the current three: NCI, CPU, GPU (Nvidia) ... where we already have definition problems and work fetch problems caused by the current non-design...
To see this, attach to FreeHAL as an NCI and it is likely that you will see times where you will not fetch work because the class is not separated from CPU resource shares ... a bad design error ... |
|
|
b1
Joined: 17 Jan 09 Posts: 1 Credit: 0 RAC: 0
|
Hello all,
Recently I upgraded my computer with a 9600GT and wanted to join GPUGRID. Unfortunately I can't fetch any work units. I'm not getting exactly the same error as described above; I gave your solutions a try anyway, but without success.
So far I have tested it with versions 6.6.31 and 6.4.5 of BOINC.
I am running BOINC on Ubuntu Linux 9.04 with the Nvidia driver 185.18.08.
Starting BOINC (version 6.4.5) I get this output:
|Starting BOINC client version 6.4.5 for i686-pc-linux-gnu
|CUDA devices found
|Coprocessor: GeForce 9600 GT (1)
|Sending scheduler request: To fetch work. Requesting 86400 seconds of work, reporting 0 completed tasks
|Scheduler request completed: got 0 new tasks
|Message from server: No work sent
|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
|Message from server: Full-atom molecular dynamics is not available for your type of computer.
I have no idea what might be wrong. Or are there simply no WUs available at the moment?
Any help is greatly appreciated |
|
|
|
Hello all...
|Starting BOINC client version 6.4.5 for i686-pc-linux-gnu
...
I have no idea what might be wrong. Or are there simply no WUs available at the moment?
Any help is greatly appreciated
You're using the wrong architecture. There's no 32-bit GPUGRID Linux app, only a 64-bit one...
____________
pixelicious.at - my little photoblog |
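The architecture check in the reply above can be sketched in Python by looking at the platform string at the end of the client's startup line. The function name is hypothetical, and treating a platform that starts with "x86_64" as the 64-bit Linux build is an assumption for illustration; the i686 line is the one quoted from the original post.

```python
def is_64bit_linux_client(startup_line):
    """Guess from a BOINC startup line whether the client is a 64-bit Linux build.

    Takes the text after the last " for " as the platform string.
    """
    platform = startup_line.rsplit(" for ", 1)[-1].strip()
    return platform.startswith("x86_64") and "linux" in platform


line = "Starting BOINC client version 6.4.5 for i686-pc-linux-gnu"
print(is_64bit_linux_client(line))
```

A False result here corresponds to the 32-bit client from the post, for which, per the reply, there is no GPUGRID Linux application.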
|
|
Andrew
Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0
|
Cheers for that link Paul.
Well it seems to be working fine now so I'll just leave it, as should be! |
|
|