Message boards : Graphics cards (GPUs) : no cuda work requested
I've just installed BOINC 6.6.20 on WinXP; whenever BOINC asks for new work the server returns these three messages:
ID: 8686 | Rating: 0 | rate: / Reply Quote | |
Your driver may be too old or, more likely, your GPU is not supported. Can't say more, though, since your computers are hidden.
ID: 8694 | Rating: 0
I. Am. Not. An. Idiot.
ID: 8696 | Rating: 0
"I. Am. Not. An. Idiot." No one said you were. But when you hide your computers, some of the questions ETA raised could have been answered with a peek there. Since these are the most common problems, they are also the most commonly offered solutions. There are issues with the 6.6.20 and 6.6.23 versions that affect some people and not others. Next suggestion is to do a project reset on GPU Grid. If that does not work, reset all debt ...
ID: 8699 | Rating: 0
"I. Am. Not. An. Idiot." Well then, sorry... but from your post there was no way to tell. Do you still have some WUs running, or are you dry? MrS ____________ Scanning for our furry friends since Jan 2002
ID: 8700 | Rating: 0
"I. Am. Not. An. Idiot." Of course you are not, since you are capable of asking a perfectly acceptable question. But please also accept that ETA is one of the most respected people here, as anyone else is respected by definition until proven otherwise; he and all the others are just trying to help within their means and capabilities, no offence intended. His first reaction is also pretty standard for those who follow this forum, since the possible reasons for failure he mentioned are pretty common even for "not idiots". So if you really want serious help, please describe your system and problem in more detail. Unhiding your computers will help a lot, since it will allow us to see the results of the failing WUs, including any error messages. Hope we will be able to help you to help science and humanity. Kind regards, Alain
ID: 8704 | Rating: 0
The afflicted PC will run out of WUs around 4 AM. The message tab is filled with red.
ID: 8705 | Rating: 0
I had the same message a few hours ago on my i7 with 3 GTX cards. I just played a game, rebooted, and then it downloaded some work??? Not sure, because I have PS3s and GPUs running. The message came up on my i7, then it fixed itself. Maybe the reboot had something to do with it.
ID: 8707 | Rating: 0
GPU Results ready to send 0
ID: 8712 | Rating: 0
*Sigh* The message tab shows an 'ask & refusal' every hour or so, but when this machine had completely run out of work and I manually updated, it was given a single new WU amid all the refusals.
ID: 8723 | Rating: 0
"Apology to ETA: Sorry for my abruptness." You're welcome :) Let's try to solve your problem then. Today I remembered that I also got the message "no cuda work requested" when I tried 6.6.20. I quickly reverted to 6.5.0 and the box has been running fine since. You could also try 6.6.23, which supposedly fixed some of the issues in 6.6.20. Until now I hadn't seen anyone else report this behaviour with 6.6.20, so it seems to be a rare case. MrS ____________ Scanning for our furry friends since Jan 2002
ID: 8739 | Rating: 0
"Apology to ETA: Sorry for my abruptness." Um, no... I am becoming less and less convinced that it is isolated. Sorry... I thought I was being clear. It looks like both 6.6.20 and 6.6.23 have a problem with debt accumulating in one direction and never being properly updated. The eventual result on GPU Grid is that you get fewer and fewer tasks pending until you start to run dry.

Version 6.6.20 had some other problems with suspending tasks, and something else that could really mess things up, which I think was the source of the tasks that took exceptionally long times to run. 6.6.23 seems to have fixed that. This 6.6.20 problem may mostly affect people running multiple-GPU setups. But that means that 6.6.20 and 6.6.23 are not, in my opinion, ready for prime time.

I *DO* like the new time accounting, which lets you see more accurately what is happening with the GPU Grid tasks, so for the moment I am personally sticking with 6.6.20 on one system and 6.6.23 on my main one. That is also because I am trying to call attention to these issues, and the only way to collect the logs is to run the application. Sadly, as usual, the developers don't seem to be that responsive to feedback... To put it another way, they are very good at ignoring answers to questions they don't want asked.

{edit} For those having any kind of problem with 6.6.x: try 6.5.0, and if the problem goes away, stay there. Sadly, I will stay on the point and will be sending reports from the front as I get them. Failing that, you can always ask directly ...
ID: 8758 | Rating: 0
"No work sent" I understand this message to mean that BOINC does request work from GPU-Grid, but it does not request CUDA work (which would be extremely strange / stupid) and hence the server is not sending CUDA work. Am I totally wrong here? MrS ____________ Scanning for our furry friends since Jan 2002
ID: 8797 | Rating: 0
I also understand it that way...
ID: 8799 | Rating: 0
I think this is normal if 'Results ready to send' is 0.
ID: 8801 | Rating: 0
"No work sent" No, but the BOINC client is. We may be chasing two bugs here. I am seeing unconstrained growth of GPU debt, which essentially causes BOINC to stop asking for work from GPU Grid (another guy on Rosetta has it stopping asking for work from Rosetta, so it is not simply a GPU-side issue)... Richard Haselgrove has been demonstrating that the client may be dry of GPU work but insists on asking for CPU work, the inverse of what it is supposed to be doing.

I am running 6.6.23, where the problem seems to be more acute than in 6.6.20, which I am running on my Q9300, where I don't seem to be seeing the same issue, yet. Sorry ETA, I am not doing well and the brain is slightly mushy, so I may not be as clear as usual... I keep thinking I have explained this... I am going to PM you my e-mail address so you can send me wake-up idiot calls (we can also Skype if you like... hey, that rhymes)...

My bigger point is that AT THE MOMENT... I cannot recommend either 6.6.20 or 6.6.23 wholeheartedly. 6.6.20, I am pretty sure, has a bug that really causes issues on multi-GPU systems and may cause improper suspensions and long-running tasks (though it does not seem to be doing that on the Q9300 at the moment, single GPU though). 6.6.23 has fixes for a couple of GPU things but seems to have a broken debt issue (which MAY also exist in 6.6.20; it's just that the fix for one thing exposed the bug... or the bug fix is buggy... or the bug fix broke something else... you get the idea)...

Which is why I suggest that if anyone is having work-fetch issues, fall back to 6.5.0, and if they go 'way, then stay... or get used to resetting the debts every day or so... (which causes other problems)...
ID: 8803 | Rating: 0
Have you read this about GPU Work Fetch in 6.6.*, also GpuSched from 6.3.*? On the face of it, it looks to me like this design harms those that dedicate 100% effort to an individual project, as the LTD will eventually become too small. If something has happened between 6.6.20 and 6.6.23, it's probably worth looking at the changesets from 17770 to 17812. I didn't see anything that struck me as obvious. An earlier commit, 17544, looks potentially interesting; it came out in 6.6.14. Rob
ID: 8809 | Rating: 0
One of my quads running XP Home, 6.5.0, with two GTX 260s has stopped requesting new work and is also spitting out these messages... manual updates with other projects suspended still request 0 new tasks... this rig has been running for many weeks without problems, and is down to 1 task running. What's up??
ID: 8813 | Rating: 0
Posted this, this morning: Ok, I have a glimmer, not sure if I got it right ... but let me try to put my limited understanding down on paper and see if one of you chrome domes can straighten me out.
ID: 8831 | Rating: 0
I've been getting the 'No cuda work requested' messages too. Since it has been days since I got a GPUGrid WU, but SETI-cuda is running fine, I knew my hardware and drivers were okay.
ID: 8841 | Rating: 0
"This 6.6.20 problem may mostly affect people running multiple GPU setups" Sadly, no: on my single-GPU system I have the same problem, although my system just finishes a unit, sends it, and then receives a new one; or sometimes I receive 4 new ones, which probably get cancelled sooner or later by the server ;) So the issue is more widespread and seems to affect more projects, but those projects send many more units and/or allow longer deadlines or run much longer. That makes them less affected than the GPUGrid project, which is time-critical.
ID: 8847 | Rating: 0
"I've been getting the 'No cuda work requested' messages too. Since it has been days since I got a GPUGrid WU, but SETI-cuda is running fine, I knew my hardware and drivers were okay." 6.6.20 and above are still works in progress. I did not, and do not, think 6.6.20 was ready for prime time. It works, mostly, but it actually does not work as well as 6.5.0, IMO... especially when you have more than one GPU in the system.

When you did a project reset you reset the debt on the one project. The problem is that you did not reset the debts on the other projects. To clear up most of the scheduling problems when you have anomalies like this, you need to use the debt-reset flag in the cc_config file, then stop and restart the client (re-reading the config will not reset debts). Be sure to change the flag back to 0 after you stop and restart.

6.6.23 actually seems to be worse on debt management. 6.6.24 seems to insist that the number-2 GPU is not like all the others and refuses to use it, regardless of how identical it is... The fix in 6.6.24 to address excessive task switching also did not clear up the problem, though it may have addressed a bug that exaggerated the problem (or may have been inconsequential). Waiting on 6.6.25...

Seriously, if you are having problems with work fetch, drop back to 6.5.0... the only thing you lose is some debug-message improvements and the change to time tracking (you can't correctly see how long a task has left to run).
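For reference, the debt-reset procedure described above would look roughly like this in cc_config.xml (a sketch based on the standard BOINC client configuration layout; the file lives in the BOINC data directory, and the flag takes effect at the next client start, not on "Read config file"):

```xml
<cc_config>
  <options>
    <!-- reset all project debts once, at the next client restart;
         set back to 0 (or remove) afterwards, or debts are zeroed
         on every start -->
    <zero_debts>1</zero_debts>
  </options>
</cc_config>
```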
ID: 8862 | Rating: 0
Thanks for your effort to put this bug report together. If the developers are not totally blind and have not put you on their ignore lists, they should be able to see that this is not just chatter; it's a real problem.
ID: 8894 | Rating: 0
"No work sent" If you turn on the cc_config flag <sched_op_debug> you will see what it is requesting. It is not a bug. The BOINC 6.6 series makes two requests: one for CPU work and one for CUDA work. GPUgrid does not have CPU work, so when the client asks for some you get the message above. It should then make another request for CUDA work, which GPUgrid can provide. If you have recently upgraded to the 6.6 client I would suggest you reset the debts. This can be done using the cc_config flag <zero_debts>. ____________ BOINC blog
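For anyone unfamiliar with the file, enabling that flag looks roughly like this (a sketch of the standard cc_config.xml layout; sched_op_debug is a log flag, so it goes under <log_flags>, and the client picks it up via "Read config file" or a restart):

```xml
<cc_config>
  <log_flags>
    <!-- log each scheduler request and which resources (CPU/CUDA)
         it asks work for -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```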
ID: 8902 | Rating: 0
"If you have recently upgraded to the 6.6 client I would suggest you reset the debts. This can be done using the cc_config flag <zero_debts>" And if you do this, don't forget to remove it after you restart BOINC. If left in there, the debts would be reset every time BOINC starts.
ID: 8905 | Rating: 0
"If left in there, the debts would be reset every time BOINC starts." Which actually might be a good idea. Ignore the debts and treat the resource share as an approximation... don't stick to the code, they're more like guidelines anyway ;)

"BOINC 6.6 series makes two requests. One for CPU work and one for CUDA work. GPUgrid does not have CPU work, so when it asks for some you get the message above. It should make another request for CUDA work, which it can provide." Wouldn't that completely screw up the scheduling? BOINC would quickly assign a massive debt to CPU-GPU-Grid, which can never be reduced as there is no CPU client. Which would in turn screw up the scheduling of all the other CPU projects? This assumes there are separate debts for CPUs and coprocessors. If this is not the case... well, the entire debt system is screwed anyway and can by definition not work. (Please don't take this as a personal offense, I'm just thinking a little further ahead ;) MrS ____________ Scanning for our furry friends since Jan 2002
ID: 8917 | Rating: 0
"This assumes there are separate debts for cpus and coprocessors. If this is not the case.. well, the entire debt system is screwed anyway and can by definition not work." The new debt system is supposed to track two debt levels for each project. The problem is that if you have only one project of one resource class, you can and will get unconstrained growth of the debt for that resource. I get it for GPU Grid on my i7 (running 6.6.23; *NOT RECOMMENDED*, I do not recommend running 6.6.23 or 6.6.24, NOTE I AM TESTING... and 6.6.23 is PAINFUL for GPU Grid... YMMV)... Another participant running only GPU Grid and Rosetta@Home gets it for Rosetta...

See the fuller discussion in "BOINC v6.6.20 scheduler issues" (most specifically Message 60808) or the BOINC Alpha and Dev mailing lists for an even fuller discussion. The net effect is that you stop getting a full queue of tasks for the one resource. Sadly, even after providing them with lots of logs and other data, I am not sure they have even started looking at this problem.

The good news is that they are finally starting to take seriously a problem I pointed out when I first saw it, around 2005, when I bought my first dual Xeons with HT (the first quad-CPU systems); it is now a killer on 8-CPU systems... especially if you also add multiple GPUs... The system I am considering building this summer will be at least an i7, and I hope to put at least 3 GTX 295 cards into it... making it 8 CPUs and 6 GPUs... 14 processors... An alternative is a dual Xeon again... that would be 16 CPUs and 6 GPUs (or 8 with 4 PCI-e slots)... that will make the problem I noted a real killer ...
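To illustrate the failure mode described above, here is a toy simulation (hypothetical names and a deliberately simplified debt rule; this is not the actual BOINC scheduler code): each project's GPU debt accrues its resource-share fraction per step and is paid down only by GPU work actually done, so when only one attached project can do GPU work, the debts drift apart without bound.

```python
def step(gpu_debt, shares, gpu_work_done):
    """One simplified long-term-debt update (toy model, not BOINC source).

    Each project accrues debt in proportion to its resource share and
    pays it down by the fraction of GPU work it actually performed."""
    total_share = sum(shares.values())
    for p in gpu_debt:
        gpu_debt[p] += shares[p] / total_share - gpu_work_done.get(p, 0.0)

# Three equal-share projects; only one of them has a GPU application.
debts = {"gpugrid": 0.0, "rosetta": 0.0, "seti_cpu_only": 0.0}
shares = {p: 100.0 for p in debts}
for _ in range(100):
    step(debts, shares, {"gpugrid": 1.0})  # GPUGrid does all the GPU work

# GPUGrid's GPU debt sinks steadily while the CPU-only projects' debts
# climb, so a debt-driven scheduler increasingly prefers asking them for
# GPU work they can never supply, and stops asking GPUGrid.
print(debts)
```

After 100 steps the lone GPU project sits around -67 while the CPU-only projects are around +33 each, which matches the "stops asking GPU Grid for work" symptom reported in this thread.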
ID: 8934 | Rating: 0
"If left in there, the debts would be reset every time BOINC starts." It's supposed to maintain two sets of debts (i.e. one for CPU and one for GPU). With projects like Seti, which use both types of resource, this is useful. GPUgrid causes it grief because it only uses one resource type. There is supposed to be some check against a resource's debt growing too much, but it doesn't seem to work. Then there is the scheduling system, which is where the current discussions are at the moment. I don't quite share Paul's pessimism regarding 6.6.23 (or 6.6.24). It has improved since 6.6.20, though not substantially. Now if they can fix these, it could once again become reliable. Don't worry, I'm not offended - I didn't write BOINC. Paul and I make suggestions, but they usually get ignored by the developers anyway. ____________ BOINC blog
ID: 8940 | Rating: 0
"Then there is the scheduling system, which is where the current discussions are at the moment. I don't quite share Paul's pessimism regarding 6.6.23 (or 6.6.24). It has improved since 6.6.20 but not substantially. Now if they can fix them it could once again become reliable." Um, did not think I was being pessimistic... I thought rational was more like it... but OK... :) If 6.6.23 or .24 works for you... cool... .23 *IS* better than .20, in my opinion, though if you run a single project, as I do, it seems to have the debt problem. If you don't mind resetting debts on occasion, then go for it.

The main improvements in .23 had to do with initialization crashes and CUDA task switches that were not handled properly. What I saw on .20 was that at times the tasks took twice as long to run. Have not seen that at all on .23... and I have been running the heck out of .23 on the i7... but, 24-48 hours later, I can't get 4 queued tasks from GPU Grid... reset debts and I am good to go...

In .24 there is a huge mistake of some kind, and my second of 4 GPUs is suddenly not the same as the others... in that it is always the second of the GPUs, it sounds like a bug to me... not sure where... I suggested a change to print out the exact error; let's see if they pick that up... and/or find the real problem (I looked and saw nothing that leaped out at me... but I am not a C programmer).

For me to notice someone trying to offend me, you have to make a Dick Cheney level of effort to get me to even notice you are trying... so, I don't do offended... :) Which is one of the reasons I don't understand why others do... thankfully you don't... :) Now if others would be so reasonable ...
ID: 8942 | Rating: 0
"Then there is the scheduling system, which is where the current discussions are at the moment. I don't quite share Paul's pessimism regarding 6.6.23 (or 6.6.24). It has improved since 6.6.20 but not substantially. Now if they can fix them it could once again become reliable." I haven't had to reset debts on any of my machines, but I don't run a single project. I usually have 3 (or, when Einstein went off last week, 4) running. .23 seemed to have fixed the never-ending GPUgrid WU bug. Apart from the debugging messages, .24 doesn't seem to correct anything. But then I've only got it installed on a single-GPU machine, because of the "can't find 2nd GPU" bug. ____________ BOINC blog
ID: 8943 | Rating: 0
"I haven't had to reset debts on any of my machines, but I don't run a single project. I usually have 3 (or when Einstein went off last week 4) running." It is not a problem of running a single project; it is running a single project of a particular resource class. And I also think the speed of the system plays a part in how fast the debts get out of whack. I run 6.6.20 on the Q9300, which has a single GPU, and it does not seem to get into trouble that fast. The i7, on the other hand, only lasts a day or so before the GPU Grid debt is so out of whack that I have to reset it to keep 4 tasks in the queue. If I don't reset the debts, pretty soon all I have is the tasks running on the 4 GPUs. It is possible that if I had only one or two GPUs in the system it would not get out of whack so fast... but...

The change in .24 was in response to some discussions on the lists about the asymmetry of GPUs... I think the decision was wrong and hope we can get some reasonableness going... but so far there has been no acknowledgement that this is a bad choice... hopefully Dr. Korpela at SaH will speak up, and the PM types here too... if they don't, the chances of getting the change backed out are lower (note they can also send silent e-mails directly to Dr. A) ...
ID: 8963 | Rating: 0
"Its supposed to maintain 2 sets of debts (ie one for cpu and one for gpu). With projects like Seti which use both types of resource it is useful. GPUgrid causes it grief because it only uses one resource type. There is supposed to be some checking of resources debt growing too much but doesn't seem to work." Thanks for explaining. Still looks stupid: if someone has a CUDA device and is attached to 50 CPU projects, then 6.6.2x will continue to request GPU work from all of them? I really hope the new versions of the server software feature some flag to tell the clients which work they can expect from them... MrS ____________ Scanning for our furry friends since Jan 2002
ID: 9020 | Rating: 0
"Its supposed to maintain 2 sets of debts (ie one for cpu and one for gpu). With projects like Seti which use both types of resource it is useful. GPUgrid causes it grief because it only uses one resource type. There is supposed to be some checking of resources debt growing too much but doesn't seem to work." And that is exactly what happens. In the last 26 hours I have hit the 50-some projects I am attached to with 800-some requests, most of them probably asking for CUDA work, because my GPU debt is high and climbing, in that I am only attached to GPU Grid for GPU work. What they are relying on is the "back-off" mechanism, with the assumption that the number of requests is nominal. The problem is that a DoS is also a very small request, just made lots of times. Multiply my 800 requests by 250,000 participants and pretty soon you are talking real numbers.

I have TWO threads going on the alpha list right now about this type of lunacy where, surprisingly, John McLeod VII is arguing for policies that are a waste of time and lead to system instability because the cost of the policy is "low". The trouble is that in reality the numbers are not as low as he insists... Worse, the repetitive nature of this obsessive checking (as often as once every 10 seconds or faster) means the logs get so full of stuff that you cannot find meaningful instances of the problems you are trying to cure.

Just because you can do something does not mean that you should. A lesson the BOINC developers have chosen not to learn yet. I would point out that the latest spate of server outages took place shortly after 6.6.20 was made the standard... coincidence? Maybe, maybe not... But why they are so blasé about adding to the load on the scheduler is beyond me ...
My latest post on DoS and 6.6.20+: Ok, And my prior: Perhaps we should make the flags explicit in the system-side revision where: Of course the most depressing thing is that, as John explicitly said, if I keep saying things that "they" don't want to hear, "they" are going to keep ignoring me... my reply was, of course, that just because I am saying things that he, and others, might not want to hear does not make me wrong... nor will ignoring problems make them go away ...
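The back-of-the-envelope scaling in the post above is easy to check (both inputs are the poster's own figures, not measured values):

```python
requests_per_host = 800      # observed by the poster over roughly 26 hours
participants = 250_000       # poster's estimate of the participant base
total = requests_per_host * participants

# 800 requests/host * 250,000 hosts = 200,000,000 scheduler requests
print(f"{total:,} scheduler requests")
```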
ID: 9034 | Rating: 0
Wow, now that makes me want to scream.
ID: 9082 | Rating: 0
"Wow, now that makes me want to scream." It does not help me much either. Being suicidally depressed as a normal state, with medication not being effective, I really don't need the aggravation.

When they were first asking about the 6.6.x to 6.8.x versions, we (Richard Haselgrove, Nicholas Alveres (sp?), and a few others; sorry guys, I forgot the list) made a lot of suggestions... as I said before, none of them were considered. Now we see issues with work fetch and resource scheduling, to the point where my system is bordering on chaos... I cannot imagine what a 16-CPU system with 6 GPU cores will look like. Though there is some glimmer that there is an issue, it is the same "let's tinker with the rules and not make any big changes". Sadly, I know from experience that this will not work. Yes, they may be able to fake it for some more time, but it would be better and cleaner to start anew.

In theory they left room for future GPU and other co-processor types in the mix. Nick does not think they virtualized enough, and though I cannot read the code well enough to know for sure (I don't do C well, C++ as hacked even less well), it sure does not look like he is wrong. The issue is that none of them are systems engineers (I was) and they don't really consider, or know, the issues; they charge on with the courage of their convictions that because they can hack together code, they know what they are doing. The courage and skill of amateurs.

At one point I specialized in database design, and most people don't know that there are three types of DBAs or database specialists. The logical designer is interested in the data life-cycle and data models (that's what I did) and is generally not concerned about speed or efficiency (what I mean is that this is not a primary concern, though you do know what will make the system fast or slow). Completed database models are implemented and tuned by a systems DBA, a class of DBA most people have never met (there just are not that many of them around). This guy tunes the hardware and system software (he may even select and buy it specifically for the data model to be implemented), creates things like table spaces, and lays the data out on the physical media. Backups and all the system stuff are designed by this guy. The third guy is the type of DBA most people know about. He knows a lot of stuff but is mostly concerned with the day-to-day operation of the database. Though he may know about making tables and putting them on disks... well... it is an art, and few do it well...

What is the point of all this? BOINC's database was put together by the third kind of DBA and amateurs... it is one of the reasons the databases are so fragile... and crash so often... I was doing BOINC while I was still working, and I showed the data model to a systems DBA I knew; he thought it was as poor a design as I did... The study of logical database design for relational databases has a point to it... ignore the "rules" at your peril... and we can see the result of the choices made...

Anyway, I sent in a pseudo-code outline of what I think should be done for resource scheduling so we can solve that problem, which is coming up on 5 years old now... I will tackle the work-fetch and DoS issues 5 years from now, when they finally agree that they are an issue... if history is a guide... RH and I, though, are trying to bring it up along with the other work-fetch issues in 6.6.23, .24, and now .25 ...
ID: 9092 | Rating: 0