Message boards : Graphics cards (GPUs) : Recent problems for WUs on older GPUs
Author | Message |
---|---|
We are having problems with several workunits and GPUs which are NOT 260/275/285/295. As we test on newer cards, we have not spotted the problem before. | |
ID: 9642 | Rating: 0 | rate: / Reply Quote | |
I had a bunch of compute errors on my 8800GT, but then the latest KASHIF_HIVPR completed OK over a couple of days. | |
ID: 9670 | Rating: 0 | rate: / Reply Quote | |
Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. | |
ID: 9673 | Rating: 0 | rate: / Reply Quote | |
2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664 | |
ID: 9678 | Rating: 0 | rate: / Reply Quote | |
Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. Or people avoiding them like the plague. People on our team have been reporting stuck and failed WUs like never before. In the next few days we will perform a server update and application updates to use CUDA2.2. Will we still be able to use our older non-CUDA2.2 cards? | |
ID: 9682 | Rating: 0 | rate: / Reply Quote | |
Will we still be able to use our older non-CUDA2.2 cards? That's just the software version and depends on the driver. There's also the CUDA hardware capability, which is the critical one. This one *should* stay as it was before (minimum of 1.1 required). Thomasz, your GTX 260 is not exactly an older card (as stated in the first post of this thread). MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 9687 | Rating: 0 | rate: / Reply Quote | |
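The distinction drawn above, between the CUDA software version (which depends on the driver) and the card's hardware compute capability, can be illustrated with a small check. This is only a sketch, assuming the minimum of 1.1 quoted in the post; the function name `meets_minimum` and the sample capability values are hypothetical and are not taken from the project's application.

```python
# Minimum hardware compute capability quoted in the post above (assumption:
# the project still requires 1.1 after the CUDA 2.2 update).
MIN_CAPABILITY = (1, 1)


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability is >= minimum.

    Tuples compare element by element, so (1, 3) >= (1, 1) and (2, 0) >= (1, 1),
    while (1, 0) does not qualify.
    """
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    # Illustrative values only; look up the real capability of your own card.
    for cap in [(1, 0), (1, 1), (1, 3)]:
        print(cap, "OK" if meets_minimum(cap) else "below minimum")
```

Note that this says nothing about the driver: a card can meet the hardware minimum and still need a newer driver for a newer CUDA runtime.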
It is as clear as crystal ... | |
ID: 9689 | Rating: 0 | rate: / Reply Quote | |
Alright.. could be the *usual* 6.6.20 bug. | |
ID: 9691 | Rating: 0 | rate: / Reply Quote | |
Alright.. could be the *usual* 6.6.20 bug. Sadly I may have seen it on a 6.6.23 processed task. That means that the real problem has not been addressed, though the changes in 6.6.23 and later make it better, but not cured. | |
ID: 9693 | Rating: 0 | rate: / Reply Quote | |
I have 5-KASHIF_HIVPR_dim_ba1-4-100-RND6112_0 (acemd version 664) that has been running for 21 hours on a 9800GX2 and is 68% done. I have never had such a long WU on GPUGRID; usually I finish 3 or 4 WUs in 21 hours. Hope the credit will be as great as the time it takes to compute ;). | |
ID: 9699 | Rating: 0 | rate: / Reply Quote | |
In the next few days we will perform a server update and application updates to use CUDA2.2. So do we need to upgrade to the 185.85 drivers and CUDA 2.2 DLLs? Or will the app work out which CUDA version is present and only use the instruction set that is supported? Will GPUGRID download the CUDA 2.2 DLLs, or will we need to put them somewhere (like the projects\gpugrid folder) when the new app is released? Oh, and seeing as you are changing the app, is there a chance you could report the driver version and the CUDA version in the WU info? It might help with the debugging. <core_client_version>6.6.28</core_client_version> <![CDATA[ <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTS 250" # Clock rate: 1836000 kilohertz # Total amount of global memory: 536543232 bytes # Number of multiprocessors: 16 # Number of cores: 128 # Amber: readparm : Reading parm file parameters # PARM file in AMBER 7 format # Encounter 10-12 H-bond term WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. MDIO ERROR: cannot open file "restart.coor" # Time per step: 46.163 ms # Approximate elapsed time for entire WU: 46163.094 s called boinc_finish </stderr_txt> ]]> ____________ BOINC blog | |
ID: 9701 | Rating: 0 | rate: / Reply Quote | |
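For anyone comparing results across hosts, the stderr output quoted above can be read mechanically. The following is a minimal sketch, assuming the line formats shown in that one post; the function names `parse_stderr` and `implied_steps` are hypothetical, and the regular expressions have not been checked against other application versions.

```python
import re

# Lines copied from the stderr output quoted in the post above.
STDERR_SAMPLE = """\
# Using CUDA device 0
# Device 0: "GeForce GTS 250"
# Number of multiprocessors: 16
# Number of cores: 128
# Time per step: 46.163 ms
# Approximate elapsed time for entire WU: 46163.094 s
"""


def parse_stderr(text):
    """Extract device name, time per step (ms) and estimated total time (s)."""
    patterns = {
        "device": (r'# Device \d+: "([^"]+)"', str),
        "ms_per_step": (r"# Time per step: ([\d.]+) ms", float),
        "total_s": (r"# Approximate elapsed time for entire WU: ([\d.]+) s", float),
    }
    info = {}
    for key, (pattern, convert) in patterns.items():
        match = re.search(pattern, text)
        if match:
            info[key] = convert(match.group(1))
    return info


def implied_steps(info):
    """Estimated total time divided by time per step (both in seconds)."""
    return info["total_s"] / (info["ms_per_step"] / 1000.0)


if __name__ == "__main__":
    info = parse_stderr(STDERR_SAMPLE)
    print(info["device"], round(implied_steps(info)))
```

For the sample above, the two timing figures imply roughly one million steps for the whole WU, which gives a quick sanity check when a task reports a much larger step time on another card.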
Well, I downloaded, as always, both the driver and the CUDA toolkit from the NVIDIA site. | |
ID: 9707 | Rating: 0 | rate: / Reply Quote | |
Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. Just finished looking at a LOT of KASHIF_HIVPR WUs. The situation is not improving at all and is not a driver issue. What happens is these WUs are downloaded and either fail or are aborted repeatedly until they happen to be assigned to a GTX 260 or above, then they complete. The problem is not fixed and is not improving. IMO it needs to be dealt with ASAP. Here's just a few examples: http://www.gpugrid.net/workunit.php?wuid=440561 http://www.gpugrid.net/workunit.php?wuid=442250 http://www.gpugrid.net/workunit.php?wuid=454479 http://www.gpugrid.net/workunit.php?wuid=449101 http://www.gpugrid.net/workunit.php?wuid=457871 http://www.gpugrid.net/workunit.php?wuid=458509 | |
ID: 9711 | Rating: 0 | rate: / Reply Quote | |
2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664 Well, it has now been crunching that WU for 18 hours and it is at 83%; it says 3h30min remaining... ____________ POLISH NATIONAL TEAM - Join! Crunch! Win! | |
ID: 9713 | Rating: 0 | rate: / Reply Quote | |
CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. | |
ID: 9714 | Rating: 0 | rate: / Reply Quote | |
CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. Are you saying that without 185 version drivers we will not be able to successfully do GPU Grid work? I have card/box combinations that will not accept that version and run properly. If 185 version driver and above is "required" to crunch here, I will be taking my farm to FAH. ____________ mike | |
ID: 9715 | Rating: 0 | rate: / Reply Quote | |
The test unit also finished without problems, and new IBUCH units are on the way. | |
ID: 9716 | Rating: 0 | rate: / Reply Quote | |
The test unit ended also without problem and new ibuchs on the way. Your computers are hidden so how can we verify? | |
ID: 9717 | Rating: 0 | rate: / Reply Quote | |
Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. Here's a new KASHIF_HIVPR that was just downloaded to me (and I aborted). Notice that it just caused an error on a GTX 260 (after running a long time, I might add). http://www.gpugrid.net/workunit.php?wuid=459189 That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. Take a look for yourself: http://www.gpugrid.net/results.php?hostid=32169 It sure looks like the KASHIF_HIVPR problem also bites the faster cards, just not as often. Our team members have also been reporting the same problem on the GTX 260 and above. So it's documented. Any chance of getting this fixed? | |
ID: 9719 | Rating: 0 | rate: / Reply Quote | |
2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664 lol after 24h of crunching - 3600 points... ____________ POLISH NATIONAL TEAM - Join! Crunch! Win! | |
ID: 9720 | Rating: 0 | rate: / Reply Quote | |
Yep, the best combination for a 260 is BOINC 6.4.7, driver 178.28, and CUDA 2. Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. ____________ "Silakka" Hello from Turku > Åbo. | |
ID: 9721 | Rating: 0 | rate: / Reply Quote | |
Ubuntu 9.04 comes standard with driver version 180.44, which so far has spared me from fiddling with manual interventions. | |
ID: 9722 | Rating: 0 | rate: / Reply Quote | |
That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260. | |
ID: 9723 | Rating: 0 | rate: / Reply Quote | |
In light of the issues with the older GPUs and the KASHIF_HIVPR WUs, what is the best version of the NVIDIA driver to use? | |
ID: 9724 | Rating: 0 | rate: / Reply Quote | |
That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. You're right. Not my machine and I didn't see the 2 cards. But OK here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665 | |
ID: 9725 | Rating: 0 | rate: / Reply Quote | |
That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. :-) That one reports as "Aborted by user". So I don't think it errored out under normal circumstances -- it was manually aborted. | |
ID: 9727 | Rating: 0 | rate: / Reply Quote | |
CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. Is this query unworthy of an answer? ____________ mike | |
ID: 9728 | Rating: 0 | rate: / Reply Quote | |
CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. Thanks. I'd suggest a note in the news section on the home page. That way people can start organising things. I have already set GPUgrid to "no new work" so I can finish off what I have before doing the driver upgrades. I've got a few machines to do :) ____________ BOINC blog | |
ID: 9730 | Rating: 0 | rate: / Reply Quote | |
Task ID 665546 had been running well along with another task. As I was about to run a program that would "use" the GPU, I decided to suspend all tasks and exit BOINC. Once I had completed my task I launched BOINC, and all tasks still appeared suspended. So far so good. I then resumed all tasks, and task 665546 immediately went to "compute error". I also had another task, 652947, that had been running for 29 out of about 30 hours and failed (different machine). When I get the time I will compile a list of the failures and successes over the past few days. | |
ID: 9731 | Rating: 0 | rate: / Reply Quote | |
May I ask, mike047, which card/machine combinations cannot use the 185.85 version? | |
ID: 9734 | Rating: 0 | rate: / Reply Quote | |
That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20 so that wasn't the problem. :-) | |
ID: 9736 | Rating: 0 | rate: / Reply Quote | |
That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. Here's a bunch more for your viewing pleasure: http://www.gpugrid.net/result.php?resultid=659111 http://www.gpugrid.net/result.php?resultid=664645 http://www.gpugrid.net/result.php?resultid=666952 http://www.gpugrid.net/result.php?resultid=647270 http://www.gpugrid.net/result.php?resultid=660927 http://www.gpugrid.net/result.php?resultid=666863 Certainly not as common as with the slower cards, but not at all hard to find. The last 2 are test WUs... | |
ID: 9737 | Rating: 0 | rate: / Reply Quote | |
@Beyond - I didn't doubt you. :-) | |
ID: 9738 | Rating: 0 | rate: / Reply Quote | |
As GPUGrid clearly does not want to put in much effort to support 8 and 9 series cards, I'm done here for now. I'd rather shut them down than to waste time and electricity in an endless circle jerk of BOINC versions and drivers. But hey, 3.7 million credits was a good run for me here. There will be a new GPU project out soon. | |
ID: 9739 | Rating: 0 | rate: / Reply Quote | |
which card/machine combinations are not possible to use the 185.85 version may i ask mike047 ? I don't have that information at hand presently. Basically I use Ubuntu 8.04 LTS. The 260 and 250 cards have no trouble using 180.22 and might be able to use a higher driver without issue. Some of my 8800/9600GSO/9800 cards will not accept any driver above 177.82. All motherboards are Gigabyte P35/45. I don't know what the issues are with this project and I am willing "to do" a little work to be able to run this project. BUT, I am unwilling to babysit and periodically change drivers to suit a project that is becoming unwilling to respond to my queries and the queries of others. Unfortunately I have invested in many Nvidia cards that at present cannot be used elsewhere in BOINC. FAH is the only other place that can use my cards. I have one box working there now and it has run absolutely trouble free with NO intervention on my part. The + to FAH is that my internet is not shut down when it has to upload; the 50+m uploads from here shut my internet down... I know that is not a project fault but it is an issue for me. This is a good project with good science but it has gotten away from communicating with the participants in a timely manner. IMHO the project has slipped badly from where it was several months ago. ____________ mike | |
ID: 9742 | Rating: 0 | rate: / Reply Quote | |
I can confirm my BFG GTX-260 192 Shader card is also getting a lot of these errors with 185.81. | |
ID: 9743 | Rating: 0 | rate: / Reply Quote | |
Oh and SETI has nVidia support so there is another BOINC project. | |
ID: 9751 | Rating: 0 | rate: / Reply Quote | |
We have tested with drivers 185.xx on a 8800GT. All the WUs fail. | |
ID: 9752 | Rating: 0 | rate: / Reply Quote | |
The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20 so that wasn't the problem. :-) Yes, and maybe no ... 6.6.20 stunk in this regard... it really sucked swamp water ... 6.6.23 and later, *I* for one thought, fixed it ... now I am not so sure. What I ***THINK*** happened is that most of the causes have been cleaned up ... but sometimes something bad happens. And THEN, you get a task that runs long. There are still issues with the way that the resource scheduling is done. I am banging my head on the wall about things that *I* think I can clearly demonstrate to be patted on the head and told to go 'way you bother me ... I mean, just last night I had five tasks all started and die in less than a second. At the moment the answer is that this is not possible. My 2,200+ log file of those two seconds notwithstanding ... Anyway, ... I am far less sanguine about how "fixed" we are ... {edit} An example: 12-TONI_HIVPR_mon_ba20-7-100-RND1398_0 and that was run on a 6.6.25 client ... 182.50 drivers I think at the time. 115 ms step size ... | |
ID: 9761 | Rating: 0 | rate: / Reply Quote | |
We have managed to replicate the problem on one of our machines. | |
ID: 9830 | Rating: 0 | rate: / Reply Quote | |
We have managed to replicate the problem on one of our machines. Oh, now we have to be patient too???? :) It's good news, GDF ... thanks for the note. | |
ID: 9836 | Rating: 0 | rate: / Reply Quote | |
I worked out the numbers on my computers, they all run 182.50 drivers. | |
ID: 9843 | Rating: 0 | rate: / Reply Quote | |
Hmm, I am not convinced it's just the drivers. I started under Win XP with the 182.50 driver and BOINC 6.6.28, but again I see the IBUCH unit hang at 64.688% for more than an hour after 13 hours of calculation. | |
ID: 9844 | Rating: 0 | rate: / Reply Quote | |
My card is a 9800GTX with the 185.82 driver and BOINC 6.6.20. | |
ID: 9849 | Rating: 0 | rate: / Reply Quote | |
Hmm, I am not convinced it's just the drivers GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50. It seems to be a problem with the driver, triggered by some new WUs. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 9855 | Rating: 0 | rate: / Reply Quote | |
Hmm i am not convinced its just the drivers Well, I have some of these named tasks running on my 9800GT and the GTX295s ... but they don't seem to want to run on the new GTX260 or my GTX280 ... As far as I know, at the moment I am running 182.50 everywhere ... I suppose I could roll back to the 180.xx to see if I can get a task and if it dies ... heck, nothing else seems to be bothering this problem. | |
ID: 9858 | Rating: 0 | rate: / Reply Quote | |
Sorry, not very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me. | |
ID: 9860 | Rating: 0 | rate: / Reply Quote | |
Sorry, not very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me. Well, I just rolled the driver back to 180.4 and still got an invalid function. The tasks die immediately. Getting very depressed ... can't tell if it is my new systems or bad tasks ... | |
ID: 9861 | Rating: 0 | rate: / Reply Quote | |
This task: p1480000-RAUL_pYEpYI1605-0-10-RND5295_0 started up and I have 5:10 or so on the clock ... so, unlike all the rest, finally got one running. It is running on the new MB, but the old GPU. | |
ID: 9863 | Rating: 0 | rate: / Reply Quote | |
GIANNI_FB's have come in for some flak lately , so thought I would post a successful one as comparator. The stop/starts in there were me, due to non-BOINC related stuff. | |
ID: 9867 | Rating: 0 | rate: / Reply Quote | |
Glad your new rig made it through one WU successfully! The others don't look too well, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others. | |
ID: 9878 | Rating: 0 | rate: / Reply Quote | |
GIANNI_FB's have come in for some flak lately , so thought I would post a successful one as comparator. The stop/starts in there were me, due to non-BOINC related stuff. And here's a 205-GIANNI_FB that failed on the same machine after running a LONG time: http://www.gpugrid.net/result.php?resultid=677771 | |
ID: 9883 | Rating: 0 | rate: / Reply Quote | |
Glad your new rig made it through one WU successfully! The others don't look too well, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others. I think I had TWO problems. One was that OC got turned on by mistake, and the automatic OC mode probably tried to do too much. What it broke is not entirely clear to me. It may also have been the BIOS ... I flashed that with the latest and turned off the OC mode at the same time, so it is hard to know which it was. The second problem was of course the bad tasks, which would have failed with the other error messages if I had not had problem one on both rigs. Now I am running into power limits (again) ... I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply... | |
ID: 9891 | Rating: 0 | rate: / Reply Quote | |
The similar for me, | |
ID: 9896 | Rating: 0 | rate: / Reply Quote | |
The similar for me, I don't understand ... you don't like valid tasks? | |
ID: 9897 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 9898 | Rating: 0 | rate: / Reply Quote | |
I may be lucky, but I am having very few problems. 185.85 drivers, XP64, 2 EVGA 260s, BOINC 6.6.28. I had 1 compute error yesterday, but that was my fault for suspending right as it started and unsuspending a couple of seconds later, and a couple of others that everyone else in the quorum errored out on. | |
ID: 9899 | Rating: 0 | rate: / Reply Quote | |
Now i have been able to save a few hanging units | |
ID: 9900 | Rating: 0 | rate: / Reply Quote | |
Now I have the fourth error WU in a row. :-((( | |
ID: 9901 | Rating: 0 | rate: / Reply Quote | |
Nowi, I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply... Do you think that's a good idea? I don't know your 230V, but at 115V the power supplies lose efficiency compared to 230V. 30A @ 115V is 3.5kW, quite massive :D I know we can draw at least 2kW over the regular 230V, whereas I heard the US net may deliver something around 1.5kW at 110V. Our 3 phase plugs are 380V and I think you can get 5 - 6 kW from them.. but you're not talking about these, right? MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 9903 | Rating: 0 | rate: / Reply Quote | |
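The power figures in the post above follow from P = V × I. Here is a small sketch of that arithmetic, assuming a simple single-phase circuit with a power factor of 1 (a simplification, and not a substitute for an electrician's advice); the function names `power_kw` and `current_amps` are hypothetical.

```python
def power_kw(volts, amps):
    """Power in kilowatts for a given voltage and current (P = V * I)."""
    return volts * amps / 1000.0


def current_amps(kilowatts, volts):
    """Current in amperes needed to deliver a given power at a given voltage."""
    return kilowatts * 1000.0 / volts


if __name__ == "__main__":
    # 30 A at 115 V, the supply discussed in the thread.
    print(power_kw(115, 30))
    # Current corresponding to the 2 kW at 230 V and 1.5 kW at 110 V
    # figures mentioned in the post.
    print(round(current_amps(2.0, 230), 1))
    print(round(current_amps(1.5, 110), 1))
```

The first figure comes out at 3.45 kW, which the post rounds to 3.5 kW.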
Sadly I was not paying attention, so the last one did error out again, but to be honest I was expecting it to fail anyway, since I had to restart it 3 times in a row before seeing progress. | |
ID: 9906 | Rating: 0 | rate: / Reply Quote | |
Thank you for the link. I opened the web page of my PC. As for the GTX 260 in this PC, I am on BOINC 6.6.20, which satisfies me, and NVIDIA 182.08 on Win XP Pro 64. | |
ID: 9908 | Rating: 0 | rate: / Reply Quote | |
Drivers Nvidia 1XX.XX http://www.nvidia.fr/Download/Find.aspx?lang=fr | |
ID: 9909 | Rating: 0 | rate: / Reply Quote | |
I am with boinc 6.6.20 who satisfied me Except for the fact that some of your tasks take longer than they should? MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 9910 | Rating: 0 | rate: / Reply Quote | |
I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply... Yes it does, the problem is that to get a 230V UPS is about twice as expensive as a normal one ... the last time I looked, to get one about the size I would need would be about 3K ... The problem is that I can tell that I am pulling way high on the circuits in use ... if I change to another dedicated line, well, then I can leave some on the current room sockets and the rest on the dedicated line. The only point of the exercise is to get more power to the room ... I think adding new GPUs is pushing me up to the line again ... at least I got rid of the power hungry systems that were slower than dirt. In a month or so I will likely get an upgrade card to replace the 9800GT though I will likely keep it in the closet for that time when I upgrade to wider MB and might need a slot filler ... | |
ID: 9912 | Rating: 0 | rate: / Reply Quote | |
OK, except cost there's nothing to argue against a dedicated line :) | |
ID: 9915 | Rating: 0 | rate: / Reply Quote | |
Yes really 84000s instead of 42000s for 14-KASHIF_HIVPR_dim_ba3-8-100-RND7871_1 | |
ID: 9918 | Rating: 0 | rate: / Reply Quote | |
OK, to put it more clearly: you don't like the long runtime, but you say 6.6.18/20 satisfied you. The post I linked to says that the long runtime is caused by an error in 6.6.20 and some previous clients. So something doesn't add up and you may want to up- or downgrade ;) | |
ID: 9921 | Rating: 0 | rate: / Reply Quote | |
I have moved to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows. | |
ID: 9928 | Rating: 0 | rate: / Reply Quote | |
I rolled back my drivers from 185.85 to 182.50, with Windows Vista 64 bit and BOINC client 6.6.28. Since then I have returned three successful results, one of which had run for 30 hours on one core of my 9800 GX2 and gave me just over 10,000 credits :-) | |
ID: 9932 | Rating: 0 | rate: / Reply Quote | |
I am finding it hard to tell what is going on... I seem to be getting tasks out of order so that they don't sort well on the results pages. As I watch the computers they seem to be returning mostly good results ... with occasional errors. | |
ID: 9933 | Rating: 0 | rate: / Reply Quote | |
I still cannot make heads nor tails of the pattern of errors. One of the problems of course is the difficulty of gathering data about the failures. | |
ID: 9937 | Rating: 0 | rate: / Reply Quote | |
I upgraded just the drivers on all my machines to 185.85. I had a couple of machines start getting errors. | |
ID: 9938 | Rating: 0 | rate: / Reply Quote | |
I have moved to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows. Believe me, you don't want to run SETI Beta together with other projects. SETI itself has also been crashing my GPUGRID units, though sometimes it ran without problems. SETI seems to use only CUDA 1.0 instructions with no optimisations if you don't use the optimized apps. The optimized KWSN application has caused me failures on GPUGRID as well, but that's probably because SETI was running at the same time as GPUGRID while I have only 1 CUDA device. I advise you not to run SETI and GPUGRID at the same time; in my experience it has crashed many units. Although sometimes it looks like nothing is wrong, I found that some units keep the memory locked, so when they finish the RAM is not released properly, causing other projects (GPUGRID) to error out. Another one that can give you problems together with GPUGRID is CPDN, which has units that eat at least 1.5 GB of memory, so for me 4 units at 1.5 GB minimum meant a load of 7.2 GB of RAM in use :D Believe me, that makes trouble. If I had booted under 64-bit Windows I probably could run them, since I have 8 GB of memory, but since I run 32-bit Windows it only uses 3.2 GB. Has anyone tried using updated DLLs? Believe me, I tried all combinations of drivers, BOINC and CUDA versions. Every time the result is the same: in the end some units simply crash. Even when babysitting them, they seem to know when I am busy with other tasks and crash ;) So it looks to me like a unit that gets locked will die if you are not in time to pause and restart it. By that I mean: if the unit has been locked at x.xxx% for at least an hour without the percentage moving, you can try the pause/restart trick, but some units will still crash no matter what I do. Also make sure not to restart it too quickly after it has started again, because that will surely crash it too!! | |
ID: 9942 | Rating: 0 | rate: / Reply Quote | |
we are running this set of workunits called | |
ID: 9958 | Rating: 0 | rate: / Reply Quote | |
The CPDN memory limit of 1.5 GB is set that way to allow for four running on a quad, with enough left over for the operating system etc. within the quoted figure of 1.5 GB. Each of the larger CPDN WUs takes up 210-220 MB in memory, therefore four of them will eat around 850 MB, with a comfortable margin for the OS etc. within the stated 1.5 GB. It's not 1.5 GB per WU; that figure, which they state as advisory, is total memory on the PC, not per WU. | |
ID: 9961 | Rating: 0 | rate: / Reply Quote | |
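The memory arithmetic in the post above can be checked with a few lines. This is a sketch using only the figures quoted there (210-220 MB per large WU, four WUs, a 1.5 GB advisory total); the names are hypothetical, and 1 GB is assumed to be 1024 MB.

```python
ADVISORY_TOTAL_MB = 1.5 * 1024  # 1.5 GB advisory figure, assuming 1 GB = 1024 MB
PER_WU_MB = (210, 220)          # range per large WU quoted in the post
WU_COUNT = 4                    # four tasks on a quad-core


def total_range_mb(per_wu_range, count):
    """Lowest and highest total memory for `count` work units."""
    low, high = per_wu_range
    return low * count, high * count


def headroom_mb(per_wu_range, count, limit_mb):
    """Memory left under the limit in the worst case (highest per-WU usage)."""
    _, high_total = total_range_mb(per_wu_range, count)
    return limit_mb - high_total


if __name__ == "__main__":
    print(total_range_mb(PER_WU_MB, WU_COUNT))
    print(headroom_mb(PER_WU_MB, WU_COUNT, ADVISORY_TOTAL_MB))
```

With those figures, four WUs total 840-880 MB, which matches the "around 850 MB" in the post and leaves several hundred MB under the advisory figure.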
We should be able to test a fix by tomorrow. | |
ID: 9964 | Rating: 0 | rate: / Reply Quote | |
Zydor, did you actually select the big units on your account page? By default they are not loaded; you really should read what is stated there. | |
ID: 9967 | Rating: 0 | rate: / Reply Quote | |
I just had a case where a suspended CPDN task caused two GPU Grid tasks to go into waiting for memory state. I had to stop BOINC and restart it so that the CPDN task (only 300K) would be swapped out ... | |
ID: 9969 | Rating: 0 | rate: / Reply Quote | |
Task Manager reported the larger ones taking up 220 MB minimum, going up to 400 MB at times, and four did run fine. You can get four by setting preferences for only those units. | |
ID: 9981 | Rating: 0 | rate: / Reply Quote | |
So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. | |
ID: 9991 | Rating: 0 | rate: / Reply Quote | |
So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. Well at least it isn't a mystery any more. When you are on the bleeding edge, one should expect some cuts. Hope it gets resolved in the not too distant future. Rob | |
ID: 9992 | Rating: 0 | rate: / Reply Quote | |
So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. How come the G200 based cards also get failures? Will there be an updated app for the non-G200 machines, or perhaps all machines? Will this be a cuda 2.2 app or stick with the old version for the time being? Can we use the 185.85 drivers now or with the new app (assuming there will be one)? ____________ BOINC blog | |
ID: 9995 | Rating: 0 | rate: / Reply Quote | |
I am running an 8800GT and 3 9800GTX+ cards. I have had zero completed KASHIF WUs. | |
ID: 9998 | Rating: 0 | rate: / Reply Quote | |
we are running this set of workunits called These hang on my Pent D 830 with a 9600GSO. See here for a hung result that was aborted after more than 24 hours of no progress (hung at about 21%). | |
ID: 10000 | Rating: 0 | rate: / Reply Quote | |
Just as a data point of reference, I have had 100% success on all work units using a GTX 260 Core 216 card, running CUDA 2.2 and 185.85 driver, even on work units that have had failures previously. | |
ID: 10002 | Rating: 0 | rate: / Reply Quote | |
There is another way ..... | |
ID: 10004 | Rating: 0 | rate: / Reply Quote | |
There is another way ..... Believe me, I am tapping my heels that my card continues to function as well as it has! I just bought it, so it would be a big disappointment if there were issues so soon....but the issue fix would appear to be possible without a card upgrade, hopefully....(although NVIDIA I would guess is tapping its heels that many will upgrade...ouch! ;) | |
ID: 10005 | Rating: 0 | rate: / Reply Quote | |
How come the G200 based cards also get failures? Let's see what the fix can do and who still gets failures afterwards. Mind you, there's also the "regular failure rate", some kind of "noise floor" which affects all cards. Will there be an updated app for the non-G200 machines, or perhaps all machines? Will this be a cuda 2.2 app or stick with the old version for the time being? Not speaking officially, but I wouldn't rush to introduce another variable in the current situation. Wait until the dust settles and we're confident that the problems have been solved. 185.66 has been running fine for me with non-troublemaker WUs, so I'll keep using it until I see problems. I do have a WU issued today and it appears to use client 6.64, so it may look like no new app for now. But this could be tied to an old type of WU as well. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 10020 | Rating: 0 | rate: / Reply Quote | |
Just had a Kashif go bang | |
ID: 10071 | Rating: 0 | rate: / Reply Quote | |
Looks like the old problem and the WU was created past 20 May 16:44 CEST, when the fix was applied. I think it would be better to post such observations in the new thread, so they don't get lost. | |
ID: 10088 | Rating: 0 | rate: / Reply Quote | |