Message boards : Number crunching : About the GERARD_A2AR batch
I accidentally did a test with my i7-4790k + GTX 980Ti + WinXP x64 host.
ID: 42683
I had a few WUs take much longer than average a few days ago, with no system changes. I got a regular amount of cobblestones, so I assume the extra length was unintentional. Back to normal now.
ID: 42687
I accidentally did a test with my i7-4790k + GTX 980Ti + WinXP x64 host. Compared to my AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ with a GTX 980Ti and Windows XP x32, the results are as follows:

1. GPU usage is under 70% and temperature is about 50C, compared to about 95% usage at about 60C on other GERARD WUs.
2. Device clock: 1190MHz. Memory clock: 3505MHz.
3. Work unit run time is over 12 hours, compared to about 6 hours (plus or minus) for the other GERARD WUs.

Even having several of these slow WUs in my average, my average completion time is currently in 2nd place:

Rank | User name | Average (h) | Total crunched (WU) | GPU description of fastest setup
---|---|---|---|---
1 | BurningToad | 6.23333 | 12 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 355.98
2 | Bedrich Hajek | 6.37091 | 55 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 355.82
3 | Xeaon | 6.77059 | 17 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 361.43
4 | Streetlight | 6.95152 | 33 | [2] NVIDIA GeForce GTX TITAN X (4095MB) driver: 358.50
5 | syntech | 6.98824 | 17 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 358.91
6 | Retvari Zoltan | 7.02595 | 185 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 358.50
7 | Gamekiller | 7.03636 | 11 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 361.43
8 | whizbang | 7.81600 | 25 | [2] NVIDIA GeForce GTX 980 (4095MB) driver: 361.43
9 | Kagura Kagami@jisaku | 7.81818 | 11 | NVIDIA GeForce GTX 980 Ti (4095MB) driver: 361.43
10 | Andree Jacobson | 7.89167 | 12 | NVIDIA GeForce GTX 980 (4095MB)

"These results confirm that old MB & CPU should not be used with high-end GPUs to avoid such frustration by similar workunits in the future."

Yes and no. Yes, it is frustrating. But why shouldn't high-end cards be used in older computers? I am getting run times on the other GERARD WUs that are comparable to my new Windows 10 computer.

This brings up another question about future WUs: should they be more or less dependent on CPUs? Even fast computers pay a time penalty with the A2AR WUs, though it is much smaller than with older CPUs.
If we want more efficient (faster) crunching, the WUs should be made less CPU dependent, where possible. As for the WDDM penalty, which happens (correct me if I am wrong) because the GPU has to access the CPU after each step: would it be possible to have this access only every other step, or every third? This should reduce the lag. I am not sure I have the right wording for this, but I hope you can understand what I am saying.
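The idea of letting the GPU synchronize with the CPU only every second or third step can be sketched with a toy cost model. All figures below are illustrative assumptions, not GPUGrid measurements; the fixed per-sync cost (e.g. the WDDM round trip) simply gets amortized over more GPU steps:

```python
# Toy model: average wall-clock time per simulation step when the GPU
# synchronizes with the CPU only every `sync_every`-th step.
# gpu_step_ms and sync_ms are assumed, illustrative figures.

def runtime_per_step(gpu_step_ms: float, sync_ms: float, sync_every: int) -> float:
    """Per-step time with the fixed sync cost amortized over sync_every steps."""
    return gpu_step_ms + sync_ms / sync_every

# Assume 2 ms of GPU compute per step and 1 ms of sync overhead:
every_step = runtime_per_step(2.0, 1.0, 1)   # 3.0 ms/step
every_third = runtime_per_step(2.0, 1.0, 3)  # about 2.33 ms/step
```

Under these assumptions, syncing every third step cuts the overhead portion to a third, which is the kind of lag reduction the post is asking about.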
ID: 42688
"How old was the motherboard running the 4790k?"

It's in a Gigabyte GA-Z87X-OC motherboard. I don't know the exact age of this board, but the Z87 chipset is almost 3 years old. That is not too old for a GTX 980 Ti. But when its PCIe bus was limited to 4x, it was acting like a really old motherboard with PCIe 1.0 x16.
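For anyone wanting to check whether their card's link has dropped like this, `nvidia-smi` can report the current PCIe generation and width. The query fields below are real `nvidia-smi` fields; `parse_link_width` is a hypothetical helper sketched for its CSV output:

```python
# Sketch: spot a degraded PCIe link from the output of
#   nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv,noheader
# parse_link_width is a hypothetical parsing helper, not part of any library.

def parse_link_width(csv_line: str) -> tuple:
    """Return (generation, width) from a CSV line like '3, 16'."""
    gen, width = (field.strip() for field in csv_line.split(","))
    return int(gen), int(width)

gen, width = parse_link_width("3, 4")  # sample line, as if from nvidia-smi
if width < 16:
    print("PCIe link degraded: gen %d x%d (expected x16)" % (gen, width))
```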
ID: 42689
"Even having several of these slow WUs in my average, my average completion time is currently in 2nd place"

The present mix of workunits is easier on the CPU, but previously there were large batches (for example the NOELIA tasks) which were like the GERARD_A2AR batch is now. This is merely a forewarning, to avoid the frustration that could be caused by a large, CPU-demanding batch.

"These results confirm that old MB & CPU should not be used with high-end GPUs to avoid such frustration by similar workunits in the future. ... Yes and no. Yes, it is frustrating. But why shouldn't high-end cards be used in older computers? I am getting run times on the other GERARD WUs that are comparable to my new Windows 10 computer."

True. But this thread is about the GERARD_A2AR batch, whose runtimes are ~70% longer on your older Athlon/WinXP host than on your i7-5820K/Win10 host. To put it in an even worse perspective: your older host's GERARD_A2AR runtimes are ~100% longer than those of my WinXP/i3-4130 host.

"This brings up another question about future WUs. Should they be more or less dependent on CPUs?"

Now that's the million dollar question.

"Even fast computers pay a time penalty with the A2AR WUs, though it is much smaller than with older CPUs. If we want more efficient (faster) crunching, the WUs should be made less CPU dependent, where possible."

I think that's impossible from a computing point of view. (Then we should use double-precision enabled GPUs, which are very, very expensive.)
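For reference, the "~70% longer" and "~100% longer" figures are plain percent-increase calculations. A minimal sketch, using illustrative runtimes (the thread quotes roughly 12 h vs. 6 h), not the exact measured ones:

```python
# Percent increase of one runtime over a baseline runtime,
# as used for the "~70% longer" / "~100% longer" comparisons.

def percent_longer(runtime_h: float, baseline_h: float) -> float:
    """How much longer runtime_h is than baseline_h, in percent."""
    return (runtime_h / baseline_h - 1.0) * 100.0

# Illustrative: a 12-hour runtime vs. a 6-hour baseline is 100% longer.
print(percent_longer(12.0, 6.0))  # 100.0
```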
ID: 42690
"(Then we should use double-precision enabled GPUs, which are very, very expensive.)"

If you can tolerate 1/4 FP64 performance, which is significantly better than the 1/24 or 1/32 on crippled cards, then the only reasonable choice is the AMD 7970 or 280X. It is the only one that produces high output in Milkyway, which is specifically programmed for double-precision floating point. These cards go for less than $200 on the used market.

My computer recently downloaded a work unit that normally takes 9-10 hours on my GTX 970, and to my astonishment BOINC tells me that it will finish in 1d 03:16:51. It is the new chalcone229x2-GERARD_CXCL12_DCKCHALK. Is anyone else seeing this? It's not a slow machine: PCIe 3.0 is running at 8x and the CPU is an i7-3770k overclocked to 4.4GHz. Is the credit commensurate with the long compute time? Not trying to hijack this thread.
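The 1/4 vs. 1/32 FP64 ratios translate into double-precision throughput roughly like this. The single-precision figure below is an assumed round number for illustration, not any specific card's spec:

```python
# Rough double-precision throughput implied by an FP64:FP32 ratio.
# sp_gflops is an assumed, illustrative single-precision figure.

def dp_gflops(sp_gflops: float, ratio: float) -> float:
    """Double-precision throughput given single-precision throughput and FP64 ratio."""
    return sp_gflops * ratio

# Two hypothetical cards with similar ~4000 GFLOPS single precision:
uncapped = dp_gflops(4000, 1 / 4)   # 1000 GFLOPS at a 1/4 ratio
capped = dp_gflops(4000, 1 / 32)    # 125 GFLOPS at a 1/32 ratio
```

At equal single-precision speed, the 1/4-ratio card delivers eight times the double-precision throughput of the 1/32-ratio card, which is why it is the only reasonable budget choice for FP64 work.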
ID: 42709
If the work units are truly CPU limited on a 4790K, then you should see an increase in performance if you disable hyperthreading in the BIOS. The app would then have access to one full core rather than a single logical core.
ID: 42710
"If the work units are truly CPU limited on a 4790K, then you should see an increase in performance if you disable hyperthreading in the BIOS. The app would then have access to one full core rather than a single logical core."

It's not CPU limited. It was PCIe bandwidth limited while the CPU's PCIe bus was accidentally running at 4x speed.
ID: 42711
Actually there's a complete thread for a similar batch in the news topic, started by Gerard himself :) Gerard wrote:

"I forgot to note that due to the nature of these simulations, some small forces have to be added externally and, unfortunately, these have to be calculated using the CPU instead of the GPU. Therefore you may notice some amount of CPU usage, which in my case never surpassed 10%."
ID: 42712
Good stuff. This is the sort of thing LinusTechTips over on YouTube does once in a while. How does it fare at 8x?
ID: 42714
"My computer recently downloaded a work unit that normally takes 9-10 hours on my GTX 970, and to my astonishment BOINC tells me that it will finish in 1d 03:16:51. It is the new chalcone229x2-GERARD_CXCL12_DCKCHALK. Is anyone else seeing this?"

I got a couple of those earlier this week. They started off with an estimate of a day and a half but finished in 10 hours. How long did yours take?

The other question raised is one I have been pondering lately: how does PCIe bandwidth affect performance? The general "hive consensus" to date has been that projects need no more than what is provided by PCIe 1.0 x16. This was extensively tested by the bitcoin community. It is highly dependent on the project and workloads, so different projects will have different requirements.

My current generation of builds assumes that a PCIe2 x8 slot can keep a GPU happy. This thread is starting to make me wonder if that is true. The cost of a system with 32 PCIe lanes and enough power to run two modern GPUs exceeds the cost of two basic systems with 16 PCIe lanes each and a modest PSU by a significant amount. A hundred-dollar bundle with a thirty-dollar PSU will easily handle any single GPU. A system with 32 PCIe lanes on two x16 slots is easily a two-hundred-dollar motherboard plus memory and CPU and a hundred-dollar PSU. Add to that the issues of cooling a system with dual 300-watt GPUs.

So, it used to be that we had to shop for motherboards that ran x16 single slot but dropped to x8/x8 if you added a second card, instead of motherboards that ran x16 single slot and dropped to x16/x4 if you added a second card. Have we entered an era where x8/x8 isn't fast enough any more?
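The slot comparisons above follow from per-lane bandwidth. A back-of-envelope sketch, using approximate usable per-lane figures (after encoding overhead) of roughly 250 MB/s for PCIe 1.x, 500 MB/s for PCIe 2.x, and 985 MB/s for PCIe 3.x:

```python
# Back-of-envelope slot bandwidth for the configurations discussed.
# Approximate usable MB/s per lane, after encoding overhead:
PER_LANE_MB_S = {1: 250, 2: 500, 3: 985}

def slot_bandwidth_mb_s(gen: int, lanes: int) -> int:
    """Approximate usable bandwidth of a PCIe slot in MB/s."""
    return PER_LANE_MB_S[gen] * lanes

# A PCIe 2.0 x8 slot roughly matches a PCIe 1.0 x16 slot:
print(slot_bandwidth_mb_s(2, 8), slot_bandwidth_mb_s(1, 16))  # 4000 4000
```

By the same arithmetic, a PCIe 3.0 x8/x8 split still gives each card roughly twice the bandwidth of PCIe 1.0 x16, which is why x8/x8 has historically been considered enough.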
ID: 42717
"Have we entered an era where x8/x8 isn't fast enough any more?"

I wouldn't say that. WU batches come and go; some of them (for example the one this thread is about) are more CPU/PCIe bandwidth dependent. As a performance enthusiast I don't like to make compromises, so I wouldn't build a dual (multi-) GPU host for GPUGrid (though I have some). There's no point in spending the extra bucks for a more capable (s2011) MB and CPU (and cooling) if you have the space (and the will) to build two (or more) hosts and you don't need to see a single host of yours high on the hosts' overall & RAC toplist.
ID: 42719
"This brings up another question about future WUs. Should they be more or less dependent on CPUs? ... Now that's the million dollar question."

If WUs do become more CPU dependent, then having 2 or more CPU cores feeding 1 GPU should offset this lag, unless that increases the PCIe bus traffic dramatically. This should also reduce the WDDM lag. Can this be done? If yes, how?
ID: 42727
The science will determine the reliance on the CPU. That said, this is a GPU project that tries to do most of the work on the GPU, and it is designed to utilize gaming GPUs.
ID: 42758