Message boards : Graphics cards (GPUs) : Advice for GPU placement
Hey folks,
ID: 51217
Also consider CUDA compute capability: the current GPUGrid app only contains code for cards up to CC 6.1 (Pascal), while the Turing cards are CC 7.5.
ID: 51218
Power supply/cooling capability/spacing would be a bigger concern for me if that was my setup.
ID: 51221
Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?
ID: 51222
Do you have this line in your cc_config.xml?

<use_all_gpus>1</use_all_gpus>

"If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart."
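For anyone who hasn't edited that file before, a minimal sketch of a complete cc_config.xml with the option in place (the <cc_config>/<options> wrapper is required, and the file lives in the BOINC data directory):

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>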
ID: 51224
> Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work.

Actually, GPUGrid does give work to 20x0 cards, but it will fail on the host, because the app does not contain the code for CC 7.5 (>6.1) cards.

> Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080?

Yes.

> If so, how?

You should put the following in the <options> section of your C:\ProgramData\BOINC\cc_config.xml:

<exclude_gpu>
    <url>http://gpugrid.net</url>
    <device_num>0</device_num>
</exclude_gpu>

You should check the device number of your RTX 2080 in the first ~20 lines of the BOINC client's event log and put that number there (I guessed that it will be device 0).
ID: 51225
> Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?

Yes, the BOINC exclusion goes by vendor and GPU index. If there is only one vendor, then just the index, as Retvari Zoltan has suggested. It doesn't care what card it is. I have excluded a 980 Ti in a system (running FAH) and allowed just the 970 to crunch on several projects.
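For reference, the BOINC client configuration docs also allow optional <type> and <app> tags inside <exclude_gpu>, so a sketch of a vendor-aware, app-specific exclusion (the URL and app name below are placeholders, not real values) would be:

<exclude_gpu>
    <url>http://example-project.org</url>
    <type>nvidia</type>
    <device_num>0</device_num>
    <app>example_app</app>
</exclude_gpu>

Leave out <device_num> to exclude all GPUs of that type for the project, and leave out <app> to apply the exclusion to all of the project's applications.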
ID: 51226
:) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.
ID: 51227
I'm currently employing <exclude_gpu> statements for both GPUGrid and Einstein on one of my 4-card hosts with an RTX 2080. It works fine at preventing that card from being used.
ID: 51230
Why do you have an exclude for Einstein?
ID: 51233
> Why do you have an exclude for Einstein?

Certain types of GPU work on Einstein fail. I believe the short-running work units do fine and the long-running ones fail; it might be the reverse. Anyway, the terminology used to describe the work units (coined by the users, not the scientists) is the wrong nomenclature, so I stopped paying attention to the discussion. Keith can fill you in on the specifics.
ID: 51240
Thanks. I confirmed that at least one of the Einstein task types failed immediately on my 2080, so I also added them to my GPU Exclusion list for that GPU. Pity, really.
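If only one Einstein application fails on Turing, a narrower exclusion using the optional <app> tag would keep the card available for the task types that do work. A sketch, with a placeholder app short name (the real one can be read from client_state.xml or the event log):

<exclude_gpu>
    <url>http://einstein.phys.uwm.edu</url>
    <device_num>0</device_num>
    <app>example_failing_app</app>
</exclude_gpu>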
ID: 51241
> :) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.

I remember this. I'd like to thank you, as I use <exclude_gpu> extensively. Some machines are running GPUGrid, Amicable Numbers and Enigma on dedicated GPUs. Also, thanks for all the great work that you do debugging BOINC and testing new features.
ID: 51244
:) I'm a rock star at breaking things, for sure!
ID: 51245
> Power supply/cooling capability/spacing would be a bigger concern for me if that was my setup.

Cooling is a major consideration in GPU placement for me. All my Ryzen 7 boxes run 3 GPUs, usually 3 x 1060 cards. The top GPU is flexible, the middle GPU is a blower, and the bottom card that sits up against the blower is a short card that leaves the blower's fan uncovered. If the machine still runs hotter than I like, I sometimes put a 1050 Ti in the lower position.

Another consideration is bus width. On X370/X470 boards, for instance, the two top PCIe slots run at PCIe 3.0 x8 (if both are used), while the bottom slot is PCIe 2.0 x4. The bottom slot handles a 1060 at full speed on a Ryzen 7, but not always on machines with slower processors. For example, I have an ITX box with a Celeron and a PCIe 2.0 x4 slot; it constricts a 1060, but a 1050 Ti runs at full speed.

BTW, the Ryzen 7 machines use far less CPU to run 3 1060 cards at full blast. My slower boxes take a lot more CPU allocation to run 2 1060 cards than the Ryzens do to handle 3. In this regard I've also found that SWAN_SYNC helps noticeably on all my machines except the Ryzens, which seem to feed the GPUs fully without SWAN_SYNC.

BTW, the new Ryzens coming out mid-year will be PCIe 4.0, again double the bandwidth of PCIe 3.0. You'll need a 500-series motherboard for PCIe 4.0; on the older boards they'll still run at PCIe 3.0.

Of course performance per watt is another major consideration. I recently retired my pre-10xx NVIDIA GPUs. The 750 Ti cards use about the same power as a 1050 Ti, but the 1050 Ti is ~60% faster (it depends somewhat on the project). My 670 is still viable (24-hour-deadline-wise), but I replaced it because it's slower than a 1060 while drawing much more power and putting out much more heat.

You might find that good used GPUs are selling inexpensively now, as disillusioned miners seem to be fleeing the mines. Perhaps black lung disease? ;-)
ID: 51248
:)
ID: 51250
There's something to be said for white noise...
ID: 51253
Power connectors.
ID: 51257
> :) I'm a rock star at breaking things, for sure!

I'm about to join you as a "rock star" at breaking things too, apparently. The client code commit that DA wrote to fix my original problem is going to cause major problems for anyone using a max_concurrent or project_max_concurrent statement. The unintended consequence of the code change is that it prevents the client from requesting replacement work until the host's cache is empty; only then does the host report all finished work and ask for work to refill the cache. So that's the end of keeping your cache topped up at every 5-minute scheduler connection. The PR2918 commit is close to being accepted into the master branch. I have voiced my displeasure, but since only DA usually authorizes pull requests into the master branch, that decision is up to him. Richard Haselgrove has also voiced his concerns.
ID: 51264
> BTW, the new Ryzens coming out mid-year will be PCIe 4.0, again double the bandwidth of PCIe 3.0. You'll need a 500-series motherboard for PCIe 4.0; on the older boards they'll still run at PCIe 3.0.

There's talk from CES that PCIe 4.0 cards would still work in the first PCIe slot closest to the CPU on the existing X370/X470 motherboards, as the signaling requirements for PCIe 4.0 devices limit the signal path to 6 inches without redrivers.
ID: 51265
> :) I'm a rock star at breaking things, for sure!

It sounds like you were using "max concurrent" to mean "only run this many at the same time, but allow fetch of more." David is likely arguing that, if you can't run more than that many simultaneously, why buffer more? Consider tasks that take 300 days to complete (yes, RNA World has them). If you're set to run only 3 as "max concurrent", why would you want to get a 4th task that would sit there for 300 days? You might consider asking for a separation of functionality: "max_concurrent_to_schedule" [which is what you want] vs. "max_concurrent_to_fetch" [which is what David is changing max_concurrent to mean]. Then you could set the first one to a value, leave the second one unbound, and get back your desired behavior (see the sketch below). I hope this makes sense to you. Please feel free to add the text/info to the PR. Note: I doubt it waits until the cache is completely exhausted of max_blah items before asking for more. I'm betting, instead, that work fetch will still top off, even if you have some of that task type, but only up to the max_blah setting.
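To make the suggested split concrete, here is a purely hypothetical app_config.xml sketch. Neither tag exists in any BOINC release (they are only the names proposed above), and the app name is a placeholder:

<app_config>
    <app>
        <name>example_app</name>
        <!-- hypothetical tag: cap on simultaneously RUNNING tasks (today's documented max_concurrent behavior) -->
        <max_concurrent_to_schedule>3</max_concurrent_to_schedule>
        <!-- hypothetical tag for a fetch/buffer cap; omitted here, i.e. unbound -->
    </app>
</app_config>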
ID: 51266
> :) I'm a rock star at breaking things, for sure!

Yes, I guess I generalized. I didn't wait to see whether all my 1000 tasks finished before the work request was initiated. From the testing by Richard and in the host emulator, I assume that when the number of tasks fell below my <project_max_concurrent>16</project_max_concurrent> statement, the client would finally report all 485 completed tasks and at last ask for more work. But from the client configuration document (https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration), the intended purpose of max_concurrent and project_max_concurrent is:

max_concurrent: "The maximum number of tasks of this application to run at a given time."

project_max_concurrent: "A limit on the number of running jobs for this project."

The original purpose of the max_concurrent parameters shouldn't be circumvented by the new commit code. The key point that needs to be emphasized is "run at a given time" and "number of running jobs".
ID: 51267
Thanks for the comments, Jacob. I have added your observations to my post and will await what Richard has to say about your new classifications when the new day begins for him.
ID: 51269
You're welcome. Since max_concurrent already has a documented meaning, you might use that to suggest (gently push) toward keeping it the same and putting any work-fetch limits into a new variable. I could see it ending up that way, maybe.
ID: 51270
Hi Keith and Jacob!
ID: 51274
Good morning Richard; sorry about the location of this discussion. Jacob has provided some useful information to enhance my understanding.

> Not fetching work if max_concurrent is reached is the intended behavior.

That is what alarmed me greatly. Very encouraging to hear that your conference call with the other developers also raised concerns with work fetching. I was getting overly worked up, I guess, in thinking the commit was going to master soon with the unintended consequence of breaking work fetch. I thought that would cause massive complaints from everyone who noticed their caches weren't being kept topped up at all times. Thanks for the clarification that even David has to have consensus from the other developers to merge code into master.
ID: 51280
I've never liked using those two commands when trying to limit a project to something like 50% of the CPU threads while wanting another project to use the other 50%. Many times I would end up with a full queue from the project using the max-tasks statement while the other threads sat idle, since the queue was full. It's another situation where task-run priority should be separate from work-download priority, and where a second BOINC client instance ends up being the preferable way to fine-tune BOINC management on a PC.
ID: 51282
Are you guys sure that RTX 2080 GPUs get work from GPUGrid?
ID: 51440
The only 2080s that should get WUs at this time are the ones in the same machine as a 1000-series card or below. The WUs will not yet work on 2000-series cards.
ID: 51441
My machine has a 2080, 980 Ti and 980 Ti, and I have a GPU exclusion set up so GPUGrid doesn't run work on the 2080.
ID: 51442
> My machine has a 2080, 980 Ti and 980 Ti, and I have a GPU exclusion set up so GPUGrid doesn't run work on the 2080.

Do you have a <device_num> with your project URL exclusion? Otherwise all GPUs will be excluded instead of just the Turing card:

<device_num>0</device_num>
ID: 51445
Yes, I have that set correctly.

2/7/2019 12:12:35 PM | | Starting BOINC client version 7.14.2 for windows_x86_64
2/7/2019 12:12:35 PM | | log flags: file_xfer, sched_ops, task, scrsave_debug, unparsed_xml
2/7/2019 12:12:35 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
2/7/2019 12:12:35 PM | | Data directory: E:\BOINC Data
2/7/2019 12:12:35 PM | | Running under account jacob
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 418.81, CUDA version 10.1, compute capability 7.5, 4096MB, 3551MB available, 10687 GFLOPS peak)
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 418.81, CUDA version 10.1, compute capability 5.2, 4096MB, 3959MB available, 6060 GFLOPS peak)
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 980 Ti (driver version 418.81, CUDA version 10.1, compute capability 5.2, 4096MB, 3959MB available, 7271 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 418.81, device version OpenCL 1.2 CUDA, 8192MB, 3551MB available, 10687 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3959MB available, 6060 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 980 Ti (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3959MB available, 7271 GFLOPS peak)
2/7/2019 12:12:35 PM | | Host name: Speed
2/7/2019 12:12:35 PM | | Processor: 16 GenuineIntel Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz [Family 6 Model 63 Stepping 2]
2/7/2019 12:12:35 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes f16c rdrand syscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
2/7/2019 12:12:35 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18329.00)
2/7/2019 12:12:35 PM | | Memory: 63.89 GB physical, 73.39 GB virtual
2/7/2019 12:12:35 PM | | Disk: 80.00 GB total, 59.08 GB free
2/7/2019 12:12:35 PM | | Local time is UTC -5 hours
2/7/2019 12:12:35 PM | | No WSL found.
2/7/2019 12:12:35 PM | | VirtualBox version: 5.2.27
2/7/2019 12:12:35 PM | GPUGRID | Found app_config.xml
2/7/2019 12:12:35 PM | GPUGRID | Your app_config.xml file refers to an unknown application 'acemdbeta'. Known applications: 'acemdlong', 'acemdshort'
2/7/2019 12:12:35 PM | GPUGRID | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | Einstein@Home | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | Albert@Home | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | | Config: event log limit 20000 lines
2/7/2019 12:12:35 PM | | Config: use all coprocessors
ID: 51446
However, this message appears to be related, and shows up whenever a Short/Long task is available but not given to me:

2/7/2019 3:36:39 PM | GPUGRID | Tasks won't finish in time: BOINC runs 85.2% of the time; computation is enabled 95.1% of that
ID: 51447
After I changed my cache settings from 10.0d and 0.5d to 2.0d and 0.5d, the message went away, and I started getting GPUGrid work for the first time on this PC since getting the RTX 2080.
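For anyone who prefers editing files over the Manager, those two cache settings correspond (if I recall the tags correctly) to the work-buffer values in global_prefs_override.xml, in days:

<global_preferences>
    <work_buf_min_days>2.0</work_buf_min_days>
    <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>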
ID: 51453
Good to see. I don't think I've ever seen that message about not finishing in time.
ID: 51454
> 2/7/2019 3:36:39 PM | GPUGRID | Tasks won't finish in time: BOINC runs 85.2% of the time; computation is enabled 95.1% of that

Well, there is your explanation, clearly set out above! You weren't being sent any work because BOINC knew that you wouldn't finish it in time: at 85.2% uptime with computation enabled 95.1% of that, your effective availability is only about 81% (0.852 x 0.951), which a 10-day cache can't accommodate within GPUGrid's 5-day deadlines. If it doesn't suit you to run BOINC 24/7, then it seems the only way forward is to drop the cache levels. Glad you got it sorted out.
ID: 51473
> For now I have to remove the project_max_concurrent statement from cc_config and use the cpu limitation in Local Preferences to limit the number of cores to 16.

Why not use this in your cc_config?

<ncpus>16</ncpus>
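For context, <ncpus> also goes in the <options> section, so a minimal cc_config.xml sketch would be:

<cc_config>
    <options>
        <ncpus>16</ncpus>
    </options>
</cc_config>

Per the client configuration docs it makes the client act as if the host had that many CPUs, so note it caps all projects at once rather than just one.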
ID: 51474
> We're getting very close to the completion of the max_concurrent fix.

Richard, not sure what you guys are up to, but I sure hope you take the WCG MIP project into consideration before you roll it out: https://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=569786

They coded the use of the L3 cache wrong, and it uses 4-5 MB per MIP WU. If you exceed that, I've seen BOINC performance cut in half. I have to use max_concurrent in my WCG app_config or I cannot run MIP simulations.

<app_config>
    <app>
        <name>mip1</name>
        <!-- needs 5 MB L3 cache per mip1 WU, use 5-10 -->
        <!-- Xeon E5-2699v4, L3 Cache = 55 MB -->
        <max_concurrent>10</max_concurrent>
        <fraction_done_exact>1</fraction_done_exact>
    </app>
</app_config>
ID: 51475
> We're getting very close to the completion of the max_concurrent fix.
ID: 51476