So a while back I upgraded my old 10-GPU system to newer hardware. It was running a Supermicro X9DRX+-F motherboard, which provided 10x PCIe 3.0 x8 slots. It was great for PCIe connectivity and a great project to get working, but I wanted to move to more modern hardware that was faster and more power efficient.
So I upgraded to an AMD EPYC platform. Without buying into an incredibly expensive and proprietary server ecosystem, I would be stuck with only 7x PCIe slots under the standard ATX spec. So that's what I did:
Asrock Rack EPYCD8 motherboard
AMD EPYC (Rome) 7402P 24-core/48-thread CPU
64GB (4x16GB) DDR4 3200MHz Registered ECC
8x EVGA RTX 2070 GPUs
So how do you put 8 GPUs on only 7 slots? Bifurcation! This motherboard natively supports bifurcation on all four x16 slots, but I would need a riser to actually split one slot into two slots to accept two GPUs. I did some digging, came across a user named C_Payne on [H]ardforum, and eventually found his webstore where he sells custom risers just for this purpose.
I finally got the bifurcation card and got it all set up. It's working great, and it all went as smoothly as I could have hoped for: just plug the riser into the cards, hook up the 8-pin power, and change the slot setting in the BIOS to x8x8. The biggest issue was waiting 6 weeks for delivery from Germany (I'm in the US), likely delayed due to COVID restrictions.
This system: https://www.gpugrid.net/show_host_detail.php?hostid=543446
Pics: https://imgur.com/a/LrreKks
This is the riser board I bought: https://peine-braun.net/shop/index.php?route=product/product&path=59&product_id=81
Obviously this requires a motherboard that supports PCIe bifurcation with the support exposed in the BIOS. You can't do this on most consumer motherboards, but it seems to be well supported on Asrock boards (even consumer ones) and some higher-end HEDT boards, and very well supported on server and workstation motherboards. In theory you could use a riser board with a PLX chip to do the same thing without needing bifurcation support on the motherboard (C_Payne even sells some), but those are MUCH more expensive.
The advantage here is that it's cheaper and uses less power than a PLX-based solution, and I still get PCIe 3.0 x8 bandwidth to each GPU, so there's no slowdown with GPUGRID. I think I should be able to get all the way to 11 GPUs if I wanted to, but it will likely stay at 8 for the foreseeable future due to power constraints at this location.
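As a sanity check on the bandwidth claim: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so the usable throughput of an x8 link works out to roughly 7.9 GB/s:

```shell
# PCIe 3.0 x8: 8 GT/s per lane x 128/130 encoding x 8 lanes, divided by 8 bits per byte
awk 'BEGIN { printf "%.2f GB/s\n", 8 * (128/130) * 8 / 8 }'
# prints 7.88 GB/s
```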
____________
rod4x4
Joined: 4 Aug 14   Posts: 266   Credit: 2,219,935,054   RAC: 0
Impressive!
Gives new meaning to gathering around the fire (GPUs) at Christmas...
My PC room is 90 degrees (F) at Christmas (summer). What ambient room temps do you see?

It varies. I usually leave the window to the room open a bit to let some fresh air in, and it's below freezing at night now. I don't think the room ever gets over 75-80F with the window closed, and it's much, much cooler with it open, probably down to the 50s or 60s F.
The GPUs stay around 50-60C, though, while running GPUGRID.
I have all GPUs power limited to 150W each, with an overclock to offset most of what I lost from power limiting. The whole system pulls 1500-1600W running full tilt.
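A power limit like the one described above can be set with nvidia-smi; a minimal sketch (requires root, and the supported range varies by card and vBIOS):

```shell
# Enable persistence mode so the setting sticks while no app holds the GPU (root required)
sudo nvidia-smi -pm 1
# Cap every GPU at 150 W; add "-i <index>" to target a single card
sudo nvidia-smi -pl 150
```

Note that the limit resets on reboot unless it is reapplied, e.g. from a startup script.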
____________

Asrock Rack EPYCD8 motherboard

I was ogling your EPYC machine the other day and was curious whether you had the Supermicro rack setup. I see you found something better. Asrock makes heavy-duty stuff, and I've come to respect them even more than Asus from what I've read and heard from the geeks in the local computer club.
Keith Myers
Joined: 13 Dec 17   Posts: 1288   Credit: 5,113,906,959   RAC: 9,251,119
Supermicro has very few ATX-sized motherboards. Mostly custom-footprint solutions for 1U or 2U servers.
Asrock Rack OTOH seems to specialize in common motherboard form factors that fit in standard PC cases very easily.

When I was running the Supermicro board, the only thing I cared about at the time was PCIe connectivity. I wasn't too interested in CPU processing since I wasn't using it, but now that I've started doing Universe@home on the CPU, I wanted something more capable and power efficient.
The platform change moved to a much more power-efficient CPU (with much better IPC and more cores) and more power-efficient, faster RAM, and the motherboard likely uses less power too.
The only thing that's not better is my wallet, lol.
____________

Supermicro has very few ATX-sized motherboards. Mostly custom-footprint solutions for 1U or 2U servers.
Asrock Rack OTOH seems to specialize in common motherboard form factors that fit in standard PC cases very easily.

SM has a ton of ATX and EATX motherboards, though they do have a good number of proprietary boards too, to fit their custom server chassis. The old 11-slot SM board I had was a custom board that I custom-mounted to the mining frame, but I went with it because it had 11 slots; you can't get an 11-slot board in any normal form factor, since it exceeds the ATX spec of 7 slots.
I picked the Asrock Rack board more for the fact that I could get it with all 7 slots and an external PCIe power input; none of the SM boards had that. SM doesn't seem to have an external PCIe power input on any of their boards, whereas it seems to be a common feature on Asrock boards. Very important if you're going to have a lot of cards pulling power from the motherboard like this.
____________
Keith Myers
When I've looked at the Supermicro mobo page and used the filters, it produced very few ATX or EATX form-factor boards. I spent quite a bit of time googling and searching the SM site, in fact. Not many results.
But then, I know I have terrible insight into keyword searches; they typically produce nothing for me.
Keith Myers
SM doesn't seem to have an external PCIe power input on any of their boards, whereas it seems to be a common feature on Asrock boards. Very important if you're going to have a lot of cards pulling power from the motherboard like this.

That's why I also like the common Asrock brand, where an external PCIe slot power connector is almost guaranteed.
I think that's very uncommon on consumer boards from the mobo brands.
rod4x4
I was interested to see how well the bifurcation performed, so below is a capture of tasks from the system highlighted in this thread: https://www.gpugrid.net/show_host_detail.php?hostid=543446
Tasks are retained by the GPUGRID statistics page for 7 days, and all GPUs have 7 days of tasks.
All tasks are MDAD (ADRIA and GERARD tasks have been exhausted for a while).
The results show the performance is excellent!
Boinc    No.     Ttl       Ttl        Average    Average
Device   Tasks   Runtime   Credit     Credit     Runtime
--------------------------------------------------------
0        318     601069    4867705    699,703    1,890
1        330     602667    4918527    705,134    1,826
2        307     605969    4780381    681,594    1,974
3        305     601662    4702621    675,307    1,973
4        312     606045    4847006    691,007    1,942
5        315     598183    4720060    681,753    1,899
6        292     599571    4559550    657,045    2,053
7        308     598492    4649381    671,198    1,943
--------------------------------------------------------
The small anomaly with device 6 (which I assume is on the riser) is so small it could be attributed to normal work unit variation, the GPU silicon lottery, or the bifurcation.
The data ably demonstrates that employing bifurcation works, and works well!
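As a quick check on the table above, the Average Runtime column is just total runtime divided by task count; e.g. for device 0:

```shell
# Device 0: 601069 seconds of total runtime across 318 tasks
awk 'BEGIN { printf "%.0f\n", 601069 / 318 }'
# prints 1890
```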
Check the pics: every single GPU is on a riser, and no GPU has less than PCIe 3.0 x8 bandwidth.
Three GPUs on a 3.0 x16 link
Five GPUs on a 3.0 x8 link
Only 2 GPUs are running on a bifurcated slot: one single PCIe 3.0 x16 slot split into two PCIe 3.0 x8 slots. They share the slot, but no bandwidth goes to waste. Every other GPU is strung to its own slot on the motherboard via a riser ribbon cable.
I'll have to look closer at which cards are where, because the way the nvidia driver enumerates the cards isn't the same as the way BOINC enumerates them. It's especially hard to figure out which is which when they're all identical like this.
It could be down to WU variation, as well as clock speed variation. I have them all overclocked with the same offsets, but some run hotter than others or have poorer silicon quality, and hence some run at lower clock speeds than others. It's just easier for me to apply the exact same settings to every single card than to tweak and fine-tune each one.
Also keep in mind, if you compare these cards to other RTX 2070s: I have these cards power limited to 150W, down from the 175W stock.
____________
rod4x4
Yes, I have studied the pics.
I should have been more specific, my mistake. I should have said the riser connected to the bifurcated slot.
To match your GPUs to the BOINC device numbers, try this:
nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv
This will output your attached devices and their bus IDs.
The slot designation can be gleaned from:
dmidecode -t slot
The bus ID and slot designation can then be matched to what is in your coproc_info.xml file, which cross-references to the BOINC device number.
Another tool to help identify the PCIe slots is:
lspci | grep VGA
dmidecode may not work on the EPYC BIOS (it doesn't work too well on my X470 motherboard); lspci | grep VGA probably will work fine.
The first number in the output will be the bus number in hex (starting at PCIe1 on the motherboard).
Match this to the bus ID in coproc_info.xml; the bus ID there is in decimal.
For example, slot 1 on my system is 27 (hex, commonly denoted 0x27), which corresponds to bus ID 39 in coproc_info.xml.
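The hex-to-decimal conversion is easy to do in the shell, e.g. for the 0x27 example above:

```shell
# lspci prints the PCI bus number in hex; coproc_info.xml stores it in decimal
printf '%d\n' 0x27
# prints 39
```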

By reading the bus IDs and matching them to the physical slots (by manipulating fan speeds), the 2 GPUs on the bifurcated slot should be BOINC devices 3 and 4.
____________
rod4x4
By reading the bus IDs and matching them to the physical slots (by manipulating fan speeds), the 2 GPUs on the bifurcated slot should be BOINC devices 3 and 4.

At a guess, Asrock seems to have ordered their PCIe slots prioritizing the bus order, with the x16 slots first, then the x8 slots:
PCIe1 - dev 0
PCIe3 - dev 1
PCIe5 - dev 2
PCIe7 - dev 3 & 4
PCIe2 - dev 5
PCIe4 - dev 6
PCIe6 - dev 7
Does this line up with what you have seen?
By reading the bus IDs and matching them to the physical slots (by manipulating fan speeds), the 2 GPUs on the bifurcated slot should be BOINC devices 3 and 4.
At a guess, Asrock seems to have ordered their PCIe slots prioritizing the bus order, with the x16 slots first, then the x8 slots:
PCIe1 - dev 0
PCIe3 - dev 1
PCIe5 - dev 2
PCIe7 - dev 3 & 4
PCIe2 - dev 5
PCIe4 - dev 6
PCIe6 - dev 7
Does this line up with what you have seen?

Nope, not at all; they're all over the place.
First, according to the BIOS and the silkscreen printed on the PCB, the slots are ordered 1-7 starting from the bottom of the board, so the slot furthest from the CPU is slot 1 and the one closest to the CPU is slot 7.
Then, when all slots are populated, the slot that drives the monitor (and hence bus ID 1) is slot 5, the second x16 slot from the CPU.
I haven't checked them all in full detail; I only really looked at which cards were on the bifurcated slot yesterday.
---------
CPU
---------
PCIe7 - dev 3&4
PCIe6 - dev 1
PCIe5 - dev 0
PCIe4
PCIe3
PCIe2
PCIe1
I can change the device order if I manually edit the xorg.conf file, but there's really no point since all the cards are identical. For ease of use, I don't want the card that drives the monitor to be different from what the BIOS uses; if I change it, the OS and the BIOS will be trying to use different monitors, and that's just confusing.
My old Supermicro boards were pretty similar in behavior, with bottom-to-top PCB ordering and a random mid slot being the primary display slot. But on both boards, if you only have 1 GPU on the board, the display will drive from whatever slot it's plugged into.
____________
rod4x4
Nope, not at all; they're all over the place.

Thanks for the update. Interesting that the mid slots are picked as the active display on both of your server motherboards. Apparently there are more factors to consider than just a numbering schema.
Skillz
Joined: 6 Jun 17   Posts: 4   Credit: 7,472,647,979   RAC: 24,691,654
Really interested in this setup.
What ribbon risers are you using?
How can you check whether your motherboard/BIOS supports bifurcation? I could easily just get the same board you have, but I'm trying to see how cheaply I can build one of these on another platform, such as Intel X99, since those boards can be had for much less and the CPUs are much cheaper than anything EPYC. I don't care about a high core count on the CPU, as long as it has at least 8 cores to keep the GPUs busy.

Really interested in this setup.
What ribbon risers are you using?
How can you check whether your motherboard/BIOS supports bifurcation? I could easily just get the same board you have, but I'm trying to see how cheaply I can build one of these on another platform, such as Intel X99, since those boards can be had for much less and the CPUs are much cheaper than anything EPYC. I don't care about a high core count on the CPU, as long as it has at least 8 cores to keep the GPUs busy.

If X99, I would shoot for an 8-core/16-thread part; 8 threads would be a little tight trying to feed 8+ GPUs. I always try to leave 1-2 CPU threads doing nothing to ensure there are no issues with CPU resources (nvidia GPU apps use 1 CPU thread per GPU task).
X99 does support bifurcation on certain boards, but support is inconsistent and spotty; I think Asrock has given it the most support. You can check whether a board supports bifurcation by going into the BIOS and looking for the bifurcation setting. If it's not there, it's not supported. This isn't a feature popular enough to be advertised, so if you don't have the board, you'll have to reach out to the manufacturer or get confirmation from someone else who has one.
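Once a card is installed, one way to verify what a slot actually negotiated is to read the link capability/status from lspci (the 01:00.0 address is just a placeholder; substitute your GPU's bus ID from lspci | grep VGA):

```shell
# LnkCap = maximum link the device supports; LnkSta = what the slot actually negotiated
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'
# a GPU on a bifurcated x16 slot should report something like "Width x8" in LnkSta
```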
I'm using risers from Amazon under the brand name "EZDIY-FAB", but they don't appear to be as widely available anymore. Any riser rated for PCIe 3.0 x16 should work well; just search Amazon or eBay, there are tons.
____________
Skillz
I said 8 cores, not 8 threads. Thanks.
I really wish there was an easier way to determine whether the board/BIOS supports bifurcation. I think I may just end up getting the same board you have, but going with the 7251 8-core EPYC CPU to help keep costs down.
Thanks for replying.