GPU Server for HPC Cluster

How hard is it to build a server with four top-of-the-line GPUs for a high-performance computing cluster? Harder than you might think.

When I started building the SCRP cluster in the summer of 2020, the GPU servers were provided by Asrock Rack. Everything except the GPUs was preassembled. That is the sensible thing to do in normal times.

Fast forward to the summer of 2021, and times were not normal. Supply chain disruption and the semiconductor shortage were in high gear. Pretty much every name-brand server manufacturer quoted us months-long lead times, if they were willing to deal with us at all. To get everything in for the new academic year, I constructed a series of servers with parts sourced from different parts of the world. It is actually not that hard to build servers—they are basically heavy-duty PCs with all sorts of specialized parts—that is, unless you want a GPU server suitable for an HPC cluster.

So what is so special about GPU servers for an HPC cluster?

  • Most server cases have seven to eight PCIe slots, but I needed at least nine (four dual-slot GPUs plus a single-slot InfiniBand network card). There are maybe two manufacturers of such cases you can find through retail channels.
  • High-end GPUs use a lot of power. A single RTX 3090 draws 350W, so four of them draw 1,400W. Add in the CPU and everything else and you are looking at 1,800W minimum. A beefy power supply is definitely needed.
  • An 1,800W ATX power supply does exist, you say. The problem is that almost no servers use ATX power supplies—they pretty much all use specialized CRPS power supplies, which give you two power supplies in one small package. There are a lot of benefits to this, including redundancy and lower load per supply. Guess how many 2,000W CRPS power supplies you can find through retail channels? Zero. There is simply too much demand for them from server manufacturers and too little from retail. I was fortunate enough to have one specially ordered on my behalf by a retail supplier, but it took a while to arrive.
  • Once you have sorted out the parts, assembly comes next. Unless you have one of those highly specialized Supermicro 11-slot motherboards—I am not sure if they even sell them in retail—your motherboard will have the width of seven PCIe slots. But you need nine! What do you do? Simple, you might think: all you need is a PCIe extension cable. Except one end of the cable has to go under a GPU, and 99% of the cables you can buy cannot do that. I ended up having one custom-made. Yes, custom-made. It is the silver strip in the photo. Did I mention it was so fragile out of the factory that I ended up strengthening it with hot glue myself?
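The power arithmetic in the list above can be written down as a quick budget check. A minimal sketch in Python, where the 400W figure for the CPU and everything else and the 10% safety margin are my own rough assumptions, not measured values:

```python
# Rough power budget for a four-GPU server.
# GPU wattage is the RTX 3090's rated 350W; the other two
# figures are assumed round numbers for illustration.
GPU_WATTS = 350
NUM_GPUS = 4
CPU_AND_REST_WATTS = 400   # CPU, drives, fans, NIC (assumption)
HEADROOM = 1.10            # ~10% safety margin (assumption)

total_load = GPU_WATTS * NUM_GPUS + CPU_AND_REST_WATTS
psu_target = total_load * HEADROOM

print(f"Load: {total_load}W, PSU target: {psu_target:.0f}W")
```

With these numbers the load alone already lands at 1,800W, which is why an ordinary 1,200W server power supply was never going to work.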

To conclude, if you think building your own PC is challenging, building a GPU server for an HPC cluster is probably three times the challenge. Another reason why you should not maintain your own infrastructure.

PCIe Gen 4 GPUs do not play nice with a Gen 3 extender board

Spent over an hour trying to figure out why some new GPUs were not working. The server in question is an Asrock Rack 2U4G-EPYC-2T, a specialized server that allows four GPUs to be installed in a relatively small case. Google was not helpful because, understandably, this is a niche product made only in small quantities.

What did not work:

  • Attaching four Ampere GPUs (i.e. RTX 3000 series) in their intended positions in the case.

What worked:

  • Attaching four Pascal GPUs (i.e. GTX 1000 series) in the intended positions.
  • Attaching only one Ampere GPU at the rear of the case.
  • Attaching four Ampere GPUs directly to the mainboard.

Took me a good hour to figure out that the issue was caused by the PCIe extender board. The three GPU positions at the front require the extender board, but the board only supports PCIe Gen 3. Normally, Gen 4 GPUs can negotiate with Gen 3 mainboards to communicate at PCIe Gen 3 speed, but apparently they cannot do that through the extender board. Once the issue had been identified, the solution was actually very straightforward—manually setting the PCIe lanes to Gen 3 solves everything.
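A practical way to spot this kind of problem on a live Linux system is to compare the link speed a device advertises (the LnkCap line in `lspci -vv` output) against the speed it actually negotiated (the LnkSta line). A minimal sketch of that check, with made-up sample lines standing in for real lspci output:

```python
import re

# PCIe link speed in GT/s maps to a generation:
# 2.5 GT/s = Gen 1, 5 GT/s = Gen 2, 8 GT/s = Gen 3, 16 GT/s = Gen 4.
SPEED_TO_GEN = {"2.5GT/s": 1, "5GT/s": 2, "8GT/s": 3, "16GT/s": 4}

def link_gen(lspci_line: str) -> int:
    """Extract the PCIe generation from an lspci LnkCap/LnkSta line."""
    m = re.search(r"Speed (\S+GT/s)", lspci_line)
    if not m:
        raise ValueError("no link speed found in line")
    return SPEED_TO_GEN[m.group(1)]

# Sample lines in the format printed by `lspci -vv` (values made up):
cap = "LnkCap: Port #0, Speed 16GT/s, Width x16"
sta = "LnkSta: Speed 8GT/s, Width x16"

if link_gen(sta) < link_gen(cap):
    print(f"Link trained at Gen {link_gen(sta)}, "
          f"below the Gen {link_gen(cap)} the device supports")
```

A Gen 4 card sitting behind a Gen 3 extender should show exactly this pattern: a 16 GT/s capability but an 8 GT/s (or lower) negotiated link.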

Yet another reason why maintaining your own computing infrastructure is not for the faint hearted.


This year's Nobel Prize in Economics awarded half the prize money to David Card of UC Berkeley. It is no exaggeration to call Card a titan of the Berkeley economics department. Nearly a third of the department's PhD students each year are supervised by Card. Because he has so many students, anyone wanting to see him has to sign up weeks in advance. He has also won the department's best advisor award multiple times, to the point where the other professors half-joked one year that the award should be renamed the "Card Award" to stop him from winning it again.
P.S. Card's co-author on that minimum wage study, Princeton economics professor Alan Krueger, sadly passed away in 2019. It is widely agreed in the profession that had Krueger still been alive, he would almost certainly have shared this year's prize with Card.


We will be running tests and benchmarks here at CUHK SCRP over the next few days. Users should be able to access the new RTX 3090 through Slurm after the scheduled maintenance next week.

A series of talks in the coming week on how to use the Department’s new HPC cluster.

“Students’ grandparents kept dying in my 8am class until I moved it to 3pm. I saved lives.” Perhaps it’s time to negotiate with the department regarding my morning classes…

SCRP: A "Supercomputer" Built in Two Months

Over the past two days, senior students in the Department should have received my email about the Department's brand-new online system. The new system runs on multiple servers operating as one, offering a range of statistical software for students to use online. Such a system is generally called a High Performance Computing Cluster, though the name you may know better is its colloquial one: a "supercomputer". The system came about because the pandemic closed all of the Department's computer labs, and how to let several hundred students use statistical software from home became a problem that had to be solved.

In mid-June, with the Department's support, I scraped together a budget of $200,000 and built the new SCRP system in two months. $200,000 is a lot for a department, but in high-performance computing it often cannot even buy a single machine. To stretch the budget, SCRP uses quite a few second-hand parts. Fortunately, supply exceeds demand in the second-hand market for HPC parts, and it is not hard to find usable components at one-fifth or even one-tenth of the original price. Together with some of the Department's older servers pressed back into service, the whole system was completed in mid-August.

With this new cluster, the CUHK Economics Department will very likely be the first economics department to require all of its students to learn to use a high-performance computing system (evil teachers, I know). Although every university has its own HPC clusters, they are usually reserved for researchers; outside of computer science departments, undergraduates rarely get access. Basic use of an HPC cluster is actually not that complicated—R and Python, for instance, can be used directly through a browser. Being forever pushed by teachers to learn new things is a bit of a hardship, but as I always say: these days, a bit more data analysis skill can only do you good.