info02dcf28f6d4's blog

Posted Mon 25 January 2021

Build Your Own Data Science Supercomputer

Have you ever wanted to build your own supercomputer? Yes?!
Why wouldn’t you!

If you want to break records and blaze through hundreds of millions of data rows in a way that was previously impossible, consider DIY.

There’s something exciting about bare-metal hardware, and about putting together, with your own hands, a data Ferrari that breaks records and does things that were impossible a year ago. Also, if you were ever a PC gamer, then this is easy, because the heart of Data Science 2.0 is the GPU and its VRAM, and the builds are similar to gaming rigs.

If you just want a ready-made machine, you can go to the CyberPowerPC website and get a great build. But for the adventurous, read on!

OK, let’s get right to the details of the build:

Component                                          Price (USD)
GPU - EVGA GeForce RTX 3090, 24GB GDDR6X           2235
CPU - AMD Ryzen 9 3950X 16-Core                    719
SSD - WD_Black 1TB SN850 NVMe                      230
RAM - Corsair Vengeance LPX 64GB                   350
CPU Cooler - Noctua NH-D15                         248
Case - Lian Li Lancool II Mesh                     133
Fans - 3x Noctua NF-P12 (1700 PWM)                 130

Overall, the build is just over $4K. Buying this build pre-made will likely cost around double that, anywhere from $8-10K.

You can get a great computer to run Row64 for $1.5K, but if you want to break records and push the limits of data, it’s worth considering DIY.

Another factor is the price of the GPU. The GPU is the heart of next-generation Data Science performance, so most of the money is going there.
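As a quick sanity check on the parts list above, here is a minimal Python sketch that adds up the listed prices and works out how much of the budget goes to the GPU. The prices are copied from the table; the dictionary and variable names are just for illustration.

```python
# Prices in USD, copied from the parts list in this post.
parts = {
    "GPU - EVGA GeForce RTX 3090, 24GB GDDR6X": 2235,
    "CPU - AMD Ryzen 9 3950X 16-Core": 719,
    "SSD - WD_Black 1TB SN850 NVMe": 230,
    "RAM - Corsair Vengeance LPX 64GB": 350,
    "CPU Cooler - Noctua NH-D15": 248,
    "Case - Lian Li Lancool II Mesh": 133,
    "Fans - 3x Noctua NF-P12 (1700 PWM)": 130,
}

# Total cost of the listed parts.
total = sum(parts.values())

# Fraction of the total spent on the GPU alone.
gpu_share = parts["GPU - EVGA GeForce RTX 3090, 24GB GDDR6X"] / total

print(f"Total for listed parts: ${total}")
print(f"GPU share of total: {gpu_share:.0%}")
```

The total covers only the parts listed in the table, so anything not listed there is not included in the figure.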

Consider that the 3090 has an MSRP of $1.4K. Once prices come back down after the current cryptocurrency price surge, it will cost about $800 less than the above listing to break records, and less still when cheaper cards come out next year.
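The "about $800" figure is easy to reproduce. This is a rough sketch that uses the GPU price from the parts list and the $1.4K MSRP quoted above, with $4045 as the sum of the listed parts. The variable names are illustrative, and the result is only as accurate as those two rounded inputs.

```python
# Figures in USD, taken from this post.
GPU_PRICE_PAID = 2235   # RTX 3090 price in the parts list
GPU_MSRP = 1400         # MSRP as quoted in the post ($1.4K)
BUILD_TOTAL = 4045      # sum of the listed parts

# How much of the GPU price is above MSRP.
gpu_gap = GPU_PRICE_PAID - GPU_MSRP

# What the same parts list would cost with the GPU at MSRP.
build_at_msrp = BUILD_TOTAL - gpu_gap

print(f"GPU premium over MSRP: ${gpu_gap}")
print(f"Build total with GPU at MSRP: ${build_at_msrp}")
```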

For a few of us, this was the first time doing a custom computer build - at Row64 we’re mostly data scientists and coders.

So this was a bit of an adventure… First, we got out the motherboard and attached the CPU, the RAM & SSD, and then the CPU Cooler.

Overall it went pretty well, but we had to watch a few YouTube videos to make sure we were putting the CPU and the cooler in correctly. This was the intense, exciting part of the whole process.

Next, we plugged in the GPU and wired the motherboard to the power supply and the case externally, just to make sure everything was working.

It was an awesome moment when it first turned on!

On a side note, we should probably talk about the case… Big picture, our goal was to make the fastest build possible, but not to use anything exotic or labor-intensive. So we took overclocking and water cooling off the list.

That meant we had to come up with the best possible solution for air cooling. Actually, we got quite obsessed with this topic and spent several months researching it. What we arrived at turned out incredibly well - the “Lian Li Lancool II Mesh Performance”.

I can’t say enough how awesome this case is. It’s really great, because you can open it up from 4 different angles with 4 different hinges, and it is just perfectly designed for incredible cooling and build flexibility. In fact, we brought our build over to our friends and Colorado neighbors at LunarG for a company field trip. These folks in Fort Collins are all incredible hardware engineers who are also great GPU coders. When we brought out the build and were chatting about our Data Science Supercomputer, they liked all the ideas, but 95% of the conversation revolved around this case and how cool it was.

So, we pushed the cooling a step further than the case default. If you get the Lancool II Mesh Performance, it comes with 3 fans: 2 in the front and 1 in the back. We bought an additional 3 Noctua fans and did lots of research on how to get optimal airflow for cooling. Row64 really pushes the GPUs, and we knew we needed great cooling if we wanted to break records.

Our airflow setup is designed to draw air in from the front and base of the case and push it out the back and top.

In all our tests afterwards, this has worked perfectly, and it is a highly effective approach to cooling a cutting-edge GPU and CPU build.

Finally, we got all the parts together. Now it’s time for the Data Science Supercomputer moment of truth… Turning it on!

It worked!

This is a great build for data supercomputing. At Row64, we’ve recently sorted over a billion records using this exact build and are pushing the frontier even further with new coding techniques.

Let’s take a quick look at what this all means compared to some of the other cards (actually CyberPowerPC & ASUS builds) we have in the office:

So there it is… A Data Science Supercomputer.

Over 53% faster than our 2080 Ti, and it can sort over a billion records in 60 seconds (details coming soon).
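To put those two numbers in perspective, here is a back-of-the-envelope sketch in Python. It treats "a billion records in 60 seconds" as a lower bound on throughput, and it assumes that "53% faster" means 1.53x the throughput of the 2080 Ti on the same sort. That reading is an assumption, since the benchmark details are not given here, and the variable names are illustrative.

```python
# Figures from this post; both are stated as "over", so treat them as lower bounds.
RECORDS_SORTED = 1_000_000_000
SORT_SECONDS = 60
SPEEDUP_VS_2080TI = 1.53  # assumed meaning of "53% faster": 1.53x throughput

# Sustained sort throughput implied by the headline number.
records_per_second = RECORDS_SORTED / SORT_SECONDS

# Time the 2080 Ti build would need for the same sort under that assumption.
implied_2080ti_seconds = SORT_SECONDS * SPEEDUP_VS_2080TI

print(f"Throughput: {records_per_second / 1e6:.1f} million records per second")
print(f"Implied 2080 Ti time for the same sort: {implied_2080ti_seconds:.1f} s")
```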

And running Geo Ray-Tracing on this machine is beyond mind-blowing - but we’ll explain more of that in an upcoming blog post…

Category: Data Science, DIY, Hardware