Build Your Own Data Science Supercomputer

Have you ever wanted to build your own supercomputer? Yes?! Why wouldn't you!

If you want to break records and blaze through hundreds of millions of data rows in a way that was previously impossible, consider DIY.

There's something exciting about working with bare-metal hardware and putting together, with your own hands, a data Ferrari that breaks records and does things that were impossible a year ago. Also, if you were ever a PC gamer, this is easy, because the heart of Data Science 2.0 is the GPU and its VRAM, and the builds are similar to gaming rigs.

If you just want a ready-made machine, you could easily go to the CyberPowerPC website and get a great build. But, for the adventurous, read on!

Ok, let's get right to the details of the build:

Product components table

Overall, the build is just over $4K… Buying this build pre-made will likely be around double the price, anywhere from $8-10K.

You can get a great computer to run Row64 for $1.5K, but if you want to break records and push the limits of data, it's worth considering DIY.

Another factor is the price of the GPU. The GPU is the heart of next-generation Data Science performance, so most of the money is going there.

Consider that the 3090 has an MSRP of $1.4K. Once prices come back down after the current cryptocurrency price surge, this build will cost about $800 less than the listing above, and breaking records will get cheaper still when less expensive cards come out next year.
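To put the price figures above side by side, here is a minimal Python sketch of the arithmetic. It rounds "just over $4K" down to $4,000, which is an assumption, and the function names are made up for illustration.

```python
# Figures quoted in the post, in US dollars (all approximate).
DIY_BUILD_COST = 4_000            # "just over $4K", rounded down (assumption)
PREBUILT_RANGE = (8_000, 10_000)  # "anywhere from $8-10K"
GPU_PRICE_DROP = 800              # expected saving once GPU prices come back down


def savings_vs_prebuilt(diy_cost, prebuilt_range):
    """Return the (low, high) dollar savings of DIY against a pre-made build."""
    low, high = prebuilt_range
    return low - diy_cost, high - diy_cost


def cost_after_gpu_drop(diy_cost, gpu_drop):
    """Return the build cost if the GPU falls in price by gpu_drop dollars."""
    return diy_cost - gpu_drop


low_saving, high_saving = savings_vs_prebuilt(DIY_BUILD_COST, PREBUILT_RANGE)
future_cost = cost_after_gpu_drop(DIY_BUILD_COST, GPU_PRICE_DROP)

print(f"DIY saves roughly ${low_saving:,} to ${high_saving:,} over a pre-made build")
print(f"Build cost after the GPU price drop: about ${future_cost:,}")
```

With these rounded inputs, the DIY route saves somewhere between $4,000 and $6,000, and the same build would land near $3,200 once GPU prices settle.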

For a few of us, this was the first time doing a custom computer build - at Row64 we're mostly data scientists and coders.

So this was a bit of an adventure… First, we got out the motherboard and attached the CPU, the RAM & SSD, and then the CPU Cooler.

Overall it went pretty well, but we had to watch a few YouTube videos to make sure we were putting in the CPU and the Cooler correctly. This was the intense, exciting part of the whole process.

Next, we plugged in the GPU and wired the motherboard to the power and case externally just to make sure everything was working.

It was an awesome moment when it first turned on!

On a side note, we should probably talk about the case… Big picture, our goal was to make the fastest build possible without using anything exotic or labor-intensive. So we took overclocking and water cooling off the list.

That meant we had to come up with the best possible solution for air cooling. Actually, we got quite obsessed with this topic and spent several months researching it. What we arrived at turned out incredibly well - the "Lian Li Lancool II Mesh Performance".

I can't say enough how awesome this case is. You can open it up from 4 different angles with 4 different hinges, and it is designed for excellent cooling and build flexibility. In fact, we brought our build over to our friends and Colorado neighbors at LunarG for a company field trip. These folks in Fort Collins are incredible hardware engineers who are also great GPU coders. When we brought out the build and were chatting about our Data Science Supercomputer, they liked all the ideas, but 95% of the conversation revolved around this case and how cool it was.

So, we pushed the cooling a step further than the case default. The Lancool II Mesh Performance comes with 3 fans: 2 in the front and 1 in the back. We bought an additional 3 Noctua fans and did lots of research on how to get optimal airflow for cooling. Row64 really pushes the GPUs, and we knew we needed great cooling if we wanted to break records.

Airflow Setup 01

As you can see, our airflow setup is designed to draw air in from the front and base of the case and push it out the back and top.

In all our tests afterward, this has worked perfectly and is a highly effective approach to cooling a cutting-edge GPU and CPU build.

Finally, we got all the parts together. Now it's time for the Data Science Supercomputer moment of truth… Turning it on!

It worked!

This is a great build for data supercomputing. At Row64, we've recently sorted over a billion records using this exact build and are pushing the frontier even farther with new coding techniques.

Let's take a quick look at how this build compares to some of the other machines (CyberPowerPC & ASUS builds) we have in the office:

So there it is… A Data Science Supercomputer.

Over 53% faster than our 2080Ti, and it can sort over a billion records in 60 seconds (details coming soon).
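For a rough sense of scale, the headline numbers can be turned into a throughput figure. The sketch below is illustrative only: it rounds "over a billion" to exactly one billion, and it reads "53% faster" as a 1.53x throughput ratio applied to the same sort job, which is an assumption, since the post does not say which benchmark that percentage comes from. The function names are hypothetical.

```python
# Headline figures from the post.
RECORDS_SORTED = 1_000_000_000  # "over a billion records", rounded (assumption)
SORT_SECONDS = 60               # "in 60 seconds"
SPEEDUP_OVER_2080TI = 1.53      # "over 53% faster", read as 1.53x throughput (assumption)


def records_per_second(records, seconds):
    """Average sort throughput in records per second."""
    return records / seconds


def baseline_seconds(seconds, speedup):
    """Time a machine with 1/speedup of the throughput would need for the same job."""
    return seconds * speedup


rate = records_per_second(RECORDS_SORTED, SORT_SECONDS)
slower = baseline_seconds(SORT_SECONDS, SPEEDUP_OVER_2080TI)

print(f"Throughput: about {rate / 1e6:.1f} million records per second")
print(f"Same job at the assumed 2080Ti rate: about {slower:.0f} seconds")
```

Under these assumptions the build sorts roughly 16.7 million records per second, and the same job would take about a minute and a half at the 2080Ti's rate.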

Running Geo Ray-Tracing on this machine is beyond mind-blowing, but we'll save the details for an upcoming blog post…

Published: Jan 25, 2021 5:00pm UTC
