At Row64, we often get asked about the terms and technologies we are pioneering. After all, many people are curious how exactly we deliver roughly 10x the speed of Microsoft Excel on personal computers with identical specs.
While the full answers to these questions are best found in our whitepaper, the short answer for questions surrounding our revolutionary ‘GPU Spreadsheets’ can be found in our comprehensive guide below.
In simple terms, a GPU spreadsheet is a spreadsheet whose software is ‘GPU-enabled’ to take advantage of the graphics processing unit (GPU) of a computer. This is in contrast to most spreadsheet software, which uses the computer’s CPU for its calculations, as is standard for most applications. The main benefit of a GPU spreadsheet is improved speed and scale of calculations.
A GPU spreadsheet is so powerful because a GPU is a processor very well suited to the numeric calculations required in spreadsheets. This is a function of the highly parallelized architecture of the GPU. Known as parallel processing, this type of computing performs similar calculations across the elements of a data stream at the same time (i.e. in parallel). You can even see this physically in the architecture of a GPU, which has many more cores than a CPU, all lined up in parallel rows.
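To make the idea of "the same calculation across many values at once" concrete, here is a minimal sketch in plain Python. It is only a CPU-side analogy under stated assumptions: the `double_plus_one` formula, the column values, and the worker count are all made up for illustration, and Python threads will not actually speed up arithmetic like this. The point is the structure: one small calculation, handed to many workers, each taking a different cell.

```python
from concurrent.futures import ThreadPoolExecutor


def double_plus_one(value):
    """The single calculation applied to every cell (hypothetical formula)."""
    return value * 2 + 1


def serial_column(values):
    # One cell after another, the way a single core would work through a column.
    return [double_plus_one(v) for v in values]


def parallel_column(values, workers=4):
    # The same calculation is handed to several workers at once.
    # pool.map keeps results in the original cell order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(double_plus_one, values))


column = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative data, not a real sheet
print(serial_column(column))
print(parallel_column(column))
```

Both calls produce the same column of results. A GPU applies this pattern in hardware, with thousands of cores instead of four software workers.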
The nature of parallel processing makes GPUs very fast at very specialized tasks. This includes graphics rendering, but also the types of formulas and display required for spreadsheets and corresponding data visualizations—hence the invention of the ‘GPU spreadsheet’.
When you hear about GPU spreadsheets, you’ll often also hear about “GPU calculations”, sometimes referred to as “GPU compute”. To break it down, GPU-enabled calculations or GPU compute are non-graphics calculations that are either wholly or partially routed to a computer’s GPU by software that has been coded directly for the GPU, allowing the GPU to understand and process the necessary data.
To better understand the nature of GPU compute, it’s useful to remember that historically GPUs were designed just for graphics processing; it’s literally in the name: graphics processing unit. Rendering complex pixel maps in real time for video games or video software was exactly the type of task at which highly parallelized processing excels. Through parallel processing, GPUs would “load” up one set of instructions enabling them to perform roughly 60-200 calculations a second from the same set of coordinate “instructions”, thus displaying thousands of pixels in a cohesive, real-time 3D environment.
In recent years, as data sets have grown astronomically larger, data scientists and businesses have increasingly looked to harness the computational strength of GPUs not just for graphics rendering, but for calculations as well. Uses of GPU compute have included data science, bitcoin mining, genomic sequencing, options market pricing, deep learning and climate modeling.
Putting the two concepts together, you’ll see that GPU spreadsheets are applying GPU compute in the context of a spreadsheet user interface.
The unfortunate reality is that enabling your GPU isn’t as simple as flipping a switch. In order for any function to be allocated to the GPU, it needs to be coded in a language of its own, one optimized for the highly parallelized nature of GPUs. This is because CPUs and GPUs are so different in architecture that code written for a CPU cannot simply be handed to a GPU as-is.
In addition, GPU languages aren’t (at least at the moment) commonly known. They’re also somewhat time consuming to work with, as they were historically designed for graphics, and thus you had to sort of “trick” the graphics processor into thinking of data like graphics. To make matters worse, both Nvidia and AMD, the two main graphics processor manufacturers, have their own proprietary GPU language: CUDA and StreamSDK/Brook+ respectively. Apple has also introduced Metal as a GPU language as it has begun adding its own GPUs to its products.
There have been some attempts at using open languages such as OpenCL to ‘hack’ Excel into doing calculations through the GPU. However, while these do show substantial speed improvements, they definitely don’t have the robustness or user-friendly design most analysts have come to expect from their software.
Ultimately, for GPU-enabled calculations, one would need to have specific knowledge of GPU programming. This is challenging for even highly skilled programmers. There exists third-party software that enables some translation from traditional programs to executable GPU files that do the computing; however, these need to be written manually for each function.
At Row64 we take pride in being one of the only companies out there making user friendly GPU spreadsheets that give everyday analysts the power of GPU calculations without needing to learn to code directly themselves.
In general, GPU-enabled calculations are estimated to be in the realm of 10-100x faster than CPU calculations for similar tasks. However, these speeds differ based on the task being assigned, as GPU calculations are used not just for spreadsheets but often for machine learning as well.
A few speed benchmarks for GPU calculations:
A test using a benchmark CIFAR-10 Object Recognition Model (a popular deep learning test for recognizing visual patterns) showed a 27x speed increase when using the GPU rather than simply the CPU.
McKinsey & Co reported that cycle times for machine learning can be 50 times faster when using GPU acceleration.
When using identical computers, Row64 was able to get 17x faster sorting functionality and 91x faster performance with our “hybrid engine”, which optimizes both CPU and GPU processing.
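To put those multiples in perspective, it helps to turn them into wall-clock time. The sketch below assumes the usual definition of speedup (original time divided by accelerated time) and a hypothetical 60-second CPU baseline chosen purely for illustration; only the multipliers come from the benchmarks above, and none of the resulting times are measured values.

```python
def accelerated_seconds(cpu_seconds, speedup):
    """Time after acceleration, assuming speedup = cpu_time / accelerated_time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_seconds / speedup


# Hypothetical baseline: a task that takes 60 seconds on the CPU alone.
BASELINE_SECONDS = 60.0

# Multipliers quoted in the benchmarks above.
benchmarks = [
    ("CIFAR-10 object recognition", 27),
    ("Machine learning cycle time (McKinsey & Co)", 50),
    ("Row64 sorting", 17),
    ("Row64 hybrid engine", 91),
]

for name, speedup in benchmarks:
    seconds = accelerated_seconds(BASELINE_SECONDS, speedup)
    print(f"{name}: {speedup}x -> {seconds:.2f} s instead of {BASELINE_SECONDS:.0f} s")
```

Under that assumed baseline, every one of these multipliers turns a minute-long wait into a few seconds or less.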
Despite the practicality of using GPUs for spreadsheets, the reality is that virtually all examples to date have been “hacks” from people using a GPU language to port Excel data through external code. This has primarily been done through OpenCL, which stands for Open Computing Language and is among the most widely used open GPU languages.
The team at Stream High Performance Computing put together an Excel-to-OpenCL integration that essentially ports Excel data to an external DLL (a dynamic-link library, which is a file of executable machine code), thus outsourcing the calculations to the GPU. While an impressive feat, it’s far from perfect, as it still doesn’t work with Nvidia’s processors and, by the team’s own admission, has precision problems between different sets of integers.
A team at Prog.World did a similar hack, using OpenCL and a DLL file to achieve very high speed improvements with their Excel VBA functions. While very impressive on a technical level, this project is one that requires a high degree of coding experience to execute, as it actually requires the user to code the entire process.
To fill this void between powerful GPU computing and user-friendly, intuitive software, we at Row64 wrote our software from the ground up to take advantage of both the CPU and GPU. Our “hybrid engine” not only blows calculation and display speeds out of the water compared to everything else, but also does so with an incredibly familiar and easy-to-use spreadsheet interface. There is no need to learn how to code for the GPU yourself.
A Python spreadsheet is any spreadsheet in which the data manipulation is executed with Python scripts. A Python spreadsheet is different from a GPU spreadsheet: a GPU spreadsheet refers to the hardware-enabled calculations a spreadsheet uses, while a Python spreadsheet refers to the use of Python code in the software itself. Because they refer to different aspects, it’s possible for a spreadsheet to be both a Python and a GPU spreadsheet (as is the case with Row64). This can give the full advantage of speed, scale and flexibility.
Because of its flexibility and efficiency in working with large datasets, Python has increasingly been making its way into the world of data science and spreadsheets. A Stack Overflow chart shows Python to be the fastest growing data science language by a large margin. Some current options, like openpyxl, allow you to read and write Microsoft Excel files from a Python script. Others, like the open source pyspread, are written entirely in Python.
Our own Row64 takes spreadsheet-style formulas and expresses them as Python code, which a user can then save, modify or even share with other users. This flexibility means entire workflows or custom functions can be programmed in Python, and then exported and loaded into any other user’s spreadsheet, without needing to replicate code. To make matters even simpler, we’ve coded over 250 of these Python-coded “data science recipes” for users to load in one click, covering some of the most commonly requested tasks like time series forecasting or bar chart visualization.
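As a rough illustration of what "a spreadsheet formula expressed as Python" can look like, here is a minimal sketch using only the Python standard library. It is not Row64's generated code or API: the `sheet` layout, the `cell_range` helper, and the function names are all hypothetical, chosen only to show how formulas such as =SUM(B1:B3) and =AVERAGE(B1:B3) map onto ordinary Python.

```python
from statistics import mean

# Hypothetical sheet: column letter -> list of cell values (row 1 first).
sheet = {
    "A": ["Jan", "Feb", "Mar"],
    "B": [120.0, 135.5, 150.0],
}


def cell_range(sheet, column, first_row, last_row):
    """Return the values of a range such as B1:B3 (rows are 1-indexed, inclusive)."""
    return sheet[column][first_row - 1:last_row]


def total_sales(sheet):
    # Python counterpart of the spreadsheet formula =SUM(B1:B3)
    return sum(cell_range(sheet, "B", 1, 3))


def average_sales(sheet):
    # Python counterpart of the spreadsheet formula =AVERAGE(B1:B3)
    return mean(cell_range(sheet, "B", 1, 3))


print("Total:", total_sales(sheet))
print("Average:", round(average_sales(sheet), 2))
```

Because the formula is now a plain function, it can be saved to a file, edited, or passed to a colleague like any other Python code, which is the kind of sharing described above.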
If you want to see the power of Python and GPU spreadsheets in action, download our free 30-day trial along with our sample files of up to 100 million rows to see how blazing fast our speeds really are.