Last week’s big news, NVIDIA acquiring Mellanox, had many people wondering, “How does that make sense?” What were the possible reasons for this acquisition, and what part of our industry will it impact the most?
What Does NVIDIA Do?
NVIDIA started out making graphics cards for video games. People then started using those cards to power VDI, making virtual desktops, especially their graphics, as usable as physical desktops.
But what is interesting in the context of NVIDIA acquiring Mellanox is that NVIDIA realized GPUs can run the complex mathematical algorithms that Big Data, HPC, ML, DL, and ultimately AI applications require.
What Does Mellanox Do?
Mellanox is also a hardware company, building Ethernet and InfiniBand intelligent interconnect solutions. InfiniBand is what is interesting when we talk about NVIDIA acquiring Mellanox. InfiniBand is a networking standard that can run RDMA (Remote Direct Memory Access). RDMA allows components to bypass a server’s bus and communicate with each other directly. According to TechTarget:
InfiniBand is a type of communications link for data flow between processors and I/O devices that offers throughput of up to 2.5 gigabytes per second and support for up to 64,000 addressable devices. Because it is also scalable and supports quality of service (QoS) and failover, InfiniBand is often used as a server connect in high-performance computing (HPC) environments (emphasis mine).
Mellanox hardware supports InfiniBand as well as a standard called RoCE (RDMA over Converged Ethernet). Both standards can run RDMA. This article gives a much deeper technical explanation of the different technologies.
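If you want to poke at this from the software side, here is a minimal sketch (my own illustration, not from Mellanox docs) that uses the standard libibverbs API from rdma-core to list the RDMA-capable devices in a server. Both InfiniBand and RoCE adapters show up through this same verbs interface.

```c
/* Minimal sketch: enumerate RDMA-capable devices with libibverbs.
 * Assumes rdma-core (libibverbs) is installed.
 * Compile with: gcc list_rdma.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num_devices = 0;

    /* Returns all RDMA devices (InfiniBand or RoCE) visible to the host. */
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs) { perror("ibv_get_device_list"); return 1; }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx) continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%s: %d port(s), max queue pairs %d\n",
                   ibv_get_device_name(devs[i]),
                   attr.phys_port_cnt, attr.max_qp);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}
```

On a box with a Mellanox adapter you would typically see a device like mlx5_0 listed; on a machine with no RDMA hardware, the list simply comes back empty.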
NVIDIA Acquires Mellanox – But Will It Blend?
We have one hardware vendor (NVIDIA) that has powerful components being used to fuel HPC, ML, DL, and AI architectures. And we have a second hardware vendor that provides networking hardware that helps server components bypass the traditional server bus architecture. That sounds like a pretty good combination to me.
VMware is able to virtualize these components for these new workloads (full disclosure: I’m a VMware employee). In fact, Tony Foster and I presented this topic at VMworld last year, and he was kind enough to blog about it and post our slides. Our presentation is a great 101 AI/HPC/DL architecture primer, but if you want to dive into the details of how virtualized HPC can be made even faster with NVIDIA GPUDirect RDMA (using Mellanox devices), check out my teammate Mohan Potheri’s blog post series [post 1] [post 2]. It’s a pretty deep dive with lots of explanation of how this all works together. It paints a pretty compelling picture of one reason NVIDIA had to acquire Mellanox.
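To give a flavor of what GPUDirect RDMA means in code, here is a hedged sketch of the key step: registering GPU memory directly with the RDMA NIC so the NIC can DMA to and from the GPU without bouncing through host memory. This is my own illustration, assuming a Mellanox adapter, the CUDA toolkit, and NVIDIA’s peer-memory kernel module (nvidia-peermem) are present; it is not code from Mohan’s posts.

```c
/* Hedged sketch: registering GPU memory for GPUDirect RDMA.
 * Assumes the CUDA toolkit, rdma-core, and the nvidia-peermem
 * kernel module are present; build with nvcc and link -libverbs.
 * Error handling trimmed for brevity. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main(void) {
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    void *gpu_buf = NULL;
    size_t len = 1 << 20;          /* 1 MiB buffer allocated on the GPU */
    cudaMalloc(&gpu_buf, len);

    /* With GPUDirect RDMA, we register the GPU device pointer with the
     * NIC just like ordinary host memory; the peer-memory module lets
     * the NIC DMA straight into GPU memory, skipping the host entirely. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (mr)
        printf("registered GPU buffer, rkey=0x%x\n", mr->rkey);
    else
        perror("ibv_reg_mr");      /* fails without GPUDirect support */

    if (mr) ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The registered memory region can then be used in RDMA read/write operations just like host memory, which is exactly the “components talking directly to each other” idea from earlier in this post.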
What Happens Next?
I’m sure there is more to come with this story. This week, NVIDIA’s customer conference, GTC, is happening, and Mellanox was part of one of the keynotes. It will be interesting to see what their product teams can do now that they are all under one umbrella.
Do you have any predictions? Let us know in the comments!