The Engine of AI: Why Supercomputers and GPUs Rule Machine Learning

How did graphics processing units, or GPUs, which were originally built as hardware for video gaming, become the most popular and ubiquitous infrastructure in generative AI today? Why can't traditional CPU-based data centers always be repurposed for AI? And does everyone need expensive hardware to build and run AI systems?

The breakthrough behind the rise of generative AI is often credited to software innovations, like the new AI algorithms that came with the transformer model architecture. But hardware breakthroughs were equally important: new kinds of chips, and steady improvements to existing ones, have allowed researchers to train models at a massive scale and make them as knowledgeable as we know them to be today.

For example, have you ever opened an Excel file with thousands of rows of data on your laptop, and your laptop crashed? Then you know the pain of hardware limitations. In AI, and in particular in building LLMs, the same hardware limitations apply, except at a vastly larger scale. Think of datasets and calculations big enough to crash tens of thousands of laptops' worth of compute. That's where GPUs come in. So let's break down why these chips are so important in making generative AI possible.

How Computer Chips Fundamentally Work

To understand why GPUs process AI tasks faster, let's start with how these computer chips fundamentally work. Chips like CPUs and GPUs contain tens of billions of tiny electrical switches called transistors. Different groups of these switches have different functions (a loose software analogy follows this list):

  • Compute: Handle math computations.
  • Cache: Store the working data and the instructions for those computations (think of it as the short-term memory).
  • Control: Decode instructions and plan and coordinate which steps happen in which order for these calculations (really controls the logic).
  • Memory: Store input data for the calculations (think of that as longer-term memory).
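
To make those roles concrete, here's a toy sketch in Python that maps each group onto a step of a simple weighted-sum calculation. It's a loose software analogy, not how silicon actually works:

```python
# A loose software analogy for the four transistor groups above.
# A conceptual sketch, not how silicon actually works.

weights = [2.0, 0.5, 1.0]            # Memory: input data held in longer-term storage
inputs  = [1.0, 2.0, 3.0]

total = 0.0                          # Cache: a small working value kept close at hand
for w, x in zip(weights, inputs):    # Control: decides which step happens in which order
    total += w * x                   # Compute: the actual math (multiply-accumulate)

print(total)  # 2.0*1.0 + 0.5*2.0 + 1.0*3.0 = 6.0
```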

Why CPUs Struggle with AI

CPUs are designed to solve a wide variety of tasks quickly. Why? Because CPUs are what we use in our personal computers and data centers, where they need to be general purpose enough to switch between very different workloads, like web services, databases, or analytics. So when it comes to their architecture (a small timing sketch follows this list):

  • Compute (Low): Relatively speaking, they have less emphasis on mathematical operations.
  • Cache (Moderate/Medium): They do need short-term memory to perform the calculations that they do.
  • Control (High): They have to do more instruction following, more handling of branch logic that varies across tasks, and more scheduling of which operations get done first.
  • Memory (Low): CPUs normally just borrow memory from the rest of the computer (system RAM); they typically don't have large dedicated memory of their own.
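
To get a rough feel for why one-at-a-time execution struggles at AI scale, here's a small Python timing sketch. It's only an analogy (NumPy still runs on the CPU), but the gap between a serial loop and one bulk vectorized operation hints at what dedicated parallel hardware buys you:

```python
# Sketch: serial, one-at-a-time execution vs. one bulk vectorized operation.
# Timings vary by machine; this is an analogy, not a true CPU-vs-GPU benchmark.
import time
import numpy as np

n = 5_000_000
a = np.random.rand(n)
b = np.random.rand(n)

start = time.perf_counter()
serial = [a[i] * b[i] for i in range(n)]  # one multiply per step, lots of control overhead
print(f"serial loop: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
vectorized = a * b  # the same multiplies, issued as a single bulk operation
print(f"vectorized:  {time.perf_counter() - start:.2f}s")
```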

GPUs: From Video Games to Generative AI

GPUs, on the other hand, are meant to process a high number of similar computations at once or in parallel.

  • Compute (High): This means a high number of mathematical operations running at once.
  • Cache (Moderate): Similar to CPUs, they need some short-term memory to get those calculations done.
  • Control (Low): Most of the calculations a GPU performs are the same operation repeated at a vast scale, with far less of the branching variety a CPU has to manage.
  • Memory (High): At least in the case of AI, a GPU needs to store massive model weights.

And if you remember, model sizes have grown exponentially in recent years. When BERT, one of the first open LLMs, came out in 2018, it had about 110 million parameters. Now we have LLMs with over a trillion parameters, and that requires a lot of memory. And not just more memory, but also faster access to that memory, which is referred to as memory bandwidth.
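
You can do the back-of-the-envelope memory math yourself. Here's a minimal sketch (real deployments need extra memory beyond the weights, for activations, the KV cache, and so on, and the bytes per parameter depend on the numeric precision used):

```python
# Back-of-the-envelope memory needed just to hold model weights.
# Sketch only: real deployments need additional memory beyond the weights.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Gigabytes required to store the weights at a given precision."""
    return num_params * bytes_per_param / 1024**3

# BERT-sized model at 32-bit (4-byte) precision:
print(f"110M params @ fp32: {weight_memory_gb(110e6, 4):.2f} GB")   # ~0.41 GB

# Trillion-parameter model, even at 16-bit (2-byte) precision:
print(f"1T params @ fp16:   {weight_memory_gb(1e12, 2):,.0f} GB")   # ~1,863 GB
```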

This, ultimately, is why GPUs work for AI and LLMs. GPUs are great at performing a high volume of mathematical operations that are also highly parallel, meaning the same operation performed at a large scale, all while holding huge model weights in memory (VRAM).
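
In code, that pattern looks something like this minimal PyTorch sketch (assuming you have the torch package installed; it falls back to the CPU if no CUDA GPU is available). The "weights" live in device memory, and a single matrix multiply fans out into a huge number of identical multiply-accumulate operations run in parallel:

```python
# Minimal sketch of the pattern above: the same operation (matrix multiply)
# applied in parallel, with the weights resident in device memory (VRAM).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in "model weights": a single 4096x4096 matrix kept on the device.
weights = torch.randn(4096, 4096, device=device)
inputs = torch.randn(4096, 4096, device=device)

# One call, tens of billions of multiply-accumulates, run in parallel.
outputs = inputs @ weights
print(outputs.shape, "computed on", device)
```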

And there’s a really interesting reason why GPUs were originally built with so much memory: they were created to render graphics faster, mostly for video games. In graphics, that large memory held data for textures, lighting, shading, and physics. Now we repurpose that same large memory for holding huge model parameters. So without video games, we might not have LLMs as we know them.

Do You Always Need Expensive AI Hardware?

Now the next question is, do you always need GPUs or a whole AI data center to build AI systems? It really depends on what you’re doing and the size of the models that you’re using (a device-selection sketch follows this list):

  • Training an LLM: Typically requires a GPU, no matter the size of the model, because training workloads are more intensive than simple inference workloads.
  • Tuning a Model: Large models typically require a GPU. Small models also typically require a GPU, though there are exceptions where a CPU can work (e.g., an especially small model combined with a parameter-efficient tuning technique on a compressed model).
  • Running a Model (Inference):
      • Personal Use: If you don’t expect a high volume of use and you’re making a single or very few inference calls to one model, a CPU may well suffice.
      • Larger Models: If it’s a personal application but you’re using a larger model (say, something above 10 billion parameters), you’ll likely want a GPU to get the kind of speed you’re hoping for.
      • Customer-Facing Apps: If the app is meant to support more users and larger tasks, a GPU is typically required with a larger model, and usually even with a smaller one, because otherwise you’ll experience high latency.
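
In practice, "start with the hardware you have" often looks like a simple device-selection fallback. Here's a minimal sketch using PyTorch (other frameworks have equivalents):

```python
# A common "use the best device available" pattern, falling back to CPU.
import torch

if torch.cuda.is_available():
    device = "cuda"   # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps"    # Apple Silicon GPU
else:
    device = "cpu"    # always available, just slower for large models

print(f"Running on: {device}")
```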

The Bottom Line

It’s important to remember that it’s AI hardware, not just the algorithms, that has enabled generative AI. However, building AI applications doesn’t automatically mean you need a whole data center full of GPUs, or in some cases any GPU at all. So despite these chips being a critical technology for generative AI, don’t let that deter you from starting small, with the hardware you have access to today.
