– [Narrator] Over the last year, everyone has been talking about… – Generative AI. – Generative AI. – Generative AI. – Generative AI. – I'm like, wait, why am I doing this? I just wait for the AI to do it. – [Narrator] Driving
the boom are AI chips, some no bigger than your palm, and the demand for them has skyrocketed. – We originally thought the total market for data-center AI accelerators
would be about $150 billion, and now we think it's
gonna be over $400 billion. – [Narrator] As AI gains popularity, some of the world's tech titans are racing to design chips
that run better and faster. Here's how they work
and why tech companies are betting they're the future. This is "The Tech Behind AI Chips." This is Amazon's chip
lab in Austin, Texas, where the company designs AI chips to use in AWS's servers.
– Right out of manufacturing, we get something that is called the wafer. – [Narrator] Ron Diamant
is the chief architect of Inferentia and Trainium,
the company's custom AI chips. – These are the compute
elements or the components that actually perform the computation. – Each of these rectangles, called a die, is a chip. Each die contains tens of billions of microscopic semiconductor switches
called transistors that communicate inputs and outputs. – Think about one
millionth of a centimeter; that's roughly the size of
each one of these transistors.
– [Narrator] All chips use
semiconductors like this. What makes AI chips different from CPUs, the kind of chip that powers
your computer or phone, is how they're packaged. Say, for example, you want to generate a new image of a cat. CPUs have a smaller
number of powerful cores, the units that make up the chip, which are good at doing a
lot of different things. These cores process
information sequentially, one calculation after another. So to create a brand-new image of a cat, a CPU would only produce a
couple of pixels at a time.
But an AI chip has more
cores that run in parallel, so it can process
hundreds or even thousands of those cat pixels all at once. These cores are smaller and less versatile than CPU cores but are specially designed
for running AI calculations. But those chips can't
operate on their own. – That compute die then gets
integrated into a package, and that's what people
typically think about when they think about the chip.
– Amazon makes two different AI chips, named for the two essential stages of AI: training and inference. Training is where an AI model is fed millions of examples of something, images of cats, for instance, to teach it what a cat is
and what it looks like. Inference is when it uses that training to actually generate an
original image of a cat. Training is the most difficult
part of this process. – We typically train not on one chip but rather on tens of thousands of chips. In contrast, inference is
typically done on 1 to 16 chips. – [Narrator] Processing
all of that information demands a lot of energy,
which generates heat. – And we're able to use this device here in order to force a certain
temperature onto the chip, and that's how we're able to
test that the chip is reliable at very low temperatures
and very high temperatures.
– [Narrator] To help keep chips cool, they're attached to heat sinks, pieces of metal with vents
that help dissipate heat. Once they're packaged,
the chips are integrated into servers for Amazon's AWS cloud. – So the training cards will
be mounted on this baseboard, eight of them in total,
and they are interconnected at a very high
bandwidth and low latency. So this allows the
different training devices inside the server to work
together on the same training job. So if you are interacting
with an AI chatbot, your text, your question will hit the CPUs, and the CPUs will move the data into the Inferentia2 devices, which will collectively
perform a gigantic computation, basically running the AI model. They will then respond to the CPU with the result, and the CPU will send
the result back to you. – [Narrator] Amazon's chips
are just one type competing in this emerging market,
which is currently dominated by the biggest chip designer, Nvidia. – Nvidia is still a chip
provider to all different types of customers who have to
run different workloads.
And then the next category of competitor that you have is the
major cloud providers. Microsoft, Amazon's AWS, and Google are all designing their own chips because they can optimize
their computing workloads for the software that runs on their cloud to get a performance edge, and they don't have to give Nvidia its very juicy profit margin
on the sale of every chip. – [Narrator] But right now, generative AI is still a young technology. It's mostly used in
consumer-facing products like chatbots and image generators, but experts say that hype
cycles around technology can pay off in the end. – While there might be
something like a dot-com bubble for the current AI hype cycle, at the end of the dot-com
bubble was still the internet.
And I think we're in a similar
situation with generative AI. – [Narrator] The technology's
rapid advance means that chips and the software to use them
are going to have to keep up. Amazon says it uses a
mixture of its own chips and Nvidia's chips to give
customers multiple options. Microsoft says it's
following a similar model. – For those cloud
providers, the question is, how much of their
computing workloads for AI is gonna be offered through Nvidia versus their own custom AI chips? And that's the battle that's playing out in corporate boardrooms
all over the world. – [Narrator] Amazon released a new version of Trainium in November. Diamant says he doesn't see the AI boom slowing down anytime soon. – We've been investing in machine learning and artificial intelligence
for almost two decades now, and we're just seeing a
step-up in the pace of innovation and in the capabilities that these
models are enabling.
So our investment in AI
chips is here to stay, with a significant step-up in capabilities from generation to generation.