Lesson 7 -- 12 min read

The Hardware Story

Three guys in a Denny's booth built a company that's now worth more than any other on Earth. Their product: the chips that power every AI you've ever used.

Modern AI training clusters consume megawatts of power and millions of gallons of water per day.

What you'll learn

Why GPUs beat CPUs for AI -- and what CUDA actually is
NVIDIA's improbable path from gaming to the world's most valuable company
The physical scale of modern AI infrastructure: power, water, land
Why a single chip factory in Taiwan is a geopolitical flashpoint
How to run AI on your own laptop, no cloud required

GPU vs CPU vs TPU -- Why GPUs Won

A CPU is a brilliant professor. It has a handful of powerful cores (typically 4-64), each capable of tackling complex tasks one at a time. It's fast, versatile, and excellent at sequential work -- but it solves one problem before moving to the next.

A GPU is an army of students. It has thousands of smaller cores, each designed for simple math, all running simultaneously. Neural network training is overwhelmingly parallel math -- large matrix multiplications that break down into millions of independent multiply-and-add operations. On this kind of work, GPUs run up to 100x faster than CPUs.
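To see why this work parallelizes so well, here is a minimal Python sketch. It is an illustration only, not how GPU libraries are implemented, and the function names are made up for this example. Each cell of a matrix product is its own dot product, so no cell has to wait for any other.

```python
def output_cell(a, b, row, col):
    """Dot product of one row of a with one column of b."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Compute a @ b one cell at a time.

    No cell depends on any other cell, so the cells could be computed in
    any order, or all at once on separate cores. That independence is
    what a GPU exploits.
    """
    rows, cols = len(a), len(b[0])
    return [[output_cell(a, b, r, c) for c in range(cols)] for r in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A 4,096 x 4,096 product has about 16.8 million such cells, which is where thousands of simple cores start to pay off.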

Google built a third option: the TPU (Tensor Processing Unit), a custom chip using a systolic array architecture where data flows through the chip like blood through a heart. TPUs are purpose-built for TensorFlow and tensor operations, but they're only available in Google Cloud -- you can't buy one.
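The "data flows through the chip" idea can be imitated with a toy simulation. The sketch below assumes one common textbook arrangement (an output-stationary grid, where each cell keeps a running sum); real TPUs differ in many details, and `systolic_matmul` is a hypothetical name for this example.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Cell (i, j) of the grid keeps a running sum for output c[i][j].
    Rows of `a` stream in from the left and columns of `b` from the top,
    each delayed by one cycle per row or column, so that on cycle t
    cell (i, j) sees a[i][s] and b[s][j] with s = t - i - j.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    cycles = n + m + k - 2  # cycles until the far corner cell gets its last pair
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j
                if 0 <= s < k:  # operands have reached this cell
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

Each value is read from memory once and then passed from neighbor to neighbor, which is the appeal of the design: less time spent fetching data, more time spent multiplying.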

So why did GPUs win, not TPUs or some other custom chip? Three reasons. First, NVIDIA released CUDA in 2006 -- a software platform that let anyone program GPUs for general-purpose computing, not just graphics. Second, anyone could buy a GPU; TPUs required a Google Cloud account. Third, the entire ML community -- PyTorch, TensorFlow, every research lab -- optimized for CUDA first. Network effects locked it in.

100x

GPU speed advantage over CPU for training

16,000+

Cores in a modern NVIDIA GPU

One vendor dominates AI compute: NVIDIA

Jeff Dean calculated that if hundreds of millions of people talked to Google for just three minutes a day, it would require "basically all the compute power that Google had deployed." That insight drove the TPU project. -- Google Cloud

NVIDIA: From a Denny's Booth to $5 Trillion

In 1993, Jensen Huang, Chris Malachowsky, and Curtis Priem founded NVIDIA in a Denny's booth in San Jose. Their target market: 3D graphics for gaming. Nobody was thinking about artificial intelligence.

1993

Founded in a Denny's. Target: gaming GPUs.

1999

GeForce 256 launches -- "the world's first GPU."

2006

CUDA launches. Wall Street panics. Valuation drops from $12B to $2-3B. Investors see no demand for general-purpose GPU computing.

2009

Researchers demonstrate 70x speedup training neural networks on GPUs vs CPUs.

2012

AlexNet wins ImageNet. Two GTX 580 GPUs in a bedroom. The "Big Bang" of modern AI.

2016

Jensen personally delivers the first DGX-1 to OpenAI. "This is the world's first AI supercomputer."

May 2023

NVIDIA hits $1 trillion market cap.

Feb 2024

$2 trillion. Nine months after $1T.

Jun 2024

$3 trillion. Four months after $2T.

2025-26

$5 trillion. World's most valuable company. Jensen's leather jacket replaces the turtleneck.

History Thread

The AlexNet Story: In 2012, Alex Krizhevsky trained a convolutional neural network on two NVIDIA GTX 580s in his bedroom at his parents' house. His collaborators: Ilya Sutskever (who would co-found OpenAI) and Geoffrey Hinton (who would win the Nobel Prize). AlexNet's error rate was 10.8 percentage points better than the runner-up. Hinton later quipped: "Ilya thought we should do it, Alex made it work, and I got the Nobel Prize."

CUDA: The 20-Year Bet

When NVIDIA launched CUDA in 2006, it was a software platform that let developers program GPUs for general-purpose computing. The market rejected it. NVIDIA's valuation dropped by 75%. For six years, CUDA served a tiny niche of scientific computing researchers.

Then AlexNet proved that CUDA + GPUs could train neural networks orders of magnitude faster than anything else. From that point, CUDA became the lingua franca of AI research. Today, most major models -- including GPT and Llama -- train on NVIDIA GPUs running CUDA. Gemini, which trains on Google's own TPUs, is the notable exception.

"CUDA was a 20-year bet. The masses had shown no indication that they wanted such a thing." -- The Chip Letter

Fireship's "CUDA in 100 Seconds" -- a fast, entertaining explainer on why CUDA matters.
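The programming model CUDA introduced can be imitated in plain Python. In CUDA you write one small function (a "kernel") and the GPU runs it across a grid of blocks, each holding many threads; every thread works out which element it owns from its block and thread numbers. The sketch below emulates that indexing with ordinary loops. It is not CUDA code, and `vector_add_kernel` and `launch` are hypothetical names for this example.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, x, y, out):
    """Body that every simulated thread runs: handle one element."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA kernels
    if i < len(out):  # guard: the last block may have spare threads
        out[i] = x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Run the kernel once per (block, thread) pair.

    A GPU would run these invocations in parallel; this loop runs them
    one after another, which gives the same result for independent work.
    """
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
blocks = launch(vector_add_kernel, len(x), 4, x, y, out)
print(blocks, out)  # 2 [11.0, 22.0, 33.0, 44.0, 55.0]
```

Five elements with four threads per block need two blocks, so three threads in the second block do nothing. That bounds check is one of the first habits CUDA programmers learn.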

The Scale of Training

In 2012, AlexNet trained on two GPUs that cost about $500 each. Twelve years later, training a frontier model costs more than building a skyscraper.

Model	Year	GPUs	Est. Cost
AlexNet	2012	2 GTX 580s	~$1,000
GPT-3	2020	~10,000 V100s	~$4.6M
GPT-4	2023	~25,000 A100s	$63-100M+
Llama 3.1 405B	2024	16,384 H100s	~$170M
Gemini Ultra	2024	Google TPUs	~$191M

Dario Amodei, Anthropic's CEO, has said there are "models in training today that cost more like a billion dollars." He predicts training costs will hit $10 billion by 2025-2026 and $100 billion by 2027. Sam Altman envisions AI eventually becoming "too cheap to meter" -- like electricity. We're not there yet.

$100M+

Cost to train GPT-4

90 days

Training time on 25,000 GPUs

50 GWh

Energy for one large training run (SF for 3 days)

A 10,000-GPU training cluster consumes 10-15 megawatts -- enough to power a small town. The largest clusters can swing their power draw by hundreds of megawatts within seconds, straining local grids.
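These figures are easy to sanity-check. The sketch below uses only numbers quoted in this lesson (10-15 MW for 10,000 GPUs, and a 90-day run on a 25,000-GPU cluster drawing about 30 MW). It assumes the cluster runs at full power the whole time, which real runs do not, so treat the energy figure as an upper-end estimate. The function names are made up for this example.

```python
def watts_per_gpu(cluster_megawatts, gpu_count):
    """Average facility power per GPU in watts (cooling and networking included)."""
    return cluster_megawatts * 1_000_000 / gpu_count


def run_energy_gwh(cluster_megawatts, days):
    """Energy in gigawatt-hours for a cluster running flat out for `days`."""
    hours = days * 24
    return cluster_megawatts * hours / 1000  # MWh -> GWh


print(watts_per_gpu(10, 10_000), watts_per_gpu(15, 10_000))  # 1000.0 1500.0
print(run_energy_gwh(30, 90))  # 64.8
```

So each GPU accounts for roughly 1 to 1.5 kilowatts once cooling and networking are counted, and a 90-day run at 30 MW lands in the same range as the 50 GWh figure above.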

Cluster	GPUs	Cost	Power	Scale
AlexNet's Training Rig (2012)	2	$1,000	~600W	1 desktop
GPT-3 Cluster (2020)	10,000	~$100M	10-15 MW	Small town's power
GPT-4 Cluster (2023)	25,000	$375M+	~30 MW	Football field of racks
The Trillion-Dollar Cluster (2030?)	Millions	$1 Trillion	100 GW	20% of US electricity

Data Centers: The Physical Reality

AI isn't just software. It's infrastructure -- warehouses the size of small towns, consuming electricity and water at industrial scale.

Aerial view of a massive data center campus

Facility	Owner	Size
Stargate, Abilene TX	OpenAI	875 acres (roughly the size of Central Park)
Fairwater, Wisconsin	Microsoft	315 acres, "enough fiber to circle Earth 4.5 times"
Altoona, Iowa	Meta	5+ million sq ft

The largest AI data center campuses will soon be a fifth the size of Manhattan. US data centers are projected to grow from ~30 GW in 2025 to 90+ GW by 2030. AI data centers in Arizona already use 7.4% of the state's power; Oregon, 11.4%.

875 acres

OpenAI's Stargate campus

5M gal/day

Water usage at large data centers

90+ GW

Projected US data center power by 2030

Water is the hidden resource. A typical data center uses 300,000 gallons of water per day for cooling -- equivalent to about 1,000 households. Large AI facilities use up to 5 million gallons daily. Over 160 new AI data centers have been built in areas with high water stress. Every ChatGPT conversation drinks a few drops of water. Google disclosed its Gemini queries use about 0.26ml per query -- five drops.

By 2030, the trillion-dollar cluster would use 100 gigawatts -- over 20% of US electricity production. -- Leopold Aschenbrenner, "Situational Awareness" (2024)

The Chip Wars

Nearly every AI model you've ever used was trained on chips manufactured by a single company you've probably never heard of: TSMC (Taiwan Semiconductor Manufacturing Company). They produce more than 90% of the world's most advanced chips.

Aerial view of TSMC semiconductor fabrication plant

Founded in 1987 by Morris Chang, TSMC pioneered the "foundry model" -- manufacturing chips designed by others. NVIDIA designs GPUs; TSMC actually makes them. So do Apple, AMD, and virtually every other major chip designer. This makes TSMC a chokepoint for the entire global economy, not just AI.

"Military, economic, and geopolitical power are built on a foundation of computer chips." -- Chris Miller, Chip War (2022 Financial Times Business Book of the Year)

The US has responded with sweeping export controls on advanced chip sales to China (October 2022, tightened in 2023, 2024, and 2025), plus $165 billion in US fab investment to reduce dependency on Taiwan. China has retaliated with 100% tariffs on semiconductor imports and rare earth export controls (China controls ~70% of rare earth processing).

The uncomfortable reality: one earthquake in Taiwan could disrupt the global AI supply chain. This is why the semiconductor supply chain is now bifurcated along geopolitical lines -- your access to advanced chips depends on which side of the fence your country sits on.

Pop Culture Connection

Chip War by Chris Miller is the essential book on this topic -- it won the Financial Times Business Book of the Year and the IEEE History Prize. Leopold Aschenbrenner's 165-page essay "Situational Awareness" (2024) went viral for predicting AGI by 2027 and trillion-dollar compute clusters, and Aschenbrenner went on to launch a $1.5 billion AI fund. Project Stargate -- first reported as a $100 billion Microsoft and OpenAI data center project -- has the scale of a national infrastructure program, and it shares its name with the sci-fi franchise for a reason.

The GPU Arms Race

The hardware itself has evolved at a staggering pace. Here's how NVIDIA's AI-relevant GPUs have progressed:

GPU	Year	Memory	AI Performance	Price
GTX 580	2010	1.5 GB	N/A (gaming)	~$500
Tesla V100	2017	32 GB	125 TFLOPS	~$10,000
A100	2020	80 GB	312 TFLOPS	~$15,000
H100	2022	80 GB	990 TFLOPS	$27-40K
B200 (Blackwell)	2025	192 GB	20 PFLOPS (FP4)	~$40K+

The B200 packs 208 billion transistors into a dual-die design, with roughly 5x the AI performance of the H100 and 2.4x the memory. (The 20 PFLOPS figure in the table is for low-precision FP4 math, so it is not directly comparable to the TFLOPS numbers in the rows above it.) During the 2023-2024 H100 shortage, companies hoarded GPUs like strategic assets. Cloud rental rates hit $7-10 per GPU-hour. By late 2025, supply caught up and prices collapsed to $2-4/hour.
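To get a feel for what those hourly rates mean at training scale, here is a back-of-envelope sketch. It prices a hypothetical run the size this lesson quotes for GPT-4 (25,000 GPUs for 90 days) at each rental rate. GPT-4 itself was not billed this way, so the output is an illustration of scale, not a reconstruction of anyone's actual bill.

```python
def rental_cost_usd(gpu_count, days, dollars_per_gpu_hour):
    """Total rental bill: GPU-hours multiplied by the hourly rate."""
    gpu_hours = gpu_count * days * 24
    return gpu_hours * dollars_per_gpu_hour


for rate in (2, 4, 7, 10):
    cost = rental_cost_usd(25_000, 90, rate)
    print(f"${rate}/hr -> ${cost / 1e6:,.0f}M")
# $2/hr -> $108M
# $4/hr -> $216M
# $7/hr -> $378M
# $10/hr -> $540M
```

The run adds up to 54 million GPU-hours, so every extra dollar per hour adds $54 million to the bill. That is why the price collapse mattered so much.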

Can You Run AI on Your Laptop?

Yes. And this is one of the most underappreciated developments in AI.

llama.cpp is a lightweight C++ inference engine that runs large language models entirely offline on consumer hardware. The key enabling trick is quantization: reducing the precision of model weights from 16-bit floating point to 4-bit integers. This shrinks a model by about 75%, usually with only a small loss in quality.
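The core idea of quantization fits in a few lines. The sketch below uses the simplest possible scheme, one shared scale factor for a group of weights, and is meant only to show the principle. The quantization formats llama.cpp actually uses are more elaborate, and the function names here are made up for this example.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.70, 0.33, 0.04, -0.41, 0.68]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q)  # [1, -7, 3, 0, -4, 7]
print(f"scale={scale:.3f}, worst error={worst_error:.3f}")
```

Each weight now needs 4 bits instead of 16, which is where the 75% saving comes from (1 - 4/16 = 0.75). The price is a rounding error of at most half the scale factor per weight.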

Your RAM	Max Model Size	Examples	Speed
8 GB	Up to 3B params	Llama 3.2 3B, Phi-4 Mini	10-25 tok/sec
16 GB	Up to 8B (quantized)	Llama 3.1 8B, Mistral 7B	15-40 tok/sec
32 GB	Up to ~32B (quantized)	Qwen2.5 32B (Q4)	5-15 tok/sec
64 GB+	70B (quantized)	Llama 3.1 70B (Q4)	Varies
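You can estimate whether a model will fit before downloading it. The sketch below counts the weights only, at a flat number of bits per weight. That is a simplifying assumption: real quantized files vary a little with the scheme used, and you also need headroom for the context window and the operating system. `weight_gigabytes` is a made-up helper name.

```python
def weight_gigabytes(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


for name, size_b in [("3B", 3), ("8B", 8), ("70B", 70)]:
    fp16 = weight_gigabytes(size_b, 16)
    q4 = weight_gigabytes(size_b, 4)
    print(f"{name}: {fp16:.1f} GB at 16-bit, {q4:.1f} GB at 4-bit")
# 3B: 6.0 GB at 16-bit, 1.5 GB at 4-bit
# 8B: 16.0 GB at 16-bit, 4.0 GB at 4-bit
# 70B: 140.0 GB at 16-bit, 35.0 GB at 4-bit
```

The rule of thumb: two bytes per parameter at 16-bit, half a byte per parameter at 4-bit.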

Apple Silicon has a special advantage here: unified memory. On M-series Macs, CPU and GPU share the same RAM pool -- no bottleneck moving data between them. A MacBook Pro with 96GB unified memory can comfortably run 70B-parameter models. The GGUF format bundles weights, tokenizer, and metadata into a single file. Download one .gguf file, point llama.cpp at it, and you're running AI locally. No cloud. No API key. No internet.
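Because everything lives in one file, a few bytes at the start are enough to identify a GGUF model. The sketch below parses that fixed-size header. The layout in the docstring is an assumption based on the published GGUF format (version 2 and later), and the example runs on a synthetic header, not a real download, so check the format documentation before relying on it.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(data):
    """Parse the fixed-size start of a GGUF file (version 2 or later layout).

    Layout assumed here, all little-endian: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key-value count.
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": tensor_count, "metadata_entries": kv_count}


# Synthetic 24-byte header standing in for a real file, which you would
# read with open(path, "rb").read(24). The counts are made-up values.
fake_header = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(fake_header))
# {'version': 3, 'tensors': 291, 'metadata_entries': 24}
```

The metadata entries that follow the header hold details such as the tokenizer and architecture, which is why no separate config files are needed.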

Your $1,500 laptop can now run a 7-billion-parameter model that would have been science fiction five years ago. But GPT-4 scale still requires 25,000 GPUs and a small city's worth of power. The gap between "runs on a laptop" and "frontier model" is enormous -- and growing.

"AI is the most powerful technology force of our time." -- Jensen Huang

"I think it's now. I think we've achieved AGI." -- Jensen Huang, Lex Fridman Podcast #494 (March 2026)

History Thread

Jensen's Leather Jacket has become an iconic symbol of the AI era -- to AI what Steve Jobs' black turtleneck was to smartphones. From a Denny's booth in 1993 to the world's most valuable company in 2025: the trajectory is almost absurd. But the 2006 CUDA bet -- the one Wall Street punished with a 75% valuation drop -- is the real pivot. Jensen bet that parallel computing would matter for something beyond gaming. It took six years for the bet to pay off (AlexNet, 2012) and nineteen years for NVIDIA to become the most valuable company on Earth.

Key Terms

GPU: Graphics Processing Unit -- thousands of parallel cores ideal for AI training
CUDA: NVIDIA's software platform enabling GPUs for general-purpose computing (2006)
TPU: Google's custom AI chip (Tensor Processing Unit), available only in Google Cloud
Quantization: Reducing weight precision (e.g., 16-bit to 4-bit) to shrink model size and speed up inference
GGUF: Single-file model format for local inference in the llama.cpp ecosystem
FLOPS: Floating Point Operations Per Second -- the standard measure of compute power
TSMC: Taiwan Semiconductor Manufacturing Company -- fabricates 90%+ of the world's most advanced chips

Did This Land?

Why did NVIDIA's stock drop 75% when they launched CUDA?

In 2006, there was no visible demand for general-purpose GPU computing. Investors saw NVIDIA spending resources on a platform nobody asked for. It took six years -- until AlexNet proved GPUs could train neural networks orders of magnitude faster than CPUs -- for the bet to pay off.

What's the difference between running a model locally and using an API?

Local: runs on your hardware, works offline, completely private, limited by your RAM (3-70B params typically). API: runs in someone else's data center, requires internet, costs per token, but can run much larger models (GPT-4 scale and beyond). It's the difference between owning a bicycle and hailing a taxi -- both get you there, but the tradeoffs are different.

Why is TSMC a geopolitical concern?

TSMC manufactures 90%+ of the world's most advanced chips and is based in Taiwan. An earthquake, conflict, or political crisis involving Taiwan could disrupt the global AI supply chain -- along with smartphones, cars, medical devices, and virtually everything else with a modern chip. This is why $165 billion is being invested in fabrication plants on US soil.

Lesson Summary

GPUs won because of CUDA (2006), availability (anyone could buy one), and community (PyTorch/TensorFlow optimized for NVIDIA first). Training on GPUs is up to 100x faster than CPUs.
NVIDIA's trajectory -- from a Denny's booth ($0) to the world's most valuable company ($5T) -- was driven by a 20-year bet on parallel computing that Wall Street punished before history rewarded.
Training costs have exploded from $1,000 (AlexNet, 2012) to $100M+ (GPT-4) to projected $100B+ by 2027. A single training run consumes enough energy to power San Francisco for three days.
Data centers are the new megastructures: 875 acres, millions of gallons of water per day, 90+ GW of power projected by 2030. AI is infrastructure, not just software.
TSMC is a single point of failure for the entire chip supply chain. The semiconductor industry is now bifurcated along geopolitical lines.
You can run AI on your laptop thanks to quantization and llama.cpp -- but frontier models still require warehouse-scale compute.