The Hardware Story
Three guys in a Denny's booth built a company that's now worth more than any other on Earth. Their product: the chips that power every AI you've ever used.
What you'll learn
- Why GPUs beat CPUs for AI -- and what CUDA actually is
- NVIDIA's improbable path from gaming to the world's most valuable company
- The physical scale of modern AI infrastructure: power, water, land
- Why a single chip factory in Taiwan is a geopolitical flashpoint
- How to run AI on your own laptop, no cloud required
GPU vs CPU vs TPU -- Why GPUs Won
A CPU is a brilliant professor. It has a handful of powerful cores (typically 4-64), each capable of tackling complex tasks one at a time. It's fast, versatile, and excellent at sequential work -- but each core finishes one step before moving to the next, so the chip can only work on a few things at once.
A GPU is an army of students. It has thousands of smaller cores, each designed for simple math, all running simultaneously. Neural network training is overwhelmingly parallel math -- matrix multiplications made up of millions of independent multiply-and-add operations that can all happen at once. On this kind of work, GPUs can run up to 100x faster than CPUs.
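To make the professor-versus-students picture concrete, here is a minimal pure-Python sketch. It is an illustration only, not how any GPU library actually works, and the helper names (`matmul_cell`, `matmul_sequential`, `matmul_parallel`) are made up for this example. The point is that every cell of the output matrix depends on just one row and one column of the inputs, so the cells can be handed out to as many workers as you have.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_cell(a, b, i, j):
    """Compute one output cell: the dot product of row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_sequential(a, b):
    """The 'professor' approach: compute one cell after another."""
    rows, cols = len(a), len(b[0])
    return [[matmul_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


def matmul_parallel(a, b, workers=4):
    """The 'army of students' approach: every cell is an independent task."""
    rows, cols = len(a), len(b[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input cells.
        values = list(pool.map(lambda ij: matmul_cell(a, b, *ij), cells))
    # Reshape the flat list of cell values back into rows.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_sequential(A, B))  # [[19, 22], [43, 50]]
print(matmul_parallel(A, B))    # same result, computed cell by cell in any order
```

Python threads won't actually make this faster; the sketch only shows that no cell has to wait for any other. A GPU exploits exactly that independence in hardware, with thousands of cores instead of four threads.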
Google built a third option: the TPU (Tensor Processing Unit), a custom chip using a systolic array architecture, where data pulses through a grid of arithmetic units in lockstep, like blood through a heart. TPUs are purpose-built for tensor operations and were designed around TensorFlow (they now also run JAX and PyTorch), but the data-center versions are only available in Google Cloud -- you can't buy one.
So why did GPUs win, not TPUs or some other custom chip? Three reasons. First, NVIDIA released CUDA in 2006 -- a software platform that let anyone program GPUs for general-purpose computing, not just graphics. Second, anyone could buy a GPU; TPUs required a Google Cloud account. Third, the entire ML community -- PyTorch, TensorFlow, every research lab -- optimized for CUDA first. Network effects locked it in.
Jeff Dean calculated that if hundreds of millions of people talked to Google for just three minutes a day, it would require "basically all the compute power that Google had deployed." That insight drove the TPU project. -- Google Cloud
NVIDIA: From a Denny's Booth to $5 Trillion
In 1993, Jensen Huang, Chris Malachowsky, and Curtis Priem founded NVIDIA in a Denny's booth in San Jose. Their target market: 3D graphics for gaming. Nobody was thinking about artificial intelligence.
The AlexNet Story: In 2012, Alex Krizhevsky trained a convolutional neural network on two NVIDIA GTX 580s in his bedroom at his parents' house. His collaborators: Ilya Sutskever (who would co-found OpenAI) and Geoffrey Hinton (who would win the Nobel Prize). In that year's ImageNet competition, AlexNet's top-5 error rate was more than 10.8 percentage points lower than the runner-up's. Hinton later quipped: "Ilya thought we should do it, Alex made it work, and I got the Nobel Prize."
CUDA: The 20-Year Bet
NVIDIA launched CUDA in 2006 as a software platform that let developers program GPUs for general-purpose computing. The market rejected it: NVIDIA's valuation dropped by 75%. For six years, CUDA served a tiny niche of scientific computing researchers.
Then AlexNet proved that CUDA + GPUs could train neural networks orders of magnitude faster than CPUs could. From that point, CUDA became the lingua franca of AI research. Today, most major model families -- GPT and Llama among them -- train on NVIDIA GPUs running CUDA; Gemini, trained on Google's TPUs, is the notable exception.
"CUDA was a 20-year bet. The masses had shown no indication that they wanted such a thing." -- The Chip Letter
The Scale of Training
In 2012, AlexNet trained on two GPUs that cost about $500 each. Twelve years later, training a frontier model costs more than building a skyscraper.
| Model | Year | GPUs | Est. Cost |
|---|---|---|---|
| AlexNet | 2012 | 2 GTX 580s | ~$1,000 |
| GPT-3 | 2020 | ~10,000 V100s | ~$4.6M |
| GPT-4 | 2023 | ~25,000 A100s | $63-100M+ |
| Llama 3.1 405B | 2024 | 16,384 H100s | ~$170M |
| Gemini Ultra | 2024 | Google TPUs | ~$191M |
Dario Amodei, Anthropic's CEO, has said there are "models in training today that cost more like a billion dollars." He predicts training costs will hit $10 billion by 2025-2026 and $100 billion by 2027. Sam Altman envisions AI eventually becoming "too cheap to meter" -- like electricity. We're not there yet.
A 10,000-GPU training cluster consumes 10-15 megawatts -- enough to power a small town. The largest clusters, which are many times bigger, can produce power swings of hundreds of megawatts within seconds, straining local grids.
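Where does a figure like 10-15 megawatts come from? Here is a rough back-of-the-envelope sketch. It assumes H100-class GPUs drawing up to 700 W each; the `server_overhead` and `pue` multipliers are illustrative assumptions, not numbers from any particular facility.

```python
def cluster_power_mw(num_gpus, gpu_watts=700.0, server_overhead=1.5, pue=1.3):
    """Estimate total facility power in megawatts.

    gpu_watts:       per-GPU draw (700 W is the H100 SXM maximum; an assumption
                     here, since the text does not name a GPU model)
    server_overhead: ASSUMED multiplier for CPUs, memory, storage, and networking
    pue:             ASSUMED power usage effectiveness (cooling and power losses)
    """
    it_watts = num_gpus * gpu_watts * server_overhead
    facility_watts = it_watts * pue
    return facility_watts / 1_000_000


print(f"{cluster_power_mw(10_000):.2f} MW")  # 13.65 MW
```

The GPUs alone account for 7 MW; everything around them roughly doubles the bill. Change the assumed multipliers and the answer moves, but it stays in the same ballpark as the 10-15 MW range.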
Data Centers: The Physical Reality
AI isn't just software. It's infrastructure -- warehouses the size of small towns, consuming electricity and water at industrial scale.
| Facility | Owner | Size |
|---|---|---|
| Stargate, Abilene TX | OpenAI | 875 acres (will reach Central Park size) |
| Fairwater, Wisconsin | Microsoft | 315 acres, "enough fiber to circle Earth 4.5 times" |
| Altoona, Iowa | Meta | 5+ million sq ft |
The largest AI data center campuses will soon be a fifth the size of Manhattan. US data center power demand is projected to grow from ~30 GW in 2025 to 90+ GW by 2030. AI data centers in Arizona already use 7.4% of the state's power; in Oregon, 11.4%.
Water is the hidden resource. A typical data center uses 300,000 gallons of water per day for cooling -- equivalent to about 1,000 households. Large AI facilities use up to 5 million gallons daily. Over 160 new AI data centers have been built in areas with high water stress. Every ChatGPT conversation drinks a few drops of water. Google disclosed that its Gemini queries use about 0.26 ml of water each -- five drops.
By 2030, the trillion-dollar cluster would use 100 gigawatts -- over 20% of US electricity production. -- Leopold Aschenbrenner, "Situational Awareness" (2024)
The Chip Wars
Nearly every AI model you've ever used was trained on chips manufactured by a single company you've probably never heard of: TSMC (Taiwan Semiconductor Manufacturing Company). It produces more than 90% of the world's most advanced chips.
Founded in 1987 by Morris Chang, TSMC pioneered the "foundry model" -- manufacturing chips designed by others. NVIDIA designs GPUs; TSMC actually makes them. So do Apple, AMD, and virtually every other major chip designer. This makes TSMC a chokepoint for the entire global economy, not just AI.
"Military, economic, and geopolitical power are built on a foundation of computer chips." -- Chris Miller, Chip War (2022 Financial Times Business Book of the Year)
The US has responded with sweeping export controls on advanced chip sales to China (October 2022, tightened in 2023, 2024, and 2025), plus $165 billion in US fab investment to reduce dependency on Taiwan. China has retaliated with 100% tariffs on semiconductor imports and rare earth export controls (China controls ~70% of rare earth processing).
The uncomfortable reality: one earthquake in Taiwan could disrupt the global AI supply chain. This is why the semiconductor supply chain is now bifurcated along geopolitical lines -- your access to advanced chips depends on which side of the fence your country sits on.
Chip War by Chris Miller is the essential book on this topic -- it won the Financial Times Business Book of the Year and the IEEE History Prize.
Leopold Aschenbrenner's 165-page essay "Situational Awareness" (2024) went viral predicting AGI by 2027 and trillion-dollar compute clusters. Aschenbrenner went on to launch a $1.5 billion AI fund.
Project Stargate -- first reported in 2024 as a $100 billion Microsoft and OpenAI data center project, then announced in January 2025 as a venture led by OpenAI, SoftBank, and Oracle -- has the scale of a national infrastructure program, and is named like the sci-fi franchise for a reason.
The GPU Arms Race
The hardware itself has evolved at a staggering pace. Here's how NVIDIA's AI-relevant GPUs have progressed:
| GPU | Year | Memory | AI Performance | Price |
|---|---|---|---|---|
| GTX 580 | 2010 | 1.5 GB | N/A (gaming) | ~$500 |
| Tesla V100 | 2017 | 32 GB | 125 TFLOPS | ~$10,000 |
| A100 | 2020 | 80 GB | 312 TFLOPS | ~$15,000 |
| H100 | 2022 | 80 GB | 990 TFLOPS | $27-40K |
| B200 (Blackwell) | 2025 | 192 GB | 20 PFLOPS (FP4) | ~$40K+ |
The B200 packs 208 billion transistors into a dual-die design. NVIDIA's headline figure is 5x the AI performance of H100, with 2.4x the memory. (The table's 20 PFLOPS figure is quoted at low-precision FP4, while the older rows are FP16 figures, so dividing one by the other overstates the gap.) During the 2023-2024 H100 shortage, companies hoarded GPUs like strategic assets. Cloud rental rates hit $7-10 per GPU-hour. By late 2025, supply caught up and prices collapsed to $2-4/hour.
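What does that price collapse mean for an actual bill? The sketch below runs the arithmetic for a hypothetical job: 1,000 GPUs rented for 30 days. The job size is invented purely for illustration; only the hourly rates come from the figures above.

```python
def rental_cost_usd(num_gpus, hours, usd_per_gpu_hour):
    """Total rental bill: GPUs x hours x hourly rate."""
    return num_gpus * hours * usd_per_gpu_hour


# Hypothetical job, chosen only for illustration.
JOB_GPUS = 1_000
JOB_HOURS = 24 * 30  # 30 days

# Hourly rate ranges (USD per GPU-hour) quoted in the text.
RATES = {"2023-2024 shortage": (7, 10), "late 2025": (2, 4)}

for period, (low, high) in RATES.items():
    low_cost = rental_cost_usd(JOB_GPUS, JOB_HOURS, low)
    high_cost = rental_cost_usd(JOB_GPUS, JOB_HOURS, high)
    print(f"{period}: ${low_cost / 1e6:.2f}M to ${high_cost / 1e6:.2f}M")
```

Under these assumptions the same job drops from roughly $5-7 million to roughly $1.4-2.9 million.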
Can You Run AI on Your Laptop?
Yes. And this is one of the most underappreciated developments in AI.
llama.cpp is a lightweight C++ inference engine that runs large language models entirely offline on consumer hardware. The key enabling trick is quantization: reducing the precision of model weights from 16-bit floating point to 4-bit integers. This shrinks a model by about 75% (4 bits is a quarter of 16), usually with only a small loss in quality.
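Here is a minimal sketch of the idea behind quantization. It is a simplification: it uses one shared scale for the whole list of weights, whereas real llama.cpp formats quantize in small blocks with their own scales. The function names and the sample weights are made up for illustration.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the 4-bit range.
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.70, -0.33, 0.10, 0.00, -0.70, 0.52]  # illustrative values only
quantized, scale = quantize_4bit(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)                    # [7, -3, 1, 0, -7, 5]
print(f"max error: {max_error:.3f}")

# Storage saving from 16 bits per weight down to 4 bits per weight.
saving = 1 - 4 / 16
print(f"size reduction: {saving:.0%}")  # size reduction: 75%
```

Each weight now takes one of only 16 possible values, and the rounding error is at most half a step. That small error, spread across billions of weights, is the quality cost you pay for the smaller file.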
| Your RAM | Max Model Size | Examples | Speed |
|---|---|---|---|
| 8 GB | Up to 3B params | Llama 3.2 3B, Phi-4 Mini | 10-25 tok/sec |
| 16 GB | Up to 8B (quantized) | Llama 3.1 8B, Mistral 7B | 15-40 tok/sec |
| 32 GB | Up to ~30B (4-bit); 70B only with sub-4-bit quantization | Llama 2 70B (Q2) | 5-15 tok/sec |
| 64 GB+ | Up to 70B (4-bit) | Llama 3.1 70B (Q4) | Varies |
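A quick way to sanity-check any "will it fit?" claim is to estimate the size of the weights alone: parameters times bits per weight, divided by 8 to get bytes. The sketch below does only that. Real memory use is higher, because the runtime also needs room for the context cache and the rest of your system, and practical 4-bit formats store slightly more than 4 bits per weight. The function name is made up for this example.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Weights-only size in GB (10^9 bytes); ignores cache and runtime overhead."""
    return params_billion * bits_per_weight / 8


for name, params_billion in [("3B", 3), ("8B", 8), ("70B", 70)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params_billion, bits)
        print(f"{name} at {bits}-bit: ~{size:.1f} GB")
```

For example, a 70B model needs about 140 GB at 16-bit and about 35 GB at 4-bit, which is why models near a machine's RAM limit only fit with more aggressive quantization.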
Apple Silicon has a special advantage here: unified memory. On M-series Macs, CPU and GPU share the same RAM pool, so there is no bottleneck moving data between them. A MacBook Pro with 96 GB unified memory can comfortably run quantized 70B-parameter models.
The GGUF format bundles weights, tokenizer, and metadata into a single file. Download one .gguf file, point llama.cpp at it, and you're running AI locally. No cloud. No API key. No internet.
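To see what "a single file" means in practice, here is a small sketch that reads the fixed-size header at the start of a GGUF file: a 4-byte magic string, a format version, the number of tensors, and the number of metadata entries. It assumes the layout used by GGUF version 2 and later (little-endian, 64-bit counts). The header bytes below are synthetic, built in the code for illustration, not taken from a real model.

```python
import io
import struct


def read_gguf_header(stream):
    """Read the fixed-size GGUF header from a binary stream.

    Assumes the GGUF v2+ layout: magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian).
    """
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", stream.read(20))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration only: version 3, 291 tensors, 24 metadata entries.
fake_file = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
print(read_gguf_header(fake_file))
```

To try it on a real download, open the file in binary mode, for example `open("model.gguf", "rb")` (the filename is a placeholder), and pass the file object in. Everything after the header, including the metadata, tokenizer data, and the weights themselves, lives in that same file.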
Your $1,500 laptop can now run a 7-billion-parameter model that would have been science fiction five years ago. But training at GPT-4 scale still took some 25,000 GPUs and a small city's worth of power. The gap between "runs on a laptop" and "frontier model" is enormous -- and growing.
"AI is the most powerful technology force of our time." -- Jensen Huang
"I think it's now. I think we've achieved AGI." -- Jensen Huang, Lex Fridman Podcast #494 (March 2026)
Jensen's leather jacket has become an iconic symbol of the AI era -- to AI what Steve Jobs' black turtleneck was to smartphones. From a Denny's booth in 1993 to the world's most valuable company in 2025: the trajectory is almost absurd. But the 2006 CUDA bet -- the one Wall Street punished with a 75% valuation drop -- is the real pivot. Jensen bet that parallel computing would matter for something beyond gaming. It took six years for the bet to pay off (AlexNet, 2012) and nineteen years for NVIDIA to become the most valuable company on Earth.
Key Terms
- GPU
- Graphics Processing Unit -- thousands of parallel cores ideal for AI training
- CUDA
- NVIDIA's software platform enabling GPUs for general-purpose computing (2006)
- TPU
- Google's custom AI chip (Tensor Processing Unit), available only in Google Cloud
- Quantization
- Reducing weight precision (e.g., 16-bit to 4-bit) to shrink model size and speed up inference
- GGUF
- Single-file model format for local inference in the llama.cpp ecosystem
- FLOPS
- Floating Point Operations Per Second -- the standard measure of compute power
- TSMC
- Taiwan Semiconductor Manufacturing Company -- fabricates 90%+ of the world's most advanced chips
Did This Land?
Why did NVIDIA's stock drop 75% when they launched CUDA?
What's the difference between running a model locally and using an API?
Why is TSMC a geopolitical concern?
Lesson Summary
- GPUs won because of CUDA (2006), availability (anyone could buy one), and community (PyTorch/TensorFlow optimized for NVIDIA first). Training on GPUs can be up to 100x faster than on CPUs.
- NVIDIA's trajectory -- from a Denny's booth ($0) to the world's most valuable company ($5T) -- was driven by a 20-year bet on parallel computing that Wall Street punished before history rewarded.
- Training costs have exploded from $1,000 (AlexNet, 2012) to $100M+ (GPT-4) to projected $100B+ by 2027. A single training run consumes enough energy to power San Francisco for three days.
- Data centers are the new megastructures: 875 acres, millions of gallons of water per day, 90+ GW of power projected by 2030. AI is infrastructure, not just software.
- TSMC is a single point of failure for the entire chip supply chain. The semiconductor industry is now bifurcated along geopolitical lines.
- You can run AI on your laptop thanks to quantization and llama.cpp -- but frontier models still require warehouse-scale compute.