Lesson 1 · 10 min read

What Is a "Model"?

You can download an AI and put it on a USB stick. But should you?

What you'll learn

  • What an AI model actually is, physically -- a file of numbers on a hard drive
  • What parameters, weights, and biases represent
  • The rough scale of modern models (billions to trillions of parameters)
  • Common model file formats and their purposes

It's Just a File of Numbers

Strip away the mystique and an AI model is exactly what it sounds like: a very large file sitting on a hard drive, filled with decimal numbers. Billions of them. Each number is either a weight (a number controlling the strength of a connection between neurons in the network) or a bias (an offset that lets the activation function shift, so the model can fit data even when all inputs are zero). Both are numerical values that were adjusted during training to make the model's predictions slightly less wrong.
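
To make weights and biases concrete, here is a minimal sketch of a single artificial neuron. The weight and bias values are hypothetical, picked only for illustration, and the sigmoid activation is an assumption (real models use a variety of activation functions).

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, not taken from any real model.
weights = [0.8, -0.5]
bias = 0.1

active = neuron([1.0, 2.0], weights, bias)  # z = 0.8 - 1.0 + 0.1 = -0.1
silent = neuron([0.0, 0.0], weights, bias)  # z = 0.1, the bias alone

print(active, silent)
```

With all inputs at zero the weights contribute nothing, so the output depends only on the bias. That is the sense in which a bias lets the model fit data even when every input is zero.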

Nobody wrote these numbers by hand. Nobody understands what most of them do individually. They emerged -- through billions of iterations of an optimization process -- from raw data. This is Andrej Karpathy's "Software 2.0" thesis in a nutshell:

"Software 2.0 is written in much more abstract, human-unfriendly language, such as the weights of a neural network. No human is involved in writing this code because there are a lot of weights (typical networks might have millions), and coding directly in weights is kind of hard."

Andrej Karpathy, "Software 2.0" (2017)

Traditional software is if/else statements and loops -- code a human wrote, line by line. A neural network is millions of decimal numbers that collectively encode a function no human designed. The logic is there, but it's distributed across every single parameter, entangled in ways that resist easy interpretation.

How Big?

The math is straightforward. Each parameter (a single number in the model, either a weight or a bias, that was adjusted during training) is typically stored as a floating-point number. At half precision (FP16), that's 2 bytes per parameter; full precision (FP32) would be 4. At 4-bit quantization, it's half a byte.

  • 7B -- Llama 2 parameters
  • 175B -- GPT-3 parameters
  • ~1.8T -- GPT-4 parameters (rumored)
  • 4 GB -- Llama 7B quantized

Model             Parameters      File Size (FP16)   File Size (4-bit)
Llama 2 7B        7 billion       ~14 GB             ~4 GB
Llama 2 70B       70 billion      ~140 GB            ~35 GB
Llama 3.1 405B    405 billion     ~810 GB            ~200 GB
GPT-3             175 billion     ~350 GB            ~87 GB
GPT-4 (rumored)   ~1.8 trillion   ~3.6 TB            ~900 GB
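
The arithmetic behind these figures can be sketched in a few lines. The function name `model_size_gb` is made up for this example, and the calculation counts weights only, assuming 1 GB = 10^9 bytes.

```python
def model_size_gb(num_parameters, bits_per_parameter):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9

models = {
    "Llama 2 7B": 7e9,
    "Llama 2 70B": 70e9,
    "Llama 3.1 405B": 405e9,
    "GPT-3": 175e9,
}

for name, n in models.items():
    fp16 = model_size_gb(n, 16)
    q4 = model_size_gb(n, 4)
    print(f"{name}: FP16 ~{fp16:.0f} GB, 4-bit ~{q4:.1f} GB")
```

Real quantized files tend to come out somewhat larger than the pure 4-bit figure, because they also store scale factors and often keep a few tensors at higher precision. That is why a 7B model lands near 4 GB rather than 3.5 GB.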

GPT-4 is rumored to be a Mixture of Experts (MoE) model: roughly 1.8 trillion total parameters spread across 16 expert sub-models, each with ~111 billion parameters (16 x 111 billion is about 1.8 trillion). Only 2 of those 16 experts activate for any given query. It's sixteen smaller models wearing a trenchcoat -- not all 1.8 trillion parameters fire every time you ask it a question.
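
The "only 2 of 16 experts" idea can be illustrated with a toy router. This is a sketch of generic top-k routing, not a description of how GPT-4 actually works: the router scores are invented, `top_k_route` is a hypothetical helper, and the parameter counts are the rumored figures from the text. In published MoE designs the routing decision is usually made per token inside each MoE layer rather than once per query.

```python
import math

def top_k_route(gate_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

# Hypothetical router scores for 16 experts (one score per expert).
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2,
               0.4, -0.2, 0.9, 0.1, -0.8, 0.6, 0.0, 0.3]

routing = top_k_route(gate_scores, k=2)
print(routing)  # expert index -> mixing weight for the two selected experts

# Rumored figures from the text: 16 experts of ~111 billion parameters each.
params_per_expert = 111e9
total_expert_params = 16 * params_per_expert
active_expert_params = 2 * params_per_expert
print(f"active: {active_expert_params / total_expert_params:.1%} of expert parameters")
```

Only the selected experts do any work for that input, so the compute cost tracks the active parameters while the file on disk still has to hold all of them.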

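The model file for GPT-3 would fill a typical laptop's hard drive. The training data is far larger still: GPT-3 was trained on roughly 300 billion tokens drawn from about 45 TB of raw web text, and GPT-4 is rumored to have seen around 13 trillion tokens.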

What the Numbers Actually Represent

Before training starts, every weight is random noise. The model knows nothing. Then, through billions of small adjustments -- each one nudging a weight up or down to reduce prediction error -- patterns emerge. After training, those weights encode visual features, linguistic structures, logical relationships, and much more.
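
Here is a minimal sketch of that process on a toy problem: one weight and one bias, starting from random values, nudged by gradient descent until they fit the line y = 2x + 1. The data, learning rate, and step count are all assumptions chosen for illustration. Real training applies the same idea to billions of parameters at once.

```python
import random

random.seed(0)

# Toy dataset generated from y = 2x + 1; the model does not know this rule.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

# Before training: random noise.
w = random.uniform(-1, 1)
b = random.uniform(-1, 1)
learning_rate = 0.05

def mean_squared_error(w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_error = mean_squared_error(w, b)

for step in range(500):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Nudge each parameter a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_error = mean_squared_error(w, b)
print(f"w={w:.4f}, b={b:.4f}, error {initial_error:.4f} -> {final_error:.2e}")
```

Nothing in the loop mentions the rule "multiply by 2 and add 1". The rule ends up encoded in the final values of `w` and `b`.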

But here's the thing that trips people up: the knowledge isn't stored as discrete facts you can look up. You can't open the file and find "Paris is the capital of France" written anywhere. The information is compressed and distributed across all the parameters at once -- entangled in the connections between neurons, not sitting in any single cell.

Brain comparison: The human brain has roughly 86 billion neurons and ~100 trillion synapses. GPT-4's ~1.8 trillion parameters represent about 1-2% of the brain's synaptic connections in raw count. But synapses are far more complex than parameters -- they regulate connections through dynamic bio-electrochemical processes, not static decimal numbers.

Inside a Model File

What does a model actually look like on disk? Let's zoom in. Here's a simplified view of a GGUF file -- the format used for running models locally with llama.cpp.

llama-7b.gguf 4.0 GB · 7,000,000,000 parameters
Layer 0 · attention.wq · [4096 x 4096]
Layer 0 · attention.wk · [4096 x 4096]
Layer 0 · attention.wv · [4096 x 4096]
Layer 0 · feed_forward.w1 · [4096 x 11008]
... 31 more layers · tokenizer · metadata
Each cell = one weight. Blue = positive, red = negative. Magnitude = intensity.
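
The tensor shapes in this listing are enough to estimate the parameter count. The sketch below takes the 4096 x 4096 and 4096 x 11008 shapes and the 32-layer count from the listing. Everything else is an assumption about a Llama-7B-style architecture: a fourth attention matrix, two more feed-forward matrices, the normalization vectors, the 32,000-token vocabulary, and separate input and output embedding tables.

```python
dim = 4096        # from the listing: attention matrices are 4096 x 4096
hidden = 11008    # from the listing: feed_forward.w1 is 4096 x 11008
n_layers = 32     # from the listing: layer 0 plus 31 more layers
vocab = 32000     # assumption: vocabulary size, not shown in the listing

# wq, wk, wv appear in the listing; the fourth (output) matrix is assumed.
attention = 4 * dim * dim
# w1 appears in the listing; two more matrices of the same size are assumed.
feed_forward = 3 * dim * hidden
# Assumption: two normalization vectors of length dim per layer.
norms = 2 * dim

per_layer = attention + feed_forward + norms
# Assumption: separate input and output embedding tables.
embeddings = 2 * vocab * dim
# The trailing dim accounts for one assumed final normalization vector.
total = n_layers * per_layer + embeddings + dim

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Under these assumptions the total comes to about 6.7 billion, which is why the model is rounded to "7B".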

Model Formats

Not all model files are created equal. The format matters for security, speed, and where you can run the model.

PyTorch .pt / .pth

The original research format. Uses Python's pickle serialization, which means the file can execute arbitrary code when loaded -- a real security risk if you download models from untrusted sources. Still common in research labs.
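
The risk is easy to demonstrate with Python's standard `pickle` module alone. In this sketch the hypothetical `Payload` class tells pickle to call `print` when the data is loaded. A malicious file could name any other callable instead.

```python
import contextlib
import io
import pickle

class Payload:
    """Hypothetical object whose pickled form calls a function on load."""
    def __reduce__(self):
        # pickle stores "call print(...)" instead of the object's own state.
        return (print, ("this ran during pickle.loads",))

blob = pickle.dumps(Payload())

# Capture stdout so the side effect of loading is visible as a value.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = pickle.loads(blob)

print(repr(captured.getvalue()))  # the message printed while loading
print(result)                     # whatever the callable returned, not a Payload
```

Loading the bytes ran a function chosen by whoever created the file, and no `Payload` object came back. This is why pickle-based model files should only be loaded from sources you trust.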

SafeTensors .safetensors

Developed by Hugging Face specifically to fix pickle's security flaws. Restricted deserialization prevents code execution attacks. JSON metadata header plus raw tensor data. Fast, memory-mappable, and the recommended format for sharing models publicly.
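
The "JSON header plus raw tensor data" layout can be sketched with the standard library. This example assumes the published safetensors layout: an 8-byte little-endian header length, then the JSON header, then the raw bytes. It handles only 32-bit floats, and `build_file` and `read_tensor` are hypothetical helpers. Real code should use the `safetensors` library instead.

```python
import json
import struct

def build_file(name, shape, values):
    """Assemble a tiny safetensors-style buffer holding one F32 tensor."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": shape, "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_tensor(blob, name):
    """Read one F32 tensor back by parsing JSON and slicing bytes."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    entry = header[name]
    start, end = entry["data_offsets"]
    data_start = 8 + header_len
    payload = blob[data_start + start:data_start + end]
    count = (end - start) // 4  # 4 bytes per 32-bit float
    return entry["shape"], list(struct.unpack(f"<{count}f", payload))

blob = build_file("layer0.weight", [2, 2], [0.5, -1.0, 2.0, 0.25])
shape, values = read_tensor(blob, "layer0.weight")
print(shape, values)
```

Reading involves nothing but parsing JSON and slicing bytes, so there is no step at which the file can ask the loader to run code. Because the header records byte offsets, a loader can also memory-map the file and read only the tensors it needs.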

GGUF .gguf

Commonly expanded as "GPT-Generated Unified Format" -- built for the llama.cpp ecosystem. A single file containing weights, tokenizer, and metadata. Natively supports quantization, which reduces the precision of weights (e.g., FP16 to INT4) to shrink file size and speed up inference (Q4, Q5, Q8 variants). The go-to format for running models on your own hardware.

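To show what quantization does to the numbers, here is a deliberately simplified sketch: a block of weights shares one scale factor, and each weight is stored as a small integer between -7 and 7. This is a generic "absmax" scheme for illustration only, not the actual Q4 layouts used by llama.cpp, and the weight values are made up.

```python
def quantize_block(weights):
    """Map floats to integers in [-7, 7] using one shared scale per block."""
    largest = max(abs(w) for w in weights)
    scale = largest / 7 if largest > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

# Hypothetical block of weights.
block = [0.7, -0.33, 0.1, 0.0, -0.7, 0.21]

scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))

print(codes)
print([round(r, 3) for r in restored])
print(f"largest rounding error: {max_error:.3f}")
```

Each code fits in 4 bits instead of 16, which is where the size reduction comes from. The rounding error is the accuracy cost of the smaller file.
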
Program, Database, or Brain Scan?

People keep reaching for familiar categories and none of them quite fit.

Like a program? It takes input and produces output, executing a fixed computational graph. But nobody wrote the logic -- the logic emerged from data.

Like a database? It stores "knowledge." But the knowledge is compressed and distributed across billions of numbers. You can't query individual facts -- there's no SELECT capital FROM countries WHERE name='France'.

Like a brain scan? It captures learned patterns from experience. But unlike a brain, the file on disk is inert. There's no ongoing process, no metabolism, no dreams.

It's none of these -- and arguably all three, in limited ways. That discomfort is the point. We built a new kind of artifact and the old categories don't apply cleanly.

The debate about what models are isn't just academic. Emily Bender et al. coined the term "stochastic parrot" in 2021, arguing LLMs are "stitching together sequences of linguistic forms... without any reference to meaning." Sam Altman responded on Twitter: "i am a stochastic parrot, and so r u." Meanwhile, Ilya Sutskever tweeted in 2022: "it may be that today's large neural networks are slightly conscious." The truth is probably more interesting than either pole.

Go Deeper

Grant Sanderson's 3Blue1Brown series is the gold standard for visual explanations of neural networks. This first episode covers exactly what's inside a model -- weights, biases, and how they form layers.

3Blue1Brown -- "But what is a Neural Network?" (Chapter 1, Deep Learning series)

Want to get hands-on? TensorFlow Playground lets you build and train a tiny neural network in your browser -- watching the weights update in real time.

History Thread

1843 -- Ada Lovelace noted that Babbage's Analytical Engine "has no pretensions whatever to originate anything" -- it can only do what it's instructed to do. For over a century, this was the assumption about all computing machines.

1950 -- Alan Turing opened his landmark paper "Computing Machinery and Intelligence" with: "I propose to consider the question, 'Can machines think?'" But he immediately recognized the words "think" and "machine" can't be clearly defined, so he replaced the question with a behavioral test -- the Imitation Game.

1956 -- The Dartmouth workshop launched "Artificial Intelligence" as a field, using the name John McCarthy had coined in the 1955 proposal for the event.

2017 -- Karpathy's "Software 2.0": "When we develop AGI, it will certainly be written in Software 2.0."

Alan Turing statue at Bletchley Park. His 1950 paper reframed "Can machines think?" as an empirical question that still drives AI research today.

Pop Culture Connection

HAL 9000 (2001: A Space Odyssey, 1968) -- A voice-based AI that reasons, plans, and lies. Kubrick and Clarke predicted Siri and Alexa decades before they existed. HAL's unsettling calm as it refuses to open the pod bay doors is still the cultural touchstone for AI gone wrong.

Her (2013) -- Spike Jonze's film about a man who falls in love with an AI voice assistant. Arguably the most accurate portrayal of what interacting with a modern LLM feels like: warm, responsive, occasionally uncanny.

The Imitation Game (2014) -- Benedict Cumberbatch as Turing, dramatizing the origin of computation itself. The film's title comes directly from Turing's 1950 paper.

Key Terms

Model
A mathematical function defined by billions of learned numerical values stored in a file on disk
Parameter
A single number in a model (weight or bias) that was adjusted during training
Weight
A number controlling the strength of a connection between neurons in the network
Bias
An offset value allowing the activation function to shift, ensuring the model can fit data when all inputs are zero
Inference
Running a trained model to generate output from input -- the "using it" phase, as opposed to training
Quantization
Reducing the precision of weights (e.g., FP16 to INT4) to shrink file size and speed up inference at the cost of some accuracy

Did This Land?

If someone says GPT-4 has 1.8 trillion parameters, what are those parameters?

Decimal numbers (weights and biases) that were adjusted during training to encode learned patterns. Each one controls some aspect of how the network transforms input into output. Together they represent compressed knowledge extracted from trillions of tokens of training data.

Could you fit GPT-4 on a USB stick? Why or why not?

No -- the full model is roughly 3.6 TB at half precision, far beyond any USB stick. But smaller models absolutely fit: Llama 7B quantized is about 4 GB, which fits on a cheap thumb drive. An engineer actually built a working LLM on a Raspberry Pi Zero inside a USB stick case -- it just runs very slowly (~2 seconds per token).

What's the difference between the model file and the training data?

The model file contains the compressed, learned patterns -- billions of numerical weights. The training data is the raw source material (text, images, code) that the model was trained on. GPT-3's model file is ~350 GB. Its training data was filtered down from roughly 45 TB of raw web text. The model is a lossy compression of that data.

Lesson Summary

  • An AI model is a file of billions of decimal numbers (weights and biases) stored on disk. Nobody wrote those numbers by hand -- they emerged from training data through optimization.
  • Model sizes range from 4 GB (Llama 7B quantized, fits on a USB stick) to ~3.6 TB (GPT-4 at half precision, needs multiple drives). GPT-4 is rumored to be 16 expert sub-models, only 2 active per query.
  • The knowledge inside a model is compressed and distributed -- you can't look up individual facts like a database. It's a new kind of artifact that doesn't map cleanly to program, database, or brain.
  • Three file formats dominate: PyTorch (research, security risk), SafeTensors (safe sharing), and GGUF (local inference).
  • Turing's 1950 question -- "Can machines think?" -- is still the question. We just have very different answers now.