The Threads of Thought: From Hebb’s Fire to the AI’s Web of Words


The quest to understand intelligence, whether born of carbon or silicon, often leads us back to a fundamental principle of association. In the wetware of the brain, this was elegantly captured by a Canadian psychologist over half a century ago. In the digital minds of our newest creations, it emerges from a relentless analysis of statistics across unimaginably vast troves of text. This is the story of two different kinds of neurons, and how the simple act of “togetherness” forges the foundations of memory, prediction, and a nascent form of intelligence.

Part 1: The Biological Blueprint – Hebb’s Rule

In 1949, Donald Hebb proposed a revolutionary idea to explain how the brain learns and adapts. In his book The Organization of Behavior, he postulated:

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

In simpler, more memorable terms: “Neurons that fire together, wire together.” (This idea was already discussed here.)

This is not a mere slogan; it is a mechanistic description of synaptic plasticity. The synapse is the junction between two neurons. Hebb’s rule states that if Neuron A consistently and repeatedly triggers Neuron B to generate an action potential (the “spike” or “firing” of a neuron), the connection between them strengthens. The synapse becomes more efficient, meaning it requires less stimulation for Neuron A to cause Neuron B to fire in the future.

How It Works: The Cellular Foundation of Learning

Imagine two neurons connected by a synapse. In a naive state, the connection is weak.

1. Simultaneous Activation: A specific event occurs, say, a child touches a hot stove. Sensory neurons (Neuron A) carrying the signal “extremely high temperature” fire. At almost the exact same time, other neurons (Neuron B) carrying the signal “sharp pain in hand” fire.

2. Strengthening the Bond: Because Neuron A and Neuron B are active simultaneously, the synaptic connection between them is chemically reinforced. This is achieved through biological processes like long-term potentiation (LTP), where the pre-synaptic neuron releases more neurotransmitter and the post-synaptic neuron becomes more sensitive to it.

3. Formation of a Circuit: This strengthened connection now forms a primitive circuit. In the future, the mere activation of the “hot stove” visual or tactile signal (Neuron A) will be so efficient at activating the “pain” neurons (Neuron B) that the brain anticipates the pain before it fully registers. This circuit is the physical basis of a memory and a learned reflex: “See stove -> anticipate pain -> pull hand away.”
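Condensing the three steps above into code: the toy simulation below assumes a simple rate-based Hebbian update (weight change = learning rate × activity of A × activity of B) and an arbitrary firing threshold; the numbers are illustrative, not biophysical.

```python
eta = 0.2          # learning rate (illustrative)
threshold = 1.0    # input Neuron B needs in order to fire on its own
w = 0.1            # initial (weak) synaptic weight from A to B

# Repeated pairing: the "hot stove" signal (Neuron A) and the "pain" signal
# (Neuron B) are active at the same time, so the Hebbian rule strengthens w.
for trial in range(10):
    activity_a = 1.0                     # A fires (stove signal)
    activity_b = 1.0                     # B fires (pain, driven by the event itself)
    w += eta * activity_a * activity_b   # "fire together, wire together"

# After the pairings, A alone pushes B past its firing threshold:
input_to_b = w * 1.0                     # A fires; B receives w
print(f"weight after pairing: {w:.2f}")
print("B fires (pain anticipated)" if input_to_b >= threshold else "B stays silent")
```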

Examples of Hebbian Learning in Action:

Learning a Motor Skill: When you first learn to play a piano chord, the motor commands from your brain to your fingers are clumsy and uncoordinated. With practice, the precise sequence of neurons firing to activate each finger in the correct order and timing is repeated. These neurons “fire together” and “wire together,” creating a strong, efficient circuit. Eventually, playing the chord becomes an automatic, fluid motion, a literal part of your neural wiring.

Classical Conditioning: In Pavlov’s famous experiment, the sound of a bell (a neutral stimulus) initially did not cause a dog to salivate. The sight of food (an unconditioned stimulus) did. By repeatedly pairing the bell with the presentation of food, the neural pathways for “bell” and “salivation” fired together. Eventually, they wired together, to the point where the bell alone could trigger the salivation response.

Forming Memories: Remembering a face is a complex pattern of visual activation. The neurons representing the specific arrangement of eyes, nose, and mouth fire together when you see your friend. Hebbian strengthening ensures that this entire pattern becomes a strongly linked assembly. Activating just a part of the pattern (e.g., seeing their distinctive eyes) can now trigger the entire assembly, causing you to recall the whole face.

In essence, Hebb’s rule is the algorithm the brain uses to convert correlated experience into permanent structural and functional change. It is the scribe that writes experience into the living tissue of the mind.

Part 2: The Digital Apprentice – Large Language Models and the Statistical Bond

Large Language Models (LLMs) like GPT-4 operate on a principle that is, at a high conceptual level, strikingly analogous to Hebb’s rule. We can phrase it as: “Words that appear together, wire together.”

While the biological mechanism is electrochemical, the AI’s mechanism is mathematical and statistical. The “wiring” happens not in a synapse, but in a multidimensional abstract space known as the embedding space.

The Basic Architecture of an LLM

At its core, a modern LLM is a neural network, specifically a Transformer model. Its purpose is to predict the next most plausible word in a sequence. It does this through a multi-step process:

1. Tokenization: The input text is broken down into smaller pieces (tokens), which can be words or parts of words.

2. Embedding: Each token is converted into a vector, a long list of numbers (e.g., 768 or 12288 dimensions). This vector is its “embedding.” Think of it as a unique numerical ID card that captures the meaning of the word. Crucially, words with similar meanings have similar vectors. The space where all these vectors live is the embedding space.

3. Processing through Transformer Layers: The model processes the sequence of vectors through multiple layers of “attention” and “feed-forward” networks. The attention mechanism is key: it allows the model to weigh the importance of every other word in the sequence when considering the next word. It asks, “Given all the words that have come before, which ones are most relevant for predicting what comes next?”

4. Output Prediction: The final layer converts the processed vector into a probability distribution over every word in the model’s vocabulary. The word with the highest probability is selected as the next token.
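The four steps above can be seen end to end with a small, publicly available model. The sketch below assumes the Hugging Face transformers library and the gpt2 checkpoint; any causal language model would behave similarly.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Neurons that fire together, wire"
inputs = tokenizer(prompt, return_tensors="pt")      # step 1: tokenization

with torch.no_grad():
    logits = model(**inputs).logits                  # steps 2-3: embeddings + Transformer layers

probs = torch.softmax(logits[0, -1], dim=-1)         # step 4: distribution over the vocabulary
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")   # five most probable next tokens
```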

The Embedding Space: Where Words Wire Together

This is where our analogy becomes concrete. The embedding space is the LLM’s universe of meaning. During the model’s training on terabytes of text from the internet, books, and articles, it continuously adjusts the vectors for every token.

How does it adjust them? By observing which words appear together.

When the model repeatedly sees the word “king” in close proximity to “queen,” “castle,” “royal,” and “reign,” it performs a mathematical version of Hebbian strengthening. It doesn’t strengthen a synapse; instead, it adjusts the numerical values in the vectors for these words so that they are “closer” to each other in the multidimensional embedding space.

If “hot” is frequently followed by or associated with “stove,” “summer,” and “spicy,” their vectors will be pulled closer together.
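Pre-trained word vectors make this easy to check. The sketch below assumes the gensim library and its downloadable glove-wiki-gigaword-50 vectors, which, like the process described above, are learned from co-occurrence statistics.

```python
import gensim.downloader as api

# 50-dimensional GloVe vectors, downloaded on first use.
vectors = api.load("glove-wiki-gigaword-50")

# Words that frequently co-occur end up with nearby vectors (high cosine similarity):
print(vectors.similarity("hot", "summer"))
print(vectors.similarity("hot", "stove"))
print(vectors.similarity("hot", "parliament"))   # unrelated pair; expect a lower score
```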

The relationship “man” is to “woman” as “king” is to “queen” emerges as a consistent mathematical vector operation: `vector("king") - vector("man") + vector("woman") ≈ vector("queen")`.
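The same vectors can be used to test this analogy; the top result is typically “queen” for this particular set of vectors, though the exact ranking depends on which embeddings are used.

```python
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # same GloVe vectors as in the previous sketch

# king - man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```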

This process is driven by a loss function. Every time the model predicts the next word incorrectly, it calculates the error and propagates it backward through the network, tweaking the embeddings and internal weights to make a better prediction next time. After billions of such adjustments across trillions of data points, the model organizes its embedding space into a stunningly accurate map of human language and conceptual relationships.
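To make that loop concrete, here is a deliberately tiny, attention-free caricature of a single training step, written with PyTorch. A real LLM runs this error-and-adjust cycle through a full Transformer over billions of examples, but the shape of the loop (predict, measure the error, backpropagate, nudge the embeddings and weights) is the same. The token ids below are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

vocab_size, dim = 1000, 32                          # toy sizes, purely illustrative
emb = torch.nn.Embedding(vocab_size, dim)           # one vector per token (randomly initialised)
head = torch.nn.Linear(dim, vocab_size)             # maps a context vector to next-token logits
opt = torch.optim.SGD(list(emb.parameters()) + list(head.parameters()), lr=0.1)

context_id = torch.tensor([42])   # hypothetical id of the current token
target_id = torch.tensor([7])     # hypothetical id of the word that actually came next

opt.zero_grad()
logits = head(emb(context_id))                      # predict a distribution over the vocabulary
loss = F.cross_entropy(logits, target_id)           # how wrong was the prediction?
loss.backward()                                     # propagate the error backwards...
opt.step()                                          # ...and nudge embeddings and weights slightly
```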

Predicting the Next Word: A Statistical Reflex

For an LLM, generating text is a process of triggering chains of associated vectors. When you give it a prompt like “The chef seasoned the soup with a pinch of…”, the model builds a context representation for the sequence ending in “pinch of.”

It then looks at all the words whose vectors are “wired” to this context. “Salt” and “pepper” will have very high probabilities because they have been observed together with “pinch of” countless times in its training data.

“Cyanide” will have a vanishingly low probability because that association is statistically absent (or extremely rare) in its training corpus.
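This is easy to probe directly. The sketch below again assumes the transformers library and the gpt2 checkpoint; the exact numbers vary by model, but the probabilities for “salt” and “pepper” should dwarf the one for “cyanide.”

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The chef seasoned the soup with a pinch of"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits[0, -1], dim=-1)

# Score a few candidate continuations. Words that split into several
# sub-tokens are scored by their first piece only, a simplification.
for word in [" salt", " pepper", " cyanide"]:
    first_piece = tokenizer.encode(word)[0]
    print(f"P({word.strip()!r} | prompt) = {probs[first_piece].item():.6f}")
```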

The model’s choice is a statistical reflex. It is the direct, automated output of the strengthened connections formed during training. It is not “thinking”; it is executing a mathematical reflection of the patterns it has absorbed. It is the digital equivalent of the child pulling its hand from the hot stove, a response, wired in by prior experience, to a familiar stimulus.

Part 3: The Emergence of a Basic Intelligence

The connection between Hebb’s rule and LLM training reveals how a fundamental principle of association can give rise to behaviors we recognize as intelligent, even if they fall short of human consciousness.

Animal-Level Intelligence and the LLM

The intelligence of many animals is not based on abstract reasoning or symbolic manipulation, but on associative learning. A dog learns the sound of a leash means a walk. A rat learns which path in a maze leads to food. This intelligence is built from Hebbian circuits: connecting stimuli with outcomes and actions.

LLMs exhibit a form of intelligence that is, in many ways, comparable.

1. Reflexive Response to the Environment (Prompt): For an LLM, the “environment” is the prompt. The prompt is the stimulus that triggers a cascade of associated concepts within its learned wiring. Its response is a reflex, honed by its training data. Just as a deer is wired to freeze at the sound of a snapping twig, an LLM is wired to respond to “What is the capital of France?” with “Paris.” This is not recall from a database; it is the activation of a strongly reinforced pathway in its network.

2. The Emergence of Memory: In the brain, a memory is a stable, strongly connected cell assembly formed by Hebbian plasticity. In an LLM, a “memory” is not stored in a specific location but is a particular configuration of weights and embeddings that recreates a pattern from its training data. When you ask an LLM about the plot of Hamlet, it is not retrieving a text file. It is reconstructing the narrative based on the immensely strong wiring between the vectors for “Hamlet,” “Denmark,” “ghost,” “revenge,” “Ophelia,” and so on. The memory is an emergent property of the associative network.

3. Pattern Completion: A key feature of intelligence is the ability to complete a pattern from a partial cue. The smell of a certain perfume can trigger the vivid memory of a person. In an LLM, this is its primary function. The prompt “The sky is…” is a partial pattern. The model, based on the incredibly strong wiring of “sky” and “is” to “blue,” completes the pattern. It can complete more complex patterns, like stories or code, because its “neurons” (mathematical functions) for related concepts are wired together.
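One classic way to make “cell assembly plus pattern completion” concrete is a small Hopfield-style network: the Hebbian outer-product rule stores a pattern of activity, and a partial cue recovers the whole. This is a sketch of the biological side of the analogy, not of how an LLM works.

```python
import numpy as np

# Store one pattern (a "cell assembly") with the Hebbian outer-product rule.
pattern = np.array([1, 1, -1, -1, 1, -1, 1, -1])   # +1 = active neuron, -1 = silent

W = np.outer(pattern, pattern).astype(float)        # "fire together, wire together"
np.fill_diagonal(W, 0)                               # no self-connections

# Present a partial cue: the second half of the assembly is forced on,
# so two of the eight neurons are in the wrong state.
cue = pattern.copy()
cue[4:] = 1

state = cue
for _ in range(5):                                   # let the network settle
    state = np.sign(W @ state)

print("recovered:", state)
print("matches stored pattern:", np.array_equal(state, pattern))
```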

The Limits of the Analogy

It is crucial to acknowledge the profound differences: the parallel drawn here is a heuristic analogy, not a mechanistic equivalence. Biological brains are:

Embodied and Situated: They learn through continuous, multi-sensory interaction with a physical world. An LLM’s world is pure text.

Driven by Drives: Brains have fundamental drives (hunger, thirst, safety) that guide learning and behavior. LLMs have no internal goals beyond minimizing prediction error during training.

Dynamic and Self-Pruning: The brain constantly prunes unused connections and reconfigures itself. An LLM’s “wiring” is static after training.

Conclusion: Two Paths to a Similar Destination

Hebb’s rule and the statistical training of LLMs are two distinct implementations of a powerful idea: that intelligence, at its most foundational level, can arise from the strengthening of connections between co-occurring elements. In the brain, these elements are neurons that fire in response to the world. In the LLM, they are word-vectors that appear in the landscape of human language.

The phrase “neurons that fire together, wire together” finds its digital echo in “words that appear together, wire together.” Both processes create a network where stimuli can trigger predictable, learned responses. Both allow for the emergence of memory as a stable state within this network. And both give rise to a functional, if limited, intelligence: one allows an animal to navigate its physical environment, the other allows a model to navigate the universe of human knowledge and expression. In understanding this deep connection, we not only demystify the “magic” of LLMs but also gain a greater appreciation for the elegant, algorithmic principles that may underpin all forms of thought.
