L19) The P-chain

The Brain That Predicts by Producing

Understanding, Producing, Learning: A Single Loop?

When you listen to a sentence, your brain does not just “analyze” it. According to an increasingly well-supported hypothesis, it silently produces the probable continuation, and the discrepancy between this prediction and what it actually hears is what drives learning. On this view, three activities once thought separate would form a single loop.

Where Does This Article Fit In?

Previous articles dissected the generation-recognition asymmetry as a formal fact: a property of grammars, measurable in complexity classes (L17) and in bits of surprisal (L15). This article changes level and looks at the psycholinguistic antecedent of this asymmetry. The triad we are studying (producing, recognizing, inferring) has already been conceived as a unified whole by cognitive science, under the name of the P-chain. It offers a link between formal language theory and the brain.

Why Is This Important?

The research article that this series popularizes claims one specific originality: bringing previously dispersed asymmetries together into a single dimensional framework. But it would be dishonest to claim that no one had connected production and comprehension before. Psycholinguistics did, at another level of analysis. The P-chain is precisely the antecedent that must be cited, understood, and distinguished from our own contribution.

Understanding the P-chain also shows why two dimensions of asymmetry, information (D4) and temporality (D6), which look independent at the formal level, might be one and the same thing in the brain.

The Idea in One Sentence

The P-chain proposes that language comprehension relies on implicit prediction performed by the production system, and that the error between this prediction and the actual input is the driver of learning — thus linking production, comprehension, and acquisition in a single causal chain.

Let’s Explain Step by Step

1. Three Activities, Long Studied Separately

Historically, psycholinguistics has treated as distinct domains:

production (how an intention is transformed into speech),
comprehension (how meaning is retrieved from heard speech),
acquisition (how a child learns their language).

Three literatures, three sets of models, few bridges. This is the cognitive mirror of the formal asymmetry described in L13: generating, recognizing and inferring, each in its own corner.

2. The P-chain: One Chain, Not Three Boxes

Dell & Chang (2014) propose reversing the perspective. Their framework, the P-chain (where the P refers to prediction, production and processing), holds that these three activities are the links of a single chain:


Figure 1: The P-chain loop. Comprehension engages the production system to predict the continuation; the discrepancy with the actual input is a prediction error; this error adjusts the model, which is what learning consists of; and a better model predicts better. Production does not sit at the end of the chain: it is at the heart of comprehension.

3. “Prediction is Production”

The most surprising link is the first: on this view, the brain recruits its production apparatus in order to understand. Martin, Branzi & Bar (2018) state this in their title, “Prediction is Production,” and support it experimentally: when the production system is occupied by a secondary verbal task, prediction during comprehension drops. The system that speaks appears to be the same one that anticipates what will be heard.

Gambi & Pickering (2017) turn it into a modeling principle: to understand is to simulate the other person's production. The listener does not passively receive; they actively re-generate, in advance, what the speaker is saying.

In plain terms. The idea is not that you internally pronounce every word. It is that the planning mechanisms that, in production, choose the next word are recruited in comprehension to guess the next word. Production runs idle, with no spoken output: it operates in prediction mode.
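A toy version makes the “same machinery, two uses” idea concrete. The sketch below is a minimal illustration, assuming a bigram model stands in for the production system (real sentence planning is far richer); the corpus and the function names `next_word_distribution`, `produce` and `predict` are hypothetical. The point is structural: `produce` and `predict` call the same function, one to choose what to say, the other to anticipate what will be heard.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the speaker-listener's linguistic experience.
CORPUS = [
    "the dog chased the cat",
    "the dog chased the ball",
    "the cat chased the mouse",
]

def train_bigrams(sentences):
    """Count word-to-next-word transitions; '<s>' marks the sentence start."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Shared machinery: P(next | prev) taken from the production model."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

def produce(counts, prev):
    """Production mode: pick the most probable next word to say."""
    dist = next_word_distribution(counts, prev)
    return max(dist, key=dist.get)

def predict(counts, prev):
    """Comprehension mode: the very same distribution, used as an expectation."""
    return next_word_distribution(counts, prev)

model = train_bigrams(CORPUS)
print(produce(model, "chased"))  # what a speaker would say next
print(predict(model, "the"))     # what a listener expects after "the"
```

Nothing in `predict` is specific to listening: it is the speaker's own distribution, consulted before the input arrives.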

4. Surprisal: The Measurable Trace of Prediction

How can this silent prediction be measured? Through its failures. When the heard word is expected, processing is fluent; when it is surprising, it is costly. This is exactly the surprisal introduced in L15:

$S(w_i) = -\log_2 P(w_i \mid w_1, \dots, w_{i-1})$

The surprisal of word $w_i$ measures how improbable it is given the preceding context. Hale (2001) turns it into a model of processing difficulty: the more surprising a word, the longer it takes to integrate. Levy (2008) refines this into expectation-based comprehension: difficulty is the cost of reallocating probability mass among competing hypotheses when the word arrives. Stolcke (1995) had already provided the machinery: a probabilistic Earley parser that computes, at each position, the prefix probability.
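The link between prefix probabilities and surprisal can be illustrated without a full Earley parser. The sketch below assumes a toy bigram model with made-up probabilities (Stolcke's algorithm actually works on probabilistic context-free grammars) and computes each word's surprisal two ways: directly from the conditional probability, and as the log-ratio of successive prefix probabilities, $S(w_i) = \log_2 P(w_1 \dots w_{i-1}) - \log_2 P(w_1 \dots w_i)$. The two agree because a prefix probability is a product of conditionals.

```python
import math

# Hypothetical bigram probabilities P(next | prev); "<s>" marks the sentence start.
BIGRAM = {
    "<s>": {"the": 1.0},
    "the": {"dog": 0.5, "cat": 0.25, "horse": 0.25},
    "dog": {"barked": 0.75, "raced": 0.25},
    "horse": {"raced": 0.5, "fell": 0.5},
}

def prefix_probability(words):
    """Probability of a prefix: product of the bigram conditionals along it."""
    prob, prev = 1.0, "<s>"
    for w in words:
        prob *= BIGRAM.get(prev, {}).get(w, 0.0)
        prev = w
    return prob

def surprisal_direct(words, i):
    """Surprisal of words[i]: -log2 P(words[i] | context), context = previous word only."""
    prev = words[i - 1] if i > 0 else "<s>"
    return math.log2(1.0 / BIGRAM[prev][words[i]])  # log2(1/p) equals -log2(p)

def surprisal_from_prefixes(words, i):
    """Surprisal of words[i] as log2 P(prefix before it) - log2 P(prefix including it)."""
    return math.log2(prefix_probability(words[:i])) - math.log2(prefix_probability(words[: i + 1]))

sentence = ["the", "dog", "raced"]
for i, w in enumerate(sentence):  # i is 0-based here; the formula above counts from 1
    print(w, surprisal_direct(sentence, i), surprisal_from_prefixes(sentence, i))
```

Under these invented numbers, “dog” costs 1 bit and “raced” 2 bits, whichever route is used.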

Surprisal is therefore the observable signature of the P-chain: if the brain predicts, then violating its prediction must carry a cost, and that cost is measurable (reading times, N400 amplitude). This is the bridge between the cognitive hypothesis and the temporal dimension (D6) of our asymmetry: the generator never surprises itself ($S = 0$), while the receiver experiences the surprisal of the input ($S > 0$).
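The generator/receiver contrast fits in a few lines. In this minimal sketch (all names and probabilities are hypothetical), the generator's predictive distribution is a point mass on the word it has already planned, while the receiver has to rely on a probabilistic model of the speaker.

```python
import math

def surprisal(expectation, word):
    """Surprisal in bits of the actual word under a predictive distribution: -log2 p."""
    return math.log2(1.0 / expectation[word])

# Receiver: a hypothetical model of what the speaker might say after "the dog".
receiver_expectation = {"barked": 0.75, "raced": 0.25}

# Generator: it chose the word itself, so its "prediction" is a point mass on its plan.
planned_word = "raced"
generator_expectation = {planned_word: 1.0}

print("generator:", surprisal(generator_expectation, planned_word))  # 0.0 bits
print("receiver: ", surprisal(receiver_expectation, planned_word))   # 2.0 bits
```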

5. Prediction Error Drives Learning

The last link connects everything to inference (D5, the subject of L13 and of a forthcoming dedicated article). In the P-chain, “prediction error drives learning”: each discrepancy between prediction and reality is an error signal that adjusts internal parameters. This is the principle of learning by surprise minimization, and it aligns with the idea that to understand is to compress: to find the model under which the data are as unsurprising as possible.

In other words, a child acquiring their language is not performing an operation foreign to comprehension. The child is engaged in comprehension whose errors are large enough to reconfigure the grammar. Inference is comprehension pushed to its limit, when the grammar itself is still unknown.
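“Prediction error drives learning” can be sketched with a delta-rule update on next-word probabilities. This rule is chosen for transparency; it is not the model proposed by Dell & Chang, and the vocabulary, learning rate and function name `listen` are hypothetical. The learner starts with uniform predictions (no grammar yet), predicts each word, pays its surprisal, then moves its prediction toward what it actually heard.

```python
import math

VOCAB = ["the", "dog", "cat", "barked", "slept"]

def uniform():
    """Predictions before any learning: every word equally expected."""
    return {w: 1.0 / len(VOCAB) for w in VOCAB}

def listen(model, sentence, rate=0.3):
    """Comprehend one sentence and learn from it; return its total surprisal in bits.

    For each word: predict it from the previous word, pay its surprisal,
    then apply the delta rule p <- p + rate * (target - p) with a one-hot target.
    For 0 < rate <= 1 the predictions stay non-negative and sum to 1.
    """
    words = sentence.split()
    total = 0.0
    for prev, actual in zip(words, words[1:]):
        prediction = model.setdefault(prev, uniform())  # expectation formed before the word arrives
        total += math.log2(1.0 / prediction[actual])    # cost of the (partial) violation
        for w in VOCAB:
            target = 1.0 if w == actual else 0.0
            prediction[w] += rate * (target - prediction[w])  # error-driven adjustment
    return total

model = {}
history = [listen(model, "the dog barked") for _ in range(5)]
print([round(s, 2) for s in history])  # total surprisal shrinks with each exposure
```

Each update is a convex mix of the old prediction and the observed word, so the error signal can only make the heard continuation more expected next time.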

6. What the P-chain Says (and Doesn’t Say) About Our Framework

Here, for the sake of rigor, we must distinguish between levels of analysis.

Our framework is formal: it analyzes generation, recognition, and inference as distinct computational objects, with distinct complexity classes. At this level, information asymmetry (D4: what each agent knows in total) and temporal asymmetry (D6: how uncertainty evolves token by token) are independent: a “batch” mode parser (which receives the entire string at once) suffers from D4 but not D6.

The P-chain is cognitive: it describes brain mechanisms. At that level, it calls this independence into question. If understanding is predicting by producing, then the static information gap (D4) might simply be the aggregate of small incremental surprises (D6) accumulated over time: one mechanism, observed at two time scales.
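One formal observation shows why the aggregation reading is at least coherent. Suppose, as an assumption of our own rather than a definition from the article, that the static gap is measured by the information content of the whole string under the receiver's model. The chain rule of probability then splits it exactly into the word-by-word surprisals defined in section 4:

$-\log_2 P(w_1, \dots, w_n) = -\log_2 \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) = \sum_{i=1}^{n} S(w_i)$

The identity holds for any probabilistic model, so it does not show that the brain computes both quantities with one mechanism. It shows only that the total equals the sum of its incremental terms; a batch parser faces the left-hand side without ever living through the sequence of terms on the right.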

The two readings do not contradict each other: they operate at different levels. Our contribution is not to have discovered the production-comprehension link (the P-chain did that) but to situate it within the formal framework of languages, where it had not been articulated. Whether D4 and D6 are a single mechanism at the cognitive level remains an open question.

7. In Music: The Anticipating Listener

Music offers the cleanest testing ground for the P-chain. Listening to a tonal melody means constantly anticipating the next note, and feeling a precise tension when the melody deviates. Models of melodic expectation (such as IDyOM, which computes note-by-note surprisal from a statistical model of style) are in effect musical P-chains: they predict by mentally “producing” the continuation, and they measure surprise.
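The core principle of such models can be sketched in a few lines. The version below is a deliberately crude stand-in, not IDyOM: it assumes a first-order Markov model over MIDI pitch numbers with add-one smoothing, and the training melodies are invented. IDyOM itself combines variable-order models over several musical features. The sketch only shows the shared idea: learn a style, then score each note of a new melody by its information content, $-\log_2 P(\text{note} \mid \text{previous note})$.

```python
import math
from collections import Counter, defaultdict

def train(melodies):
    """Count pitch-to-pitch transitions (MIDI note numbers) across a toy 'style'."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def information_content(counts, melody, alphabet):
    """Note-by-note surprisal, -log2 P(note | previous note), with add-one smoothing."""
    ics = []
    for prev, nxt in zip(melody, melody[1:]):
        seen = counts[prev]
        p = (seen[nxt] + 1) / (sum(seen.values()) + len(alphabet))
        ics.append(math.log2(1.0 / p))
    return ics

# Hypothetical training "style": stepwise C-major fragments.
STYLE = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60], [60, 62, 64, 62, 60]]
ALPHABET = range(60, 72)  # the twelve pitches of one octave starting at middle C

model = train(STYLE)
expected = [60, 62, 64, 65]  # continues stepwise, as in the style
deviant = [60, 62, 64, 71]   # ends with an unprepared leap
print(information_content(model, expected, ALPHABET))
print(information_content(model, deviant, ALPHABET))
```

With these toy counts, the final leap carries exactly one more bit than the stepwise ending; the smoothing is what keeps a never-seen note finite rather than infinitely surprising.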

The improvising musician lives at the other end of the chain: they hear while playing, and their production anticipates their own listening. The apprentice, for their part, adjusts their model of style with each phrase they failed to anticipate: inference in action. The same loop runs from composer to listener to student.

This is also why a system like I2, which clearly separates production (PROD mode) and analysis (ANAL mode), captures the formal shape of asymmetry but not its cognitive loop: it does not predict by producing. The reversibility of grammar (L16) is a necessary, but not sufficient, condition to close the loop.

Key Takeaways

The P-chain (Dell & Chang 2014) unifies production, comprehension, and acquisition into a single causal chain.
Central hypothesis, prediction-by-production: to understand, the brain predicts the continuation by engaging its production system (Martin et al. 2018).
Surprisal (Hale 2001, Levy 2008) is the measurable trace of this prediction: its cost when expectation is violated.
“Prediction error drives learning”: the discrepancy between prediction and reality is the learning signal, and inference is comprehension pushed to its limit.
Levels of analysis: at the formal level, D4 (information) and D6 (temporality) are independent; at the cognitive level, the P-chain suggests they might be one. Our contribution is to situate this link within the formal framework, not to discover it.
In music, the anticipating listener and the musician who “hears while playing” are living P-chains.

To Go Further

The P-chain Framework and Prediction

Dell, G.S. & Chang, F. (2014). “The P-chain: relating sentence production and its disorders to comprehension and acquisition.” Phil. Trans. R. Soc. B 369(1634), 20120394. DOI:10.1098/rstb.2012.0394
Martin, C.D., Branzi, F.M. & Bar, M. (2018). “Prediction is Production: The missing link between language production and comprehension.” Scientific Reports 8, 1079. DOI:10.1038/s41598-018-19499-4
Gambi, C. & Pickering, M.J. (2017). “Models Linking Production and Comprehension.” The Handbook of Psycholinguistics, 157-181. DOI:10.1002/9781118829516.ch7
Gastaldon, S. et al. (2024). “Predictive language processing: integrating comprehension and production.” Frontiers in Psychology 15, 1369177. DOI:10.3389/fpsyg.2024.1369177
Chater, N. & Manning, C.D. (2006). “Probabilistic models of language processing and acquisition.” Trends in Cognitive Sciences 10(7), 335-344. DOI:10.1016/j.tics.2006.05.006

Surprisal

Hale, J. (2001). “A Probabilistic Earley Parser as a Psycholinguistic Model.” NAACL 2001 — difficulty proportional to surprisal.
Levy, R. (2008). “Expectation-Based Syntactic Comprehension.” Cognition 106(3), 1126-1177 — probability reallocation.
Stolcke, A. (1995). “An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities.” Computational Linguistics 21(2) — the machinery of prefix probabilities.

The Popularized Research Article

Peyrichou, R. (2026). The Generation-Recognition Asymmetry… §1.2 and §4.6 situate the P-chain in relation to the formal framework. Preprint arXiv:2603.10139 — https://arxiv.org/abs/2603.10139

In the Corpus

L13 — The 6-Dimensional Asymmetry
L15 — Surprisal and Other Formulas (D6)
L16 — Reversibility: Necessary but Not Sufficient for the Loop

Glossary

P-chain: Dell & Chang (2014) framework linking production, comprehension, and acquisition in a causal chain via prediction.
Prediction-by-production: Hypothesis according to which comprehension predicts the continuation by engaging the production system.
Surprisal: $-\log_2$ of a word’s probability given its context; measures improbability and, under the surprisal hypothesis, predicts processing difficulty.
Prediction error: Discrepancy between what the system predicted and the actual input; signal that drives learning.
N400: Event-related potential (brain wave) whose amplitude increases with semantic unexpectedness; often used as a neural correlate of surprisal.
Level of analysis: Plane at which a phenomenon is described (formal/computational vs. cognitive/mechanistic); two levels can diverge without contradicting each other.
Melodic expectation: Anticipation, by the listener, of the next note; modeled by musical surprisal (e.g., IDyOM).

Links in the Series

L13 — Generate or Recognize — the asymmetry for which the P-chain is the cognitive antecedent
L15 — The Asymmetry Formulas — where surprisal (D6) comes from
L18 — The Sign Reversal — the other major contribution of the article
M6 — Hierarchical Structure in Music — structural expectations

Prerequisites: L13, L15
Reading time: 10 min
Tags: #P-chain #psycholinguistics #surprisal #prediction #production #musical-cognition
