L15) The Formulas of Asymmetry
From T_parse to Surprisal
In L13, we saw that generation and recognition are “operationally asymmetric”. But we remained at the level of intuition: O(n) versus O(n³), “generation is easier”. This article goes further: it dissects the formulas, one by one, to show that asymmetry is not a slogan — it’s a mathematical result across six independent dimensions.
Where does this article fit in?
This article is the technical companion to L13. Where L13 presented asymmetry in accessible terms — puzzles, concerts, composers — this article presents the formulas that underpin it. It is an article in the L series (formal languages), with formulas at the forefront but always explained: no equation appears without stating what it means and why it matters.
If you are comfortable with the notations from L1 and L6, you will be on familiar ground.
Why is this important?
The intuition “generating is easy, analyzing is difficult” is both true and misleading. True: there is a real complexity gap. Misleading: the true asymmetry is not a simple difference in difficulty — it is structural. The parser faces imposed constraints (the input is given, non-negotiable); the generator faces chosen constraints (it may or may not impose them on itself).
The formulas we will see don’t just say “one is slower than the other”. They show that asymmetry manifests across six independent axes — complexity, ambiguity, direction, information, inference, temporality — and that it is intrinsic to formal languages, not an artifact of a particular implementation.
The Idea in One Sentence
Generation-recognition asymmetry is formalized across six mathematically independent dimensions — from the complexity gap (CFG recognition in $\mathcal{O}(n^\omega)$, $\omega \approx 2.37$) to the generator’s surprisal $S = 0$ versus the parser’s $S > 0$ — without, however, reducing to “generating is easy, analyzing is difficult”.
Let’s Explain Step by Step
1. The Complexity Gap: T_gen vs T_parse
Unconstrained generation is linear regardless of the grammar class: $T_{\text{gen}} = \mathcal{O}(n)$. Recognition, however, depends dramatically on the Chomsky level:
| Chomsky Level | $T_{\text{gen}}$ | $T_{\text{parse}}$ | Gap |
|---|---|---|---|
| Type 3 (regular) | $\mathcal{O}(n)$ | $\mathcal{O}(n)$ | none |
| Type 2 (context-free) | $\mathcal{O}(n)$ | $\mathcal{O}(n^\omega)$, $\omega\approx2.37$ (CYK : $O(n^3)$) | super-linear |
| Type 2+ (TAG/MCFG) | $\mathcal{O}(n)$ | $\mathcal{O}(n^6)$ | power 6 |
| Type 1 (context-sensitive) | $\mathcal{O}(n)$ | PSPACE-complete | exponential |
| Type 0 (r.e.) | $\mathcal{O}(n)$ | $\to \infty$ | infinite |
Asymmetry grows with expressive power. But this one-dimensional view is misleading: in reality, each operation depends on two orthogonal axes — the grammar class and the task specification — with inverse sensitivity. This is differential coupling: generation is sensitive to the task (free vs. constrained), recognition is sensitive to the grammar (Type 3 vs. Type 0).
→ For the two complete complexity matrices and differential coupling, see L17.
Clarification (v2.1). The table above summarizes generation as $\mathcal{O}(n)$, but this only applies to free or terminating generation. Each operation actually breaks down into three sub-problems (on the recognition side: membership, one tree, all trees; on the generation side: free, terminating, constrained). And for generation under semantic constraint, the asymmetry changes sign — it can become exponential (Kay 1996) or proven NP-complete (Brew 1992), thus more difficult than parsing. See L18.
2. Gen and Parse as Functions
Generation and parsing can be formalized as two mathematical functions:
$\text{gen}: G \times D \to \Sigma^*$
Generation takes a grammar $G$ and a derivation sequence $d$ (the series of choices: which rule to apply at each step), and produces a string $w$. For a fixed $d$, the output $w$ is unique: gen is a function (single-valued). Beware of the false friend: this is not injectivity — two distinct derivations can produce the same string. The point is not injectivity, but that the input — (grammar + derivation) — fixes the output.
$\text{parse}: G \times \Sigma^* \to 2^{\mathcal{T}}$
Parsing takes a grammar $G$ and a string $w$, and returns a set of derivation trees $\{t_1, t_2, \ldots, t_k\}$. The notation $2^{\mathcal{T}}$ means “a subset of the set of all possible trees”. Three cases:
- $k = 0$ : the string does not belong to the language
- $k = 1$ : unambiguous parsing (a single tree)
- $k > 1$ : ambiguous parsing (multiple valid trees for the same string)
Parse is a relation (multi-valued): the same string can correspond to multiple structures. The root cause is not an intrinsic property of recognition, but the differential informativeness of the input: the parser only receives (grammar + string), which is strictly less informative than (grammar + derivation) — its input underdetermines the tree.
In summary:
- gen : (grammar + derivation) → one sentence — input that fixes the output (single-valued function)
- parse : (grammar + sentence) → multiple trees — input that underdetermines the output (multi-valued relation)
3. Catalan Numbers: Why Ambiguity Explodes
When $k > 1$ (the string is ambiguous), how many derivation trees can there be? The answer is given by Catalan numbers:
$C_n = \frac{1}{n+1}\binom{2n}{n} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$
$C_n$ is the number of ways to associate $n$ elements in pairs — which corresponds to the number of binary trees with $n$ nodes. In practice:
| $n$ (constituents) | $C_n$ (possible trees) |
|---|---|
| 2 | 2 |
| 3 | 5 |
| 4 | 14 |
| 5 | 42 |
| 10 | 16 796 |
| 15 | 9 694 845 |
For 3 prepositional phrases (“I saw the man with the telescope on the hill”), there are already 5 possible derivation trees. For 10, over 16,000. The growth is exponential: $\sim 4^n$.
Shared derivation forests (Billot & Lang, 1989) allow for compact representation of an exponential number of trees in $\mathcal{O}(n^3)$ space — but the number of underlying trees remains exponentially large. Non-determinism is compressed, not eliminated.
4. Parikh and Intrinsic Ambiguity
One might hope that by rewriting the grammar more intelligently, ambiguity could always be eliminated. Parikh’s theorem (1966) shatters this hope:
Theorem (Parikh, 1966). There exist inherently ambiguous context-free languages — languages for which no unambiguous grammar can be written.
The classic example:
$L = \{a^n b^n c^m d^m \mid n, m \geq 1\} \cup \{a^n b^m c^m d^n \mid n, m \geq 1\}$
This language is the union of two context-free languages. Any string of the form $a^n b^n c^n d^n$ (when $n = m$) belongs to both sub-languages — and this dual membership cannot be eliminated by any choice of grammar.
Brabrand, Giegerich, and Moller (2007) add an even stronger result: it is undecidable to determine if a grammar is ambiguous. One cannot even generally check if a given grammar exhibits the problem.
5. The Birman-Ullman Paradox
Birman and Ullman (1973) observed a paradox:
“Transforming a Chomsky grammar into a parsing mechanism is considerably difficult; but transforming a parsing scheme into a generation mechanism is much easier.”
At first glance, this inverts the asymmetry. Isn’t generation more difficult?
The resolution lies in the distinction between transforming from one mode to another and operating within a mode.
A parsing scheme already contains the structural analysis — it has resolved ambiguities, explored the derivation space, encoded the recognition strategy. Converting it into a generator is an “informational downgrade“: one discards the analytical structure and retains only the generative path. It’s easy because one throws away information.
Converting a generative grammar into a parser is an “informational upgrade“: one must add the analytical machinery (lookahead, disambiguation, error recovery) absent from the grammar. It’s difficult because one creates information.
Birman-Ullman’s observation thus confirms the asymmetry: parsing contains more information than generation. The asymmetry of transformation is the mirror image of the asymmetry of operation.
6. Surprisal: Formalized Temporal Asymmetry
Perhaps the most surprising (pun intended) dimension is the temporal one. Hale’s (2001) surprisal formalizes it.
The surprisal of the $i$-th token is:
$S(w_i) = -\log_2 P(w_i \mid w_1, w_2, \ldots, w_{i-1})$
In English: “how many bits of information does this token provide, given what we have already seen?”. The more unexpected a token is, the higher its surprisal.
For a deterministic generator, the system knows which token it will produce:
$P_{\text{gen}}(w_i \mid w_1, \ldots, w_{i-1}) = 1 \quad \Longrightarrow \quad S_{\text{gen}}(w_i) = 0$
The generator is never surprised. It creates the future — it knows it in advance. Surprisal = zero, always.
For an incremental parser, the system has seen $w_1, \ldots, w_i$ but not what follows. It maintains competing hypotheses:
$P_{\text{parse}}(w_i \mid w_1, \ldots, w_{i-1}) < 1 \quad \Longrightarrow \quad S_{\text{parse}}(w_i) > 0$
The parser is surprised — with each token, it must re-evaluate its hypotheses. Surprisal > 0, always.
Surprisal quantifies temporal asymmetry: exactly zero for the generator, strictly positive for the parser. This is the mathematical formalization of the intuition: the composer knows what they are going to write; the listener discovers it as it unfolds.
Sidebar: The Signal Processing Analogue
In digital signal processing, a causal system at time $t$ depends only on past inputs ($\leq t$). An anti-causal system can access future inputs. The generator is causal (it creates the future); a parser without lookahead is also causal (it only sees the past), but a parser with lookahead is partially anti-causal — it looks into the future. Bidirectional encoders like BERT access the entire sequence: a pure analytical capability.

7. The Summary Table
Here are the six dimensions brought together in a single table:
| Dim | Name | Generation | Recognition | Key Reference |
|---|---|---|---|---|
| D1 | Complexity | $\mathcal{O}(n)$ (gen-example) | $\mathcal{O}(n^\omega)$ to undecidable | Valiant 1975, Aho et al. 1986 |
| D2 | Ambiguity | Single-valued function — input (grammar + derivation) that fixes the output | Multi-valued relation, $C_n \sim 4^n$ trees — input (grammar + string) that underdetermines | Parikh 1966, Brabrand 2007 |
| D3 | Direction | Always top-down | Free choice (LL, LR, …) | Shieber 1988 |
| D4 | Information | Complete knowledge | Inference from surface | Shannon 1948 |
| D5 | Inference | gen < recog < infer | — | Gold 1967 |
| D6 | Temporality | Causal: $S = 0$ | Predictive: $S > 0$ | Hale 2001 |
Key Takeaways
- The complexity gap on the recognition side grows with expressive power: from none (Type 3, $\Theta(n)$) to undecidable (Type 0). But this is only one axis: differential coupling (recognition ← grammar class, generation ← task specification) and sign reversal (constrained generation = NP-complete) prevent reducing asymmetry to “generating is easy” — see L17 and L18.
- Generation is a function (single-valued — not “injective”: its input fixes the output); parsing is a relation (one sentence → $k$ trees) because its input is less informative. Catalan numbers $C_n \sim 4^n$ quantify the growth of ambiguity.
- Ambiguity is intrinsic to certain languages (Parikh 1966) and undecidable in general (Brabrand 2007). It cannot be eliminated or even detected.
- The Birman-Ullman paradox confirms asymmetry: transforming a parser into a generator is easy (informational downgrade), the inverse is difficult (informational upgrade).
- Surprisal formalizes temporal asymmetry: $S = 0$ for the generator (it creates the future), $S > 0$ for the parser (it predicts under uncertainty).
- The six dimensions are independent: each captures a distinct aspect of the generation-recognition divide.
Further Reading
Complexity
- Aho, A.V., Sethi, R. & Ullman, J.D. (1986). Compilers: Principles, Techniques, and Tools. Addison-Wesley — complexities of CYK, Earley, LL, LR.
- Hopcroft, J.E. & Ullman, J.D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley — fundamental results.
Ambiguity and Multi-valuedness
- Parikh, R.J. (1966). “On Context-Free Languages.” Journal of the ACM, 13(4), 570–581 — intrinsic ambiguity of CFLs.
- Brabrand, C., Giegerich, R. & Møller, A. (2007). “Analyzing Ambiguity of Context-Free Grammars.” Science of Computer Programming — undecidability of ambiguity.
- Billot, S. & Lang, B. (1989). “The Structure of Shared Forests in Ambiguous Parsing.” ACL — shared derivation forests.
- Jurafsky, D. & Martin, J.H. (2023). Speech and Language Processing (3rd ed.) — Catalan numbers and ambiguity.
Birman-Ullman
- Birman, A. & Ullman, J.D. (1973). “Parsing Algorithms with Backtrack.” Information and Control, 23(1), 1–34 — the original paradox.
Surprisal and Temporality
- Hale, J. (2001). “A Probabilistic Earley Parser as a Psycholinguistic Model.” NAACL — surprisal as a model of processing difficulty.
- Levy, R. (2008). “Expectation-Based Syntactic Comprehension.” Cognition, 106(3), 1126–1177 — expectation-based comprehension.
- Stolcke, A. (1995). “An Efficient Probabilistic Context-Free Parsing Algorithm.” Computational Linguistics — probabilistic Earley parser, prefix probabilities.
Grammatical Inference
- Gold, E.M. (1967). “Language Identification in the Limit.” Information and Control, 10(5), 447–474 — the fundamental impossibility result.
- de la Higuera, C. (2010). Grammatical Inference: Learning Automata and Grammars. Cambridge University Press — comprehensive reference.
In the Corpus
- L1 — The hierarchy levels and their automata
- L6 — Operational Semantics: another mathematical formalism in the L series
- L9 — Type 2+: TAG, CCG, MCFG
- L13 — Overview of asymmetry (popularized version)
- L14 — Parsing Direction: LL, LR, and the Dragon Book
- L16 — Why bidirectionality was not transferred
- L17 — The class × task matrix and differential coupling
- L18 — Sign Reversal: When Generation Outperforms Parsing
- L19 — The P-chain: surprisal as a trace of prediction (D6)
- L20 — Inference: the third direction (D5)
Glossary
- Intrinsic Ambiguity : A property of a language (not a grammar) such that any grammar generating it is ambiguous. Discovered by Parikh (1966).
- Birman-Ullman (paradox of) : The observation that transforming a parser into a generator is easy, while transforming a grammar into a parser is difficult — seemingly the inverse of asymmetry, but in reality its confirmation.
- Causal (system) : In signal processing, a system whose output at time $t$ depends only on inputs at times $\leq t$. The generator is causal; a parser without lookahead is also.
- Catalan (numbers) : The sequence $C_n = \frac{1}{n+1}\binom{2n}{n}$ which counts the number of binary trees with $n$ nodes. Exponential growth $\sim 4^n / n^{3/2}\sqrt{\pi}$.
- Shared derivation forest : A data structure that compactly represents (in $\mathcal{O}(n^3)$) an exponentially large number of derivation trees. Non-determinism is compressed, not eliminated.
- Function (gen) vs. relation (parse) : Generation is a function (single-valued): its input — (grammar + derivation) — fixes the output. This is not about injectivity (two derivations can yield the same string). Parsing is a relation (one sentence → multiple trees) because its input — (grammar + string) — is less informative and underdetermines the tree.
- PSPACE-complete : A class of problems solvable in polynomial space but for which no polynomial-time algorithm is known. Parsing context-sensitive grammars is PSPACE-complete.
- Surprisal : $S(w_i) = -\log_2 P(w_i | w_1 \ldots w_{i-1})$. Measures the information contributed by a token in context. Zero for the generator, positive for the parser.
- $T_{\text{gen}}$, $T_{\text{parse}}$ : Time complexity functions for generation and recognition respectively, parameterized by Chomsky level $k$ and length $n$.
Prerequisites : L1, L6, L13
Reading time : 15 min
Tags : #formulas #asymmetry #complexity #surprisal #Catalan #Parikh #Birman-Ullman