L15) The Formulas of Asymmetry

From T_parse to Surprisal

In L13, we saw that generation and recognition are “operationally asymmetric”. But we remained at the level of intuition: O(n) versus O(n³), “generation is easier”. This article goes further: it dissects the formulas, one by one, to show that asymmetry is not a slogan — it’s a mathematical result across six independent dimensions.

Where does this article fit in?

This article is the technical companion to L13. Where L13 presented asymmetry in accessible terms — puzzles, concerts, composers — this article presents the formulas that underpin it. It is an article in the L series (formal languages), with formulas at the forefront but always explained: no equation appears without stating what it means and why it matters.

If you are comfortable with the notations from L1 and L6, you will be on familiar ground.

Why is this important?

The intuition “generating is easy, analyzing is difficult” is both true and misleading. True: there is a real complexity gap. Misleading: the true asymmetry is not a simple difference in difficulty — it is structural. The parser faces imposed constraints (the input is given, non-negotiable); the generator faces chosen constraints (it may or may not impose them on itself).

The formulas we will see don’t just say “one is slower than the other”. They show that asymmetry manifests across six independent axes — complexity, ambiguity, direction, information, inference, temporality — and that it is intrinsic to formal languages, not an artifact of a particular implementation.

The Idea in One Sentence

Generation-recognition asymmetry is formalized across six mathematically independent dimensions — from the complexity gap (CFG recognition in $\mathcal{O}(n^\omega)$ , $\omega \approx 2.37$ ) to the generator’s surprisal $$S = 0$$ versus the parser’s $$S > 0$$ — without, however, reducing to “generating is easy, analyzing is difficult”.

Let’s Explain Step by Step

1. The Complexity Gap: T_gen vs T_parse

Unconstrained generation is linear regardless of the grammar class: $T_{\text{gen}} = \mathcal{O}(n)$ . Recognition, however, depends dramatically on the Chomsky level:

Chomsky Level	$T_{\text{gen}}$	$T_{\text{parse}}$	Gap
Type 3 (regular)	$\mathcal{O}(n)$	$\mathcal{O}(n)$	none
Type 2 (context-free)	$\mathcal{O}(n)$	$\mathcal{O}(n^\omega)$ , $\omega\approx2.37$ (CYK : $$O(n^3)$$ )	super-linear
Type 2+ (TAG/MCFG)	$\mathcal{O}(n)$	$\mathcal{O}(n^6)$	power 6
Type 1 (context-sensitive)	$\mathcal{O}(n)$	PSPACE-complete	exponential
Type 0 (r.e.)	$\mathcal{O}(n)$	$\to \infty$	infinite

Asymmetry grows with expressive power. But this one-dimensional view is misleading: in reality, each operation depends on two orthogonal axes — the grammar class and the task specification — with inverse sensitivity. This is differential coupling: generation is sensitive to the task (free vs. constrained), recognition is sensitive to the grammar (Type 3 vs. Type 0).

→ For the two complete complexity matrices and differential coupling, see L17.

Clarification (v2.1). The table above summarizes generation as $\mathcal{O}(n)$ , but this only applies to free or terminating generation. Each operation actually breaks down into three sub-problems (on the recognition side: membership, one tree, all trees; on the generation side: free, terminating, constrained). And for generation under semantic constraint, the asymmetry changes sign — it can become exponential (Kay 1996) or proven NP-complete (Brew 1992), thus more difficult than parsing. See L18.

2. Gen and Parse as Functions

Generation and parsing can be formalized as two mathematical functions:

$\text{gen}: G \times D \to \Sigma^*$

Generation takes a grammar $$G$$ and a derivation sequence $$d$$ (the series of choices: which rule to apply at each step), and produces a string $$w$$ . For a fixed $$d$$ , the output $$w$$ is unique: gen is a function (single-valued). Beware of the false friend: this is not injectivity — two distinct derivations can produce the same string. The point is not injectivity, but that the input — (grammar + derivation) — fixes the output.

$\text{parse}: G \times \Sigma^* \to 2^{\mathcal{T}}$

Parsing takes a grammar $$G$$ and a string $$w$$ , and returns a set of derivation trees $\{t_1, t_2, \ldots, t_k\}$ . The notation $2^{\mathcal{T}}$ means “a subset of the set of all possible trees”. Three cases:

$$k = 0$$ : the string does not belong to the language
$$k = 1$$ : unambiguous parsing (a single tree)
$$k > 1$$ : ambiguous parsing (multiple valid trees for the same string)

Parse is a relation (multi-valued): the same string can correspond to multiple structures. The root cause is not an intrinsic property of recognition, but the differential informativeness of the input: the parser only receives (grammar + string), which is strictly less informative than (grammar + derivation) — its input underdetermines the tree.

In summary:

gen : (grammar + derivation) → one sentence — input that fixes the output (single-valued function)
parse : (grammar + sentence) → multiple trees — input that underdetermines the output (multi-valued relation)

3. Catalan Numbers: Why Ambiguity Explodes

When $$k > 1$$ (the string is ambiguous), how many derivation trees can there be? The answer is given by Catalan numbers:

$C_n = \frac{1}{n+1}\binom{2n}{n} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$

$$C_n$$ is the number of ways to associate $$n$$ elements in pairs — which corresponds to the number of binary trees with $$n$$ nodes. In practice:

$$n$$ (constituents)	$$C_n$$ (possible trees)
2	2
3	5
4	14
5	42
10	16 796
15	9 694 845

For 3 prepositional phrases (“I saw the man with the telescope on the hill”), there are already 5 possible derivation trees. For 10, over 16,000. The growth is exponential: $\sim 4^n$ .

Shared derivation forests (Billot & Lang, 1989) allow for compact representation of an exponential number of trees in $\mathcal{O}(n^3)$ space — but the number of underlying trees remains exponentially large. Non-determinism is compressed, not eliminated.

4. Parikh and Intrinsic Ambiguity

One might hope that by rewriting the grammar more intelligently, ambiguity could always be eliminated. Parikh’s theorem (1966) shatters this hope:

Theorem (Parikh, 1966). There exist inherently ambiguous context-free languages — languages for which no unambiguous grammar can be written.

The classic example:

$L = \{a^n b^n c^m d^m \mid n, m \geq 1\} \cup \{a^n b^m c^m d^n \mid n, m \geq 1\}$

This language is the union of two context-free languages. Any string of the form $$a^n b^n c^n d^n$$ (when $$n = m$$ ) belongs to both sub-languages — and this dual membership cannot be eliminated by any choice of grammar.

Brabrand, Giegerich, and Moller (2007) add an even stronger result: it is undecidable to determine if a grammar is ambiguous. One cannot even generally check if a given grammar exhibits the problem.

5. The Birman-Ullman Paradox

Birman and Ullman (1973) observed a paradox:

“Transforming a Chomsky grammar into a parsing mechanism is considerably difficult; but transforming a parsing scheme into a generation mechanism is much easier.”

At first glance, this inverts the asymmetry. Isn’t generation more difficult?

The resolution lies in the distinction between transforming from one mode to another and operating within a mode.

A parsing scheme already contains the structural analysis — it has resolved ambiguities, explored the derivation space, encoded the recognition strategy. Converting it into a generator is an “informational downgrade“: one discards the analytical structure and retains only the generative path. It’s easy because one throws away information.

Converting a generative grammar into a parser is an “informational upgrade“: one must add the analytical machinery (lookahead, disambiguation, error recovery) absent from the grammar. It’s difficult because one creates information.

Birman-Ullman’s observation thus confirms the asymmetry: parsing contains more information than generation. The asymmetry of transformation is the mirror image of the asymmetry of operation.

6. Surprisal: Formalized Temporal Asymmetry

Perhaps the most surprising (pun intended) dimension is the temporal one. Hale’s (2001) surprisal formalizes it.

The surprisal of the $$i$$ -th token is:

$S(w_i) = -\log_2 P(w_i \mid w_1, w_2, \ldots, w_{i-1})$

In English: “how many bits of information does this token provide, given what we have already seen?”. The more unexpected a token is, the higher its surprisal.

For a deterministic generator, the system knows which token it will produce:

$P_{\text{gen}}(w_i \mid w_1, \ldots, w_{i-1}) = 1 \quad \Longrightarrow \quad S_{\text{gen}}(w_i) = 0$

The generator is never surprised. It creates the future — it knows it in advance. Surprisal = zero, always.

For an incremental parser, the system has seen $w_1, \ldots, w_i$ but not what follows. It maintains competing hypotheses:

$P_{\text{parse}}(w_i \mid w_1, \ldots, w_{i-1}) < 1 \quad \Longrightarrow \quad S_{\text{parse}}(w_i) > 0$

The parser is surprised — with each token, it must re-evaluate its hypotheses. Surprisal > 0, always.

Surprisal quantifies temporal asymmetry: exactly zero for the generator, strictly positive for the parser. This is the mathematical formalization of the intuition: the composer knows what they are going to write; the listener discovers it as it unfolds.

Sidebar: The Signal Processing Analogue

In digital signal processing, a causal system at time $$t$$ depends only on past inputs ( $\leq t$ ). An anti-causal system can access future inputs. The generator is causal (it creates the future); a parser without lookahead is also causal (it only sees the past), but a parser with lookahead is partially anti-causal — it looks into the future. Bidirectional encoders like BERT access the entire sequence: a pure analytical capability.

7. The Summary Table

Here are the six dimensions brought together in a single table:

Dim	Name	Generation	Recognition	Key Reference
D1	Complexity	$\mathcal{O}(n)$ (gen-example)	$\mathcal{O}(n^\omega)$ to undecidable	Valiant 1975, Aho et al. 1986
D2	Ambiguity	Single-valued function — input (grammar + derivation) that fixes the output	Multi-valued relation, $C_n \sim 4^n$ trees — input (grammar + string) that underdetermines	Parikh 1966, Brabrand 2007
D3	Direction	Always top-down	Free choice (LL, LR, …)	Shieber 1988
D4	Information	Complete knowledge	Inference from surface	Shannon 1948
D5	Inference	gen < recog < infer	—	Gold 1967
D6	Temporality	Causal: $$S = 0$$	Predictive: $$S > 0$$	Hale 2001

Key Takeaways

The complexity gap on the recognition side grows with expressive power: from none (Type 3, $\Theta(n)$ ) to undecidable (Type 0). But this is only one axis: differential coupling (recognition ← grammar class, generation ← task specification) and sign reversal (constrained generation = NP-complete) prevent reducing asymmetry to “generating is easy” — see L17 and L18.
Generation is a function (single-valued — not “injective”: its input fixes the output); parsing is a relation (one sentence → $$k$$ trees) because its input is less informative. Catalan numbers $C_n \sim 4^n$ quantify the growth of ambiguity.
Ambiguity is intrinsic to certain languages (Parikh 1966) and undecidable in general (Brabrand 2007). It cannot be eliminated or even detected.
The Birman-Ullman paradox confirms asymmetry: transforming a parser into a generator is easy (informational downgrade), the inverse is difficult (informational upgrade).
Surprisal formalizes temporal asymmetry: $$S = 0$$ for the generator (it creates the future), $$S > 0$$ for the parser (it predicts under uncertainty).
The six dimensions are independent: each captures a distinct aspect of the generation-recognition divide.

Glossary

Intrinsic Ambiguity : A property of a language (not a grammar) such that any grammar generating it is ambiguous. Discovered by Parikh (1966).
Birman-Ullman (paradox of) : The observation that transforming a parser into a generator is easy, while transforming a grammar into a parser is difficult — seemingly the inverse of asymmetry, but in reality its confirmation.
Causal (system) : In signal processing, a system whose output at time $$t$$ depends only on inputs at times $\leq t$ . The generator is causal; a parser without lookahead is also.
Catalan (numbers) : The sequence $C_n = \frac{1}{n+1}\binom{2n}{n}$ which counts the number of binary trees with $$n$$ nodes. Exponential growth $\sim 4^n / n^{3/2}\sqrt{\pi}$ .
Shared derivation forest : A data structure that compactly represents (in $\mathcal{O}(n^3)$ ) an exponentially large number of derivation trees. Non-determinism is compressed, not eliminated.
Function (gen) vs. relation (parse) : Generation is a function (single-valued): its input — (grammar + derivation) — fixes the output. This is not about injectivity (two derivations can yield the same string). Parsing is a relation (one sentence → multiple trees) because its input — (grammar + string) — is less informative and underdetermines the tree.
PSPACE-complete : A class of problems solvable in polynomial space but for which no polynomial-time algorithm is known. Parsing context-sensitive grammars is PSPACE-complete.
Surprisal : $S(w_i) = -\log_2 P(w_i | w_1 \ldots w_{i-1})$ . Measures the information contributed by a token in context. Zero for the generator, positive for the parser.
$T_{\text{gen}}$ , $T_{\text{parse}}$ : Time complexity functions for generation and recognition respectively, parameterized by Chomsky level $$k$$ and length $$n$$ .

Prerequisites : L1, L6, L13
Reading time : 15 min
Tags : #formulas #asymmetry #complexity #surprisal #Catalan #Parikh #Birman-Ullman

← Back to index

L15) The Formulas of Asymmetry

From T_parse to Surprisal

Where does this article fit in?

Why is this important?

The Idea in One Sentence

Let’s Explain Step by Step

1. The Complexity Gap: T_gen vs T_parse

2. Gen and Parse as Functions

3. Catalan Numbers: Why Ambiguity Explodes

4. Parikh and Intrinsic Ambiguity

5. The Birman-Ullman Paradox

6. Surprisal: Formalized Temporal Asymmetry

7. The Summary Table

Key Takeaways

Further Reading

Complexity

Ambiguity and Multi-valuedness

Birman-Ullman

Surprisal and Temporality

Grammatical Inference

In the Corpus

Glossary