L14) The Direction of Parsing

LL, LR and the Dragon Book

When a grammar generates a sentence, it always descends from intention to surface — from the start symbol to the words. But when it parses a sentence, it can choose to ascend, descend, or do both at once. This directional freedom, known to all compiler designers, had never been identified as a fundamental dimension of the asymmetry between generation and recognition.

Where Does This Article Fit In?

In L13, we explored the duality between generation and recognition — the two fundamental uses of any formal grammar. We identified several dimensions of this asymmetry: computational complexity, determinism, available information, temporality.

But one dimension was missing: direction. This is the subject of this article. And it is perhaps the most surprising dimension, because it hides in the most widely used reference book in computer science — the Dragon Book — without ever being named.


Why Is This Important?

Imagine an architect designing a building (generation) and an inspector examining an existing building (analysis). The architect always starts from the intention — the functional program, aesthetic constraints, standards — and descends towards concrete details: structure, materials, finishes. They cannot start with door handles and “ascend” towards the building’s concept.

The inspector, however, has a choice. They can start from the roof and descend. They can start from the foundations and ascend. They can start from the emergency exits and reason in both directions. The inspection strategy is a parameter — a design choice.

This asymmetry is exactly what exists between generation and parsing in formal language theory. And it has profound consequences.


The Idea in One Sentence

Generation is inherently top-down (from the axiom to the surface), but parsing possesses a degree of directional freedom — the choice between top-down, bottom-up, or hybrid strategies — which is absent from generation.


Let’s Explain Step by Step

1. Generation Always Descends

Any derivation in a phrase-structure grammar (L2) proceeds from the start symbol $S$ to the terminal symbols:

$S \Rightarrow \alpha_1 \Rightarrow \alpha_2 \Rightarrow \cdots \Rightarrow w$

This process is inherently top-down: it goes from the most abstract level ($S$, the intention: “I want a musical phrase”) to the most concrete ($w$, the surface: the actual notes).

As Chomsky formulated it: a grammar generates “top-down, not left-to-right” (Chomsky, 1957). When a grammar produces do re mi fa, the left-to-right temporal order is a by-product of the hierarchical decomposition:

$S \to AB,\quad A \to \text{do ré},\quad B \to \text{mi fa}$

The temporal sequence do re mi fa emerges from the vertical structure, not the other way around.

Sidebar: Apparent Counterexample

Kay (1996) proposed a chart generation algorithm where syntactic assembly can proceed bottom-up. But even in this case, the semantic direction remains top-down — what drives the process is always intention → surface. The mechanics can assemble the pieces in any order; the logic always goes from abstract to concrete.

2. Parsing Has a Choice

For parsing, direction is a design parameter. Given a grammar $G$ and a string $w$, we look for derivation trees $t$ such that the leaves of $t$ form $w$. But we can search for these trees in several ways:

Strategy Direction How it works
LL (recursive descent) Top-down ↓ Predict which non-terminal to expand, then check if terminals match
LR (shift-reduce) Bottom-up ↑ Stack read terminals, then reduce to non-terminals when a pattern is recognized
Earley (1970) Hybrid ↕ Top-down prediction + bottom-up completion, combined
CYK Bottom-up ↑ Build a table using dynamic programming, from short substrings to long ones
Left-corner (1970) Hybrid ↕ Start bottom-up (the first symbol), then predict top-down

Five strategies. Five directions. And these are only the main ones — others exist (GLR, chart parsing, Packrat…).

The crucial point: this choice exists only for the parser. The generator has no analogous decision. It descends, period.

3. The Dragon Book: Unintentional Proof

The Dragon Book (Aho, Sethi & Ullman, 1986) is the reference book in compiler design. Its official name is Compilers: Principles, Techniques, and Tools; its nickname comes from the dragon on the cover.

This book provides an unintentional but striking case study of directional asymmetry:

  • Parsing: hundreds of pages. Entire chapters on LL, LR, parsing tables, shift-reduce conflicts, error recovery, operator grammars…
  • Code generation: a relatively contained discussion. Linear traversal of the AST (L4), tree rewriting, pattern matching.

Why this disproportion? Because code generation is a top-down traversal without a choice of direction — one traverses the AST from top to bottom, and the machine code is output progressively. Parsing, however, requires a whole arsenal of strategies, each with its strengths, weaknesses, and compromises.

The Dragon Book never names this directional asymmetry. It demonstrates it through its own imbalance.

4. LL vs LR: The Striking Observation

Within parsing, the LL-versus-LR distinction is particularly revealing.

LL (Left-to-right, Leftmost derivation):

  • The parser imitates the generative process: it is top-down, it predicts which production to apply
  • It looks one or a few symbols ahead (the lookahead: the next unread input symbols) to decide

LR (Left-to-right, Rightmost derivation):

  • The parser inverts the generative process: it is bottom-up, it stacks read symbols and reduces them when it recognizes the right-hand side of a rule
  • It defers its decisions: it accumulates evidence before committing

Crucial point: LR is strictly more powerful than LL. For any $k$ (number of lookahead symbols), the class of LR($k$) languages strictly includes the class LL($k$) (Knuth, 1965).

In other words: the parsing strategy most different from generation is the most powerful. The one that imitates generation is the weakest.

This is not an accident. LR parsing can defer its decisions — it waits to have enough context before committing. This ability to postpone is something only the parser needs. The generator commits immediately: it chooses a production and derives. The parser, however, can accumulate evidence. And the further it moves away from the generative (top-down) strategy, the more power it gains.

Sidebar: The Detective Analogy

A novelist (LL generator) writes the story from beginning to end, each narrative choice stemming from the previous one. A detective (LR parser) gathers clues (the terminals), stacks them, and only concludes once a pattern emerges. The detective is more powerful precisely because they do not follow the author’s logic — they reconstruct in reverse.

5. Shieber: The Inversion That Doesn’t Work

Stuart Shieber (1988) attempted to design a unified architecture for parsing and generation — a single engine capable of doing both with the same grammar. What he discovered concretely proves directional asymmetry.

In parsing, the algorithm is driven by the surface string: tokens are consumed from left to right, and the chart (the parsing table) is indexed by positions in the string — “from character 3 to character 7, a noun phrase was recognized.”

In generation, the string does not yet exist — it is what we are trying to produce. It is therefore impossible to index the chart by string positions. Shieber had to restructure the algorithm so that it would be driven by the semantic structure: generation follows semantic heads (nodes of meaning), not syntactic heads (words). This is semantic head-driven generation (Shieber et al., 1990).

This restructuring is not a simple parameter change — it is a reorganization of the data flow. Parsing goes from surface to meaning: $w \to t$. Generation goes from meaning to surface: $\phi \to w$. What serves as input for one does not yet exist for the other. The same declarative information (grammatical rules) requires two distinct procedural interpretations.

Shieber’s observation confirms that directional asymmetry is structural: one cannot “run parsing in reverse” to obtain generation. Even within a unified framework, direction imposes irreducible architectural choices.

6. Knuth and Attributes: Top-down vs Bottom-up

Knuth’s attribute grammars (1968) formalize the same asymmetry in the semantic domain (not just syntactic).

In these grammars, the nodes of the derivation tree carry attributes — computed information. These attributes flow in two directions:

  • Inherited attributes ($\downarrow$): they descend from parent to child. Example: the expected type of an expression, imposed by the context.
  • Synthesized attributes ($\uparrow$): they ascend from child to parent. Example: the computed value of an expression, propagated up from the leaves.

Generation is essentially a flow of inherited attributes (intention descends). Analysis is a bidirectional flow (information ascends and descends). Grune and Jacobs observe that attribute grammars “should rather be classified as recognition systems” — the bidirectional flow of attributes is an analytical property.


Key Takeaways

  • Generation is inherently top-down: from the axiom to the terminals, from intention to surface. This is a constant.
  • Parsing possesses a degree of directional freedom: LL (top-down), LR (bottom-up), Earley (hybrid), CYK (bottom-up), left-corner (hybrid). This is a parameter.
  • The Dragon Book unintentionally illustrates this asymmetry: hundreds of pages on parsing strategies, generation taken for granted.
  • LR is more powerful than LL: the strategy most different from generation is the most performant, because it can defer its decisions.
  • This directional asymmetry is known to all compiler designers but had never been named as an independent dimension of the generation-recognition asymmetry.
  • In music, it’s the same: composing (descending from intention to notes) has only one direction; analyzing an existing piece can proceed in multiple ways (harmonic, motivic, Schenkerian, statistical…).

Further Reading

Fundamental References

  • Aho, A.V., Sethi, R. & Ullman, J.D. (1986). Compilers: Principles, Techniques, and Tools (the Dragon Book). Addison-Wesley — the most comprehensive treatment of LL and LR strategies.
  • Chomsky, N. (1957). Syntactic Structures. Mouton — the original formulation of the top-down direction of generation.
  • Knuth, D.E. (1965). “On the Translation of Languages from Left to Right.” Information and Control — proof that LR($k$) ⊃ LL($k$).
  • Knuth, D.E. (1968). “Semantics of Context-Free Languages.” Mathematical Systems Theory — attribute grammars (inherited ↓ and synthesized ↑).

Unified Architecture

  • Shieber, S. (1988). “A Uniform Architecture for Parsing and Generation.” COLING — first attempt at a unified parsing/generation architecture.
  • Shieber, S., van Noord, G., Pereira, F.C.N. & Moore, R.C. (1990). “Semantic-Head-Driven Generation.” Computational Linguistics, 16(1), 30–42 — semantic-head-driven generation, proof that parsing does not trivially invert.
  • Kay, M. (1996). “Chart Generation.” ACL — chart generation (bottom-up assembly, but top-down semantic logic).

Parsing Strategies

  • Earley, J. (1970). “An Efficient Context-Free Parsing Algorithm.” Communications of the ACM — the hybrid top-down-bottom-up algorithm.
  • Rosenkrantz, D.J. & Lewis, P.M. (1970). “Deterministic Left Corner Parsing.” IEEE Symposium on Switching and Automata Theory — the left-corner strategy.
  • Grune, D. & Jacobs, C. (2008). Parsing Techniques: A Practical Guide. Springer — comprehensive overview of parsing techniques.

In the Corpus

  • L1 — The Levels of the Hierarchy
  • L2 — How Grammars Produce Sentences
  • L4 — What is an AST
  • L13 — Overview of Asymmetry (6 Dimensions)
  • L15 — Mathematical Formalizations of Asymmetry
  • L16 — Why Bidirectionality Was Not Transferred

Glossary

  • Bottom-up: A parsing strategy that starts from the terminal symbols (the leaves of the tree) and ascends towards the axiom (the root). Examples: LR, CYK.
  • Inherited attribute ($\downarrow$): Information that descends from parent to child in an attribute grammar. Top-down flow.
  • Synthesized attribute ($\uparrow$): Information that ascends from child to parent in an attribute grammar. Bottom-up flow.
  • Chart (parsing table): A tabular data structure used by parsing algorithms (Earley, CYK, chart parsing) to store intermediate results. In parsing, indexed by positions in the input string. In generation, this indexing is impossible (the string does not yet exist), hence the need to be driven by semantic heads.
  • CYK (Cocke-Younger-Kasami): A bottom-up parsing algorithm using dynamic programming. Works in $\mathcal{O}(n^3)$ for any CFG.
  • Top-down: A strategy that starts from the axiom (the root) and descends towards the terminals (the leaves). This is the natural direction of generation. In parsing, this is the LL strategy.
  • Dragon Book: Nickname for Compilers: Principles, Techniques, and Tools (Aho, Sethi & Ullman, 1986), the reference book in compiler design.
  • Earley: A hybrid (top-down-bottom-up) parsing algorithm capable of processing any CFG in $\mathcal{O}(n^3)$.
  • Left-corner: A hybrid parsing strategy that starts bottom-up (the first symbol) then continues top-down (prediction).
  • LL (Left-to-right, Leftmost derivation): Predictive top-down parsing. Imitates the direction of generation. Less powerful than LR.
  • Lookahead: The number of lookahead symbols a parser consults to decide which rule to apply. Noted as $k$ in LL($k$) and LR($k$).
  • LR (Left-to-right, Rightmost derivation): Bottom-up shift-reduce parsing. Inverts the direction of generation. Strictly more powerful than LL (Knuth, 1965).

Prerequisites: L1, L2, L13
Reading time: 12 min
Tags: #parsing #direction #LL #LR #dragon-book #asymmetry #compilers


Back to index