L13) Generating or Recognizing
The Duality of Formal Grammars
A grammar is not just for producing sentences. It can also analyze them. But these two uses — generation and recognition — are neither symmetrical nor equivalent. This asymmetry, rarely discussed, is nonetheless at the heart of music.
Where Does This Article Fit In?
In L1, we discovered the hierarchy of formal languages. In L2, we learned how to construct grammars that produce sentences through derivation. But we have always worked in a single direction: from meaning to sentences.
This article asks a question: what happens when we reverse the direction? When, instead of producing a sentence from a grammar, we receive a sentence and seek to understand if it belongs to the language — and what structure the grammar assigns to it?
It is the difference between composing and analyzing, between speaking and listening, between encoding and decoding. And this difference is far from trivial.
Why Is It Important?
A Famous Terminological Trap
The most cited book in formal music theory is titled A Generative Theory of Tonal Music (GTTM, Lerdahl & Jackendoff, 1983 — over 5,000 citations). Yet, GTTM does not generate any music. It is an analytical theory: given an existing piece of music, it provides preference rules to assign it a hierarchical structure (grouping, metric structure, time-span reduction).
The term “generative” is used in the original Chomskyan sense — a set of formally explicit rules — and not in the sense of “production.” Lerdahl himself clarified that GTTM models the listener (reception), not the composer (emission) [Lerdahl2009].
This confusion illustrates a deeper problem: we use the same word — “grammar” — for two fundamentally different activities.
Two Sides of the Same Coin
Every formal grammar defines a language — a set of valid strings. But it can be used in two ways:
| Direction | Technical Name | Question Asked | Musical Analogy |
|---|---|---|---|
| → Generation | Production / Derivation | “Which sentences can this grammar produce?” | The composer writes a piece |
| ← Recognition | Parsing / Analysis | “Does this sentence belong to the language? Which structure corresponds to it?” | The musicologist analyzes a musical production |
Mathematically, both uses define the same language: the set of strings is identical. But operationally, the two directions are radically different.
The Idea in One Sentence
A formal grammar is a bidirectional object: it can generate sentences or analyze them, but the two directions differ along at least six independent dimensions — complexity, ambiguity, direction, information, inference, and temporality. The asymmetry is structural: the parser is always constrained, the generator is not necessarily so.
Explaining Step by Step
1. Computational Asymmetry
The first difference is algorithmic in nature. For a context-free grammar (CFG — L2):
- Generating a sentence is fast: we start from the axiom (start symbol), apply rewriting rules, and each step is a local choice. The complexity is linear in the length of the derivation.
- Recognizing a sentence is more costly: given a string of $n$ symbols, determining if it belongs to the language and reconstructing its structure is done in $O(n^3)$ time with the CYK (Cocke-Younger-Kasami) or Earley algorithm. The known optimal theoretical bound is actually $O(n^\omega)$ with $\omega \approx 2.37$ (quasi-quadratic), via Valiant’s reduction (1975) to matrix multiplication — but it remains super-linear.
For more expressive formalisms, the gap widens dramatically. Context-sensitive grammars (Type 1, L1) have a recognition problem that is PSPACE-complete (Polynomial Space complete — a complexity class for which no efficient algorithm is generally known) [Kuroda1964; Savitch1970].
In other words: producing is generally easier than analyzing.
But beware: this formulation is misleading. Unconstrained generation — applying rules at random — is trivial for any formalism. The result is “grammatical” but useless. Real generation operates under constraints (a target meaning, a style, aesthetic rules), and constrained generation can be as difficult — or even strictly more difficult — than parsing (the sign reversal, demonstrated by Kay 1996 and Brew 1992; see L18). The true asymmetry is structural: parsing is always constrained (the input is given, non-negotiable), while generation may or may not be.
→ For the mathematical formalizations of this gap ($T_{\text{parse}}$ formula by Chomsky level, semi-log graph), see L15.
Sidebar: The Puzzle Analogy
Building a 1,000-piece puzzle from the complete image is relatively simple: you determine how you will cut it, so you know which piece goes where. But receiving a mixed-up puzzle and understanding how to assemble it — in what order, by what strategy — is a completely different problem. The grammar is the reference image; generation builds the puzzle; parsing disassembles it to understand its structure.