L13) Generate or Recognize
The Duality of Formal Grammars
A grammar is not only used to produce sentences. It can also analyze them. But these two uses — generation and recognition — are neither symmetrical nor equivalent. This asymmetry, rarely discussed, is nevertheless at the heart of how music is composed, heard, and analyzed.
Where does this article fit in?
In L1, we discovered the hierarchy of formal languages. In L2, we learned how to build grammars that produce sentences through derivation. But we have always worked in a single direction: from the grammar to the sentences it produces.
This article asks a question that is rarely made explicit: what happens when we reverse the direction? When, instead of producing a sentence from a grammar, we receive a sentence and try to determine whether it belongs to the language — and what structure the grammar assigns to it?
This is the difference between composing and analyzing, between speaking and listening, between encoding and decoding. And this difference is far from trivial.
Why is this important?
A famous terminological trap
The most cited book in formal music theory is titled A Generative Theory of Tonal Music (GTTM, Lerdahl & Jackendoff, 1983 — over 5,000 citations). Yet GTTM does not generate any music. It is an analytical theory: given an existing piece of music, it provides preference rules to assign a hierarchical structure to it (grouping structure, metrical structure, time-span reduction, prolongational reduction).
The term “generative” is used in the original Chomskyan sense — a set of formally explicit rules — and not in the sense of “production.” Lerdahl himself clarified that GTTM models the listener (reception), not the composer (emission) [Lerdahl2009].
This confusion illustrates a deeper problem: we use the same word — “grammar” — for two fundamentally different activities.
Two sides of the same coin
Every formal grammar defines a language — a set of valid strings. But it can be used in two ways:
| Direction | Technical Name | Question Asked | Musical Analogy |
| --- | --- | --- | --- |
| → Generation | Production / Derivation | “What sentences can this grammar produce?” | The composer writes a piece |
| ← Recognition | Parsing / Analysis | “Does this sentence belong to the language? What structure corresponds to it?” | The musicologist analyzes a musical production |
Mathematically, both uses define the same language: the set of strings is identical. But operationally, the two directions are radically different.
The idea in one sentence
A formal grammar is a bidirectional object: it can generate sentences or analyze them, but the two directions differ in complexity, determinism, and available information.
Let’s explain step by step
1. Computational asymmetry
The first difference is algorithmic in nature. For a Context-Free Grammar (CFG — L2):
- Generating a sentence is fast: one starts from the axiom (start symbol), applies rewriting rules, and each step is a local choice. The complexity is linear in the length of the derivation.
- Recognizing a sentence is more costly: given a string of n symbols, determining if it belongs to the language and reconstructing its structure requires an algorithm like CYK (Cocke-Younger-Kasami) or Earley, in O(n³) time — that is, proportional to the cube of the input length.
For more expressive formalisms, the gap widens dramatically. Context-sensitive grammars (Type 1, L1) have a membership problem that is PSPACE-complete (Polynomial Space complete — a complexity class for which no polynomial-time algorithm is known) [Berwick1982].
In other words: producing is generally easier than analyzing.
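To make the asymmetry concrete, here is a minimal sketch with a toy grammar of my own (in Chomsky Normal Form, not one from the corpus): the same rule table drives generation, which makes one local choice per step, and recognition via CYK, which must fill an O(n²) table with O(n) work per cell.

```python
import random

# Toy grammar in Chomsky Normal Form. Language generated: {"ab", "ba"}.
RULES = {
    "S": [("A", "B"), ("B", "A")],
    "A": [("a",)],
    "B": [("b",)],
}

def generate(symbol="S"):
    """Generation: expand nonterminals top-down, one local choice per step."""
    if symbol not in RULES:                  # terminal: emit it
        return [symbol]
    out = []
    for sym in random.choice(RULES[symbol]): # local, immediate decision
        out.extend(generate(sym))
    return out

def recognize(tokens):
    """Recognition: CYK membership test, O(n^3) in the input length."""
    n = len(tokens)
    # table[i][j] = nonterminals deriving the span tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):         # spans of length 1
        for lhs, alts in RULES.items():
            if (tok,) in alts:
                table[i][0].add(lhs)
    for span in range(2, n + 1):             # longer spans, shortest first
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for lhs, alts in RULES.items():
                    for alt in alts:
                        if len(alt) == 2 and alt[0] in left and alt[1] in right:
                            table[i][span - 1].add(lhs)
    return "S" in table[0][n - 1]

sentence = generate()                        # fast: one pass over the derivation
print(sentence, recognize(sentence))         # e.g. ['b', 'a'] True
print(recognize(["a", "a"]))                 # False: not in the language
```

The point is not the toy language but the shape of the two procedures: generate() walks down a single derivation, while recognize() has to consider every span and every split point of the input.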
Aside: The Puzzle Analogy
Cutting a 1,000-piece puzzle from the complete image is relatively simple: you decide where the cuts go, so you know which piece comes from where. But receiving a box of scrambled pieces and figuring out how to assemble them — in what order, by what strategy — is a completely different problem. The grammar is the reference image; generation cuts the image into pieces; parsing reassembles the pieces to recover the structure.
2. Determinism asymmetry
The second difference concerns ambiguity:
- In generation, one can choose one derivation at each step (possibly at random, as with probabilistic CFGs — PCFGs, cf. B1). The result is a single sentence.
- In recognition, a sentence can have several valid analyses. This is structural ambiguity: the same sequence of words can correspond to multiple derivation trees (L4).
In linguistics, the sentence “I saw the man with the telescope” has two analyses: I use the telescope to see, or the man has the telescope. In music, the same harmonic progression can be analyzed in several ways by GTTM (M6).
Generation is a one-to-one process (one derivation → one sentence). Recognition is a one-to-many process (one sentence → several possible structures).
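A small illustration of this one-to-many behaviour, with an ambiguous toy grammar of my own (E → E '+' E | 'n'): a generator emits one bracketing, but the string "n + n + n" received by an analyzer admits two derivation trees.

```python
from functools import lru_cache

TOKENS = ("n", "+", "n", "+", "n")    # the sentence handed to the analyzer

@lru_cache(maxsize=None)
def count_trees(i, j):
    """Number of distinct derivation trees for TOKENS[i:j] as an E."""
    if j - i == 1:
        return 1 if TOKENS[i] == "n" else 0
    total = 0
    for k in range(i + 1, j - 1):     # try each '+' as the top-level operator
        if TOKENS[k] == "+":
            total += count_trees(i, k) * count_trees(k + 1, j)
    return total

# Generation picks ONE derivation; recognition has to face all of them.
print(count_trees(0, len(TOKENS)))    # 2: (n + n) + n  and  n + (n + n)
```

With longer strings the count grows as the Catalan numbers, which is why practical parsers return a packed forest of analyses rather than enumerating trees one by one.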
3. Information asymmetry
The generator has access to everything: communicative intent, stylistic constraints, pragmatic context. The analyzer only has access to the observable signal — the sound surface, the sequence of symbols.
In music, the composer knows why they chose a particular note. The listener must infer this intention from the result. This is the fundamental problem of all musical analysis: creative constraints are available to the generator but must be reconstructed by the analyzer.
Umberto Eco, in The Open Work (1962), formalized this asymmetry: a single production (the work) admits multiple interpretations (analyses). The generation function is many-to-one (multiple intentions converge to one work), while the analysis function is one-to-many (one work diverges into multiple interpretations).
4. Temporal asymmetry: offline vs real-time
The three previous asymmetries implicitly assume that we have all the time we need — that we are working offline. But when a real-time constraint is added, the asymmetry widens further.
- Generating in real-time is relatively natural: one produces symbol by symbol, as it happens. A composer improvising, a live coder modifying their code during a concert — each decision is local and immediate.
- Analyzing in real-time is much more difficult: the signal is received progressively, without knowing what comes next. The analyzer must maintain multiple hypotheses about the current structure — this is the problem of incremental parsing [Hale2001].
In music, this asymmetry is omnipresent. A listener at a concert hears notes one by one. They cannot “go back” to re-analyze a passage. Their analysis is constrained by the flow — they must interpret in real-time, with partial information.
Aside: The Concert vs. the Score
A musicologist analyzing a Beethoven sonata has the complete score in front of them — they can go back and forth, compare distant passages, revise their interpretations. A listener at a concert only has the sound flow: each second erases the previous one (except in memory). Both perform “analysis,” but under radically different temporal constraints. The former works offline (all information available at once); the latter works in streaming (partial, sequential, irreversible information).
The IDyOM model formalizes exactly this situation: it predicts the next note from the notes already heard, without ever seeing the future. It is an incremental decoder in Shannon’s sense — and this is precisely what makes its task harder than generation.
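As a drastically simplified sketch of this streaming constraint (IDyOM itself uses far richer multiple-viewpoint models), here is a bigram predictor, trained on a made-up toy melody, that forms its expectation for the next note from the past alone:

```python
from collections import defaultdict

corpus = ["C", "D", "E", "C", "D", "G", "C", "D", "E"]   # toy training melody

bigrams = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1                 # count transitions prev -> nxt

def predict_next(heard_so_far):
    """Return the most expected continuation, given only the past."""
    counts = bigrams[heard_so_far[-1]]
    if not counts:
        return None                         # unseen context: no expectation
    return max(counts, key=counts.get)

stream = ["C", "D", "E", "C", "D"]
for t in range(1, len(stream)):
    heard = stream[:t]                      # streaming: only the past is visible
    print(f"heard {heard} -> expect {predict_next(heard)}, actual {stream[t]}")
```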
5. The bridge: analysis by synthesis
The analysis-by-synthesis paradigm, born in speech perception research in the 1960s, proposes a bridge between the two directions: to recognize a signal, one generates internal candidates and compares them to the received signal [HalleStevens1962].
Analysis contains synthesis as a subprocess. The receiver is not a simple passive filter: it possesses an internal generative model that produces hypotheses, tested against the input.
In music, this corresponds to the expert listener’s experience: when you hear a ii-V-I (the subdominant → dominant → tonic progression, for example Dm7 → G7 → Cmaj7 in C major), you anticipate the resolution because your internal model has generated this expectation. Models like IDyOM (Information Dynamics of Music — a musical prediction model based on corpus statistics) formalize exactly this predictive process [Pearce2018].
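Here is a minimal sketch of that generate-and-compare loop, with a hypothetical set of cadence templates of my own (this is not how IDyOM or any published model is implemented): while listening, the internal model synthesizes expected continuations, and recognition succeeds when the incoming chord matches one of them.

```python
# Hypothetical templates the "inner generator" knows (C major, illustrative only).
TEMPLATES = [
    ["Dm7", "G7", "Cmaj7"],    # ii-V-I
    ["Fmaj7", "G7", "Cmaj7"],  # IV-V-I
    ["Dm7", "G7", "E7"],       # deceptive continuation
]

def expected_continuations(heard):
    """Synthesis step: generate what the internal model predicts comes next."""
    return {t[len(heard)] for t in TEMPLATES
            if t[:len(heard)] == heard and len(t) > len(heard)}

heard = ["Dm7", "G7"]
predictions = expected_continuations(heard)   # contains 'Cmaj7' and 'E7'
incoming = "Cmaj7"
# Comparison step: recognition succeeds because a generated hypothesis matches.
print(incoming in predictions, predictions)
```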
6. Reversible grammars
Some formalisms are designed to work in both directions with the same grammar:
- GF (Grammatical Framework, Ranta, 2019) — a multilingual formalism where grammars are explicitly reversible, allowing both parsing and generation [Ranta2019]
- Unification grammars — used in bidirectional machine translation [vanNoord1990]
- Steedman (1984 → 2014) — the same CCG (Combinatory Categorial Grammar — L9) grammar for jazz progressions was used first to generate chord sequences [Steedman1984], then 30 years later to parse them (analyze them automatically) [GranrothWilding2014]. This is a remarkable example of the same musical grammar used in both directions — with very different computational challenges.
But reversibility is not automatic. Strzalkowski (1993) showed that a grammar designed for parsing requires transformations (an “inversion”) before it can be used for generation. The very fact that an inversion procedure is needed is itself evidence of the operational asymmetry [Strzalkowski1993].
7. In music: three paradigms
The generation/recognition duality structures the entire field of computational musicology:
The analytical paradigm — grammar as a listening tool:
- GTTM (Lerdahl & Jackendoff, 1983): listener’s model, not the composer’s (M6)
- Computational Schenkerian analysis: parsing an existing piece to extract its deep structure [MavromatisBrown2004]
- IDyOM: statistical model of musical prediction — the listener as an inference machine
The generative paradigm — grammar as a composition tool:
- Holtzman (1981): pioneering use of grammars for automatic composition
- Steedman (1984): CCG grammar for producing jazz progressions
- Cope / EMI (1989): hybrid system that analyzes a corpus then generates new pieces — both directions in the same system
The bidirectional paradigm — grammar as a bridge:
- The Bol Processor (Bel & Kippen, 1992) is the most explicit case. BP3 has three modes of grammar use:
  - PROD (production, modus ponens): the grammar generates musical sequences
  - ANAL (analysis, modus tollens): the grammar tests if a sequence belongs to the language and, if so, identifies its structure
  - TEMP (template): the grammar produces sequences respecting a given template
Bel and Kippen’s complete cycle for tabla perfectly illustrates the duality: a musician produces sequences of bols (mnemonic syllables of the tabla — B2), the system analyzes these sequences to infer a grammar (via QAVAID, Question Answer Validated Analytical Inference Device — an interactive grammatical inference system), then this grammar can re-produce new sequences, which the musician evaluates. It’s an analysis → grammar → generation → evaluation → analysis loop [KippenBel1989].
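As a rough illustration of a TEMP-like constraint (a naive rejection-sampling sketch of my own, not BP3’s actual mechanism), one can generate from a toy rewriting grammar and keep only the sequences that fit a template, where "?" stands for any single symbol:

```python
import random

# Toy rewriting grammar over tabla-like syllables (illustrative only).
RULES = {
    "S": [["dha", "X"], ["tin", "X"]],
    "X": [["ge", "na"], ["dha", "tin"]],
}

def generate(symbol="S"):
    """Expand nonterminals top-down with random choices."""
    if symbol not in RULES:
        return [symbol]
    out = []
    for sym in random.choice(RULES[symbol]):
        out.extend(generate(sym))
    return out

def matches(sequence, template):
    """True if the sequence fits the template position by position."""
    return len(sequence) == len(template) and all(
        t == "?" or t == s for s, t in zip(sequence, template))

template = ["dha", "?", "na"]
# Rejection sampling: generate until the template is satisfied.
for _ in range(100):
    seq = generate()
    if matches(seq, template):
        print(seq)     # e.g. ['dha', 'ge', 'na']
        break
```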
8. The third direction: grammatical inference
Beyond the generation/recognition duality, there is an even more fundamental third direction: grammatical inference (grammar induction) — given examples of sentences, finding the grammar itself [delaHiguera2010].
This is the inverse problem of formal language theory:
- Generation: grammar → sentences
- Recognition: grammar + sentence → yes/no + structure
- Inference: sentences → grammar
In music, this is exactly what a musicologist does when listening to a corpus and trying to extract the “rules” — stylistic conventions, harmonic constraints, rhythmic patterns. It is also what grammatical inference systems like QAVAID for tabla or the work of Cruz-Alcázar & Vidal for musical style identification do [CruzAlcazar2008].
This direction will be explored in detail in M10.
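A deliberately naive sketch of this third direction (real grammatical inference, as covered by de la Higuera, is far subtler, and QAVAID works interactively with a musician): induce a trivial regular grammar, here just the set of permitted bigram transitions, from example sequences, then use it to recognize new ones.

```python
examples = [                          # toy corpus of observed sequences
    ["dha", "ge", "na"],
    ["dha", "ge", "ge", "na"],
    ["tin", "na"],
]

# Inference: sentences -> grammar (here, a set of attested transitions).
transitions = {(a, b) for seq in examples for a, b in zip(seq, seq[1:])}

def accepts(sequence):
    """Recognition with the induced grammar: every transition must be attested."""
    return all((a, b) in transitions for a, b in zip(sequence, sequence[1:]))

print(accepts(["dha", "ge", "na"]))              # True: only seen transitions
print(accepts(["dha", "ge", "ge", "ge", "na"]))  # True: generalizes beyond the corpus
print(accepts(["na", "dha"]))                    # False: unattested transition
```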
Shannon’s model: encoder ≠ decoder
Claude Shannon’s communication model (1948) formalizes asymmetry at the most fundamental level:
Source → Encoder → Channel → Decoder → Destination
The encoder (generation) and decoder (recognition) are not symmetrical:
- The encoder transforms a message into a signal — it is a choice among possibilities
- The decoder reconstructs the message from the noisy signal — it is an inference of the most probable
The channel introduces noise — uncertainty, loss. In music: air is a noisy channel, the ear is an imperfect decoder, the score is a lossy encoder (it does not capture timbre, touch, intention). Each translation between musical representations is an encoder → channel → decoder passage, with its own losses and ambiguities.
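A minimal sketch of this asymmetry with a toy repetition code (my own example, not Shannon’s): encoding is a deterministic choice of signal, while decoding is an inference of the most probable message from whatever the noisy channel delivers.

```python
import random

def encode(bit):
    """Encoder: one message -> one signal (repeat the bit three times)."""
    return [bit] * 3

def channel(signal, p_flip=0.2):
    """Noisy channel: each symbol may be corrupted independently."""
    return [b ^ 1 if random.random() < p_flip else b for b in signal]

def decode(received):
    """Decoder: infer the most probable sent bit (majority vote)."""
    return 1 if sum(received) >= 2 else 0

random.seed(0)
sent = 1
received = channel(encode(sent))
print(received, "->", decode(received))   # decoding can still err under heavy noise
```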
Key Takeaways
- Every grammar has two uses: to generate (produce sentences) and to recognize (analyze existing sentences).
- These two directions are mathematically equivalent (same language) but operationally asymmetrical: different complexity, different determinism, different available information.
- Generating is generally easier than analyzing: O(n) vs O(n³) for CFGs, and the gap widens for more expressive formalisms.
- In real-time, the asymmetry further increases: generating in streaming is natural (one produces as it happens), analyzing in streaming is difficult (one has not yet seen what comes next — incremental parsing).
- Analysis by synthesis shows that recognition uses generation as an internal subprocess.
- In music, GTTM is analytical (despite its title), Steedman is bidirectional, and BP3 is explicitly reversible (PROD/ANAL/TEMP modes).
- Grammatical inference is the third direction: finding the grammar from examples — the fundamental problem of computational musicology.
Further Reading
Theoretical Foundations
- Berwick, R. (1984). “Strong Generative Capacity, Weak Generative Capacity, and Modern Linguistic Theories.” Computational Linguistics.
- Strzalkowski, T. (1993). Reversible Grammar in Natural Language Processing. Springer — foundational work on reversible grammars.
- de la Higuera, C. (2010). Grammatical Inference: Learning Automata and Grammars. Cambridge University Press — reference on grammatical inference.
Music
- Lerdahl, F. & Jackendoff, R. (1983). A Generative Theory of Tonal Music. MIT Press — the most influential analytical model.
- Steedman, M. (1984). “A Generative Grammar for Jazz Chord Sequences.” Music Perception — jazz grammar later used for parsing (Granroth-Wilding & Steedman, 2014).
- Bel, B. & Kippen, J. (1992). “Modelling Music with Grammars: Formal Language Representation in the Bol Processor” — the quintessential bidirectional system.
- Cruz-Alcázar, P.P. & Vidal, E. (2008). “Two Grammatical Inference Applications in Music Processing.” Applied Artificial Intelligence.
Temporal Asymmetry and Incremental Parsing
- Hale, J. (2001). “A Probabilistic Earley Parser as a Psycholinguistic Model.” NAACL — foundational model of probabilistic incremental parsing.
Analysis-by-synthesis
- Halle, M. & Stevens, K.N. (1962). “Speech Recognition: A Model and a Program for Research.” IRE Transactions on Information Theory.
- Poeppel, D. & Monahan, P.J. (2011). “Feedforward and Feedback in Speech Perception: Revisiting Analysis by Synthesis.” Language and Cognitive Processes.
In the Corpus
- L1 — Levels of complexity
- L2 — Grammars that produce
- L9 — Mildly context-sensitive formalisms (TAG, CCG)
- M6 — GTTM: the analytical paradigm applied
- M10 — Grammatical inference in music (upcoming)
- B6 — BP3 Wildcards: pattern matching in grammar
Glossary
- Analysis-by-synthesis: A recognition paradigm where the receiver generates internal candidates and compares them to the received signal. Analysis contains synthesis as a subprocess.
- ANAL (mode): Analytical mode of the Bol Processor — the grammar tests if a sequence belongs to the language (modus tollens).
- Structural ambiguity: Property of a sentence that admits multiple derivation trees for the same grammar. Frequent in recognition, absent in generation.
- Encoder / Decoder: In Shannon’s model, the encoder transforms a message into a signal (generation), the decoder reconstructs the message from the signal (recognition). They are not symmetrical.
- Reversible grammar: A grammar designed to be used in both directions (generation and parsing) without transformation. Examples: GF, unification grammars.
- IDyOM (Information Dynamics of Music): A statistical model of musical prediction based on corpus entropy. Models the listener as an inference machine.
- Grammatical inference (grammar induction): The “third direction” — finding the grammar from examples of sentences. The inverse problem of formal language theory.
- Modus ponens / modus tollens: In logic, modus ponens goes from rule and premise to consequence (if A then B; A; therefore B — the generative direction). Modus tollens goes from the negation of the consequence back to the negation of the premise (if A then B; not B; therefore not A — the analytical direction: if no derivation can produce the observed sequence, the sequence does not belong to the language).
- NLG / NLU (Natural Language Generation / Understanding): The two subfields of NLP (Natural Language Processing) that embody the generation/recognition duality.
- Incremental parsing: Syntactic analysis that processes input symbol by symbol, as it is received, without waiting for the end of the sentence. Necessary for real-time analysis. More difficult than offline parsing because the analyzer does not have access to the rest of the signal.
- Parsing (syntactic analysis): The process that, given a sentence and a grammar, determines if the sentence belongs to the language and reconstructs its structure (derivation tree).
- PROD (mode): Production mode of the Bol Processor — the grammar generates musical sequences (modus ponens).
- QAVAID (Question Answer Validated Analytical Inference Device): Bel & Kippen’s interactive grammatical inference system for tabla. The name also means “grammar” in Arabic/Urdu.
Prerequisites: L1, L2
Reading time: 14 min
Tags: #grammars #duality #generation #recognition #parsing #analysis-by-synthesis #BP3