M4) Generative Grammars and Music

From Chomsky to GTTM

What if music, like language, obeyed hidden rules that our brain unconsciously applies? This idea revolutionized linguistics in 1957, then musicology in 1983.

Where Does This Article Fit?

This is where the two main series meet: the theory of formal languages (L1, L series) and musical representation (M3, M series). This article shows how Chomsky’s grammars were adapted to music, paving the way for the Bol Processor (I2) and for GTTM, Lerdahl and Jackendoff’s A Generative Theory of Tonal Music (1983).


Why Is This Important?

When you listen to a melody, your brain does remarkable work: it groups notes into phrases, identifies tensions and resolutions, and anticipates what comes next. You do all of this without thinking, even without musical training. How is this possible?

The generative grammars hypothesis offers a fascinating answer: our brain may possess innate “rules” for structuring sounds, whether speech or music. This idea, born in linguistics with Noam Chomsky (American linguist, born in 1928), has profoundly influenced how we understand, analyze, and even generate music.

For anyone interested in formal musical notation (like BP3, the Bol Processor — see I2) or automatic analysis, understanding this intellectual lineage is essential. It explains why some formalisms work and others fail.

Quick Timeline

  • 1957: Chomsky publishes Syntactic Structures, birth of generative grammars
  • 1983: Lerdahl and Jackendoff publish GTTM (A Generative Theory of Tonal Music)
  • 1990s: Bernard Bel develops BP3, applying these ideas to musical generation

The Idea in One Sentence

A generative grammar is a finite set of rules capable of producing an infinite set of valid structures, whether sentences or musical sequences.


Let’s Explain Step by Step

Example 1: French Sentences

Let’s start with language. In French, you can create an infinite number of correct sentences you’ve never heard before:

“Le chat violet de ma voisine mange des spaghettis sur le toit.” (“My neighbor’s purple cat is eating spaghetti on the roof.”)

This sentence is probably new to you, but you immediately know it’s grammatically correct. How? Because your brain applies implicit rules:

 

Phrase → Sujet + Verbe + Complément            (Sentence → Subject + Verb + Complement)
Sujet → Article + Nom + (Complément du nom)    (Subject → Article + Noun + (Noun complement))
Complément du nom → "de" + Sujet               (Noun complement → "de" + Subject)

 

With these few rules, an infinite number of sentences can be generated. This is Chomsky’s fundamental intuition: a finite number of rules can generate infinite creativity.

Example 2: A Melody That “Works”

Now, mentally listen to these two sequences:

Sequence A: C – D – E – F – G – C (ascending, resolved)
Sequence B: C – F# – Bb – E – Db – A (random intervals)

Even without musical training, sequence A seems “logical” to you, while B seems chaotic. Why? Because A respects implicit rules of Western tonality:

  • Movement by conjunct degrees (neighboring notes)
  • Beginning and ending on the tonic (C)
  • Coherent direction (ascent then resolution)

These rules are exactly what a musical grammar attempts to formalize.
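
To make this concrete, here is a minimal Python sketch that scores a pitch sequence against these three tendencies. The thresholds (a “step” is at most two semitones) and the scale set are simplifying assumptions for illustration, not music-theory standards.

# A minimal sketch: scoring a pitch sequence against three informal "tonality rules".
# Pitches are MIDI note numbers (60 = middle C); the thresholds are illustrative assumptions.

C_MAJOR_PCS = {0, 2, 4, 5, 7, 9, 11}   # pitch classes of the C major scale

def tonality_checks(pitches, tonic_pc=0):
    """Rough report: how conjunct the motion is, and whether the sequence
    starts/ends on the tonic and stays inside the scale."""
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    conjunct = sum(1 for s in steps if s <= 2) / len(steps)   # share of stepwise moves
    return {
        "share of conjunct steps": round(conjunct, 2),
        "starts and ends on tonic": pitches[0] % 12 == tonic_pc and pitches[-1] % 12 == tonic_pc,
        "stays in C major": all(p % 12 in C_MAJOR_PCS for p in pitches),
    }

seq_a = [60, 62, 64, 65, 67, 72]    # C D E F G C   (sequence A)
seq_b = [60, 66, 70, 64, 61, 69]    # C F# Bb E Db A (sequence B)

print("A:", tonality_checks(seq_a))   # mostly conjunct, tonic at both ends, in the scale
print("B:", tonality_checks(seq_b))   # fails all three tendencies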


The Chomsky Revolution (1957)

The Context

Before Chomsky, linguistics was dominated by behaviorism: it was thought that language was learned solely through imitation and reinforcement. A child hears sentences, repeats them, and their errors are corrected.

The Poverty of the Stimulus Problem

Chomsky demonstrated that this explanation was insufficient. Consider:

  1. Children produce sentences they have never heard
  2. They make systematic errors (“je sontais” instead of “j’étais” – “I was-ed” instead of “I was”) that reveal the application of rules
  3. The amount of language heard is insufficient to explain the acquired mastery

His conclusion: the human brain possesses an innate “universal grammar”, a set of principles that constrain possible languages.

Rewrite Rules

Chomsky formalized grammars with production rules:

 

S  → NP VP          (a sentence = noun phrase + verb phrase)
NP → Det N          (noun phrase = determiner + noun)
VP → V NP           (verb phrase = verb + noun phrase)
Det → "the" | "a"
N  → "cat" | "dog"
V  → "eats" | "sees"

 

These rules generate sentences like “the cat eats a dog” or “a dog sees the cat”. Simple, yet powerful.
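
To see the generative principle in action, the toy grammar above can be run directly. The sketch below (Python, purely illustrative) expands S by picking a random alternative for each non-terminal:

import random

# The toy grammar above, written as a dictionary: non-terminal -> list of alternatives.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["eats"], ["sees"]],
}

def generate(symbol="S"):
    """Expand a symbol: terminals are returned as-is, non-terminals are rewritten
    by choosing one of their alternatives at random."""
    if symbol not in GRAMMAR:            # terminal word
        return [symbol]
    alternative = random.choice(GRAMMAR[symbol])
    return [word for sym in alternative for word in generate(sym)]

for _ in range(3):
    print(" ".join(generate()))          # e.g. "the cat sees a dog"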


Transfer to Music: What Works

Hierarchical Structure

The major discovery that transfers from linguistics to music is hierarchy. Just as a sentence breaks down into clauses, then phrases, then words, a musical piece breaks down into:

  • Movements
  • Sections
  • Phrases
  • Motifs
  • Notes

This tree-like structure allows us to understand how music is perceived at multiple scales simultaneously.

Why a tree and not a simple list?

Imagine “Frère Jacques” as a flat list: C, D, E, C, C, D, E, C…
We lose the crucial information that “C D E C” forms a group (the “Frère Jacques” motif).

With a tree:

                   Song
                 /      \
         Phrase 1        Phrase 2
         /      \        /      \
    Motif 1  Motif 2  Motif 3  Motif 4
    C D E C  C D E C  E F G    E F G

The tree captures inclusion relations: a motif belongs to a phrase, which belongs to the song. This is the structure our brain unconsciously builds.
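
For readers who prefer code, the same tree can be written as a nested structure. The sketch below (Python; the motif numbering is simply the segmentation shown above) makes the point that the flat note list can be recovered from the tree, but not the other way round:

# The tree above as a nested Python structure: the song contains phrases,
# phrases contain motifs, motifs contain notes. The nesting IS the hierarchy.
song = {
    "Phrase 1": {"Motif 1": ["C", "D", "E", "C"],
                 "Motif 2": ["C", "D", "E", "C"]},
    "Phrase 2": {"Motif 3": ["E", "F", "G"],
                 "Motif 4": ["E", "F", "G"]},
}

def flatten(node):
    """Walking the tree left to right recovers the flat note list;
    the flat list alone could never recover the tree."""
    if isinstance(node, list):
        return node
    return [note for child in node.values() for note in flatten(child)]

print(flatten(song))   # ['C', 'D', 'E', 'C', 'C', 'D', 'E', 'C', 'E', 'F', 'G', 'E', 'F', 'G']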

Recursion

Recursion — the ability of a rule to apply to its own result — also exists in music:

In linguistics:

“The cat [that the dog [that Pierre saw] bit] is black.”

In music:

  • A theme contains a motif
  • A variation develops the theme (which contains the motif)
  • A sonata develops the variation (which develops the theme, which contains the motif), as in the toy sketch below
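
As a toy illustration of a rule feeding on its own output, the sketch below (Python) treats “development” as a simple transposition two semitones up. That choice is arbitrary; it only serves to show the recursive nesting.

MOTIF = [60, 62, 64, 60]            # C D E C (MIDI pitches)

def develop(fragment, shift=2):
    """One 'development' step: restate the fragment, then restate it transposed.
    The transposition is an arbitrary stand-in for any developmental operation."""
    return fragment + [p + shift for p in fragment]

theme     = develop(MOTIF)          # the theme contains the motif
variation = develop(theme)          # the variation develops the theme (which contains the motif)
section   = develop(variation)      # and so on: the rule applies to its own result

print(len(MOTIF), len(theme), len(variation), len(section))   # 4 8 16 32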

Production Rules

BP3 (Bol Processor 3), developed by Bernard Bel in the 1990s, explicitly uses production rules to generate music:

 

S → _tempo(80) A B A
A → X Y X
B → Y X Y
X → C4 D4 E4 | E4 D4 C4
Y → F4 G4 A4 G4 F4

 

How to read this grammar?

  • S → _tempo(80) A B A: the starting symbol S rewrites as a tempo of 80 BPM (beats per minute) followed by three sections, A, B, A
  • X → C4 D4 E4 | E4 D4 C4: the | symbol indicates a choice (random or weighted) between two alternatives
  • Symbols that appear on the left of an arrow (S, A, B, X, Y) are non-terminals (intermediate symbols); the note names (C4, D4, E4…) are terminals (the notes actually played)

This grammar can generate several different melodies depending on the choices made at each | (or). This is exactly Chomsky’s principle applied to music.
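
The rewrite-and-choose principle behind this grammar can be sketched in a few lines of Python. This is not BP3’s actual engine or syntax (the real software handles weights, polymetry, and sound output); _tempo(80) is kept here as an opaque token:

import random

# Sketch of the rewrite-and-choose principle (not BP3's real engine or syntax).
# Non-terminals: S, A, B, X, Y.  Terminals: _tempo(80) and note names such as C4.
GRAMMAR = {
    "S": [["_tempo(80)", "A", "B", "A"]],
    "A": [["X", "Y", "X"]],
    "B": [["Y", "X", "Y"]],
    "X": [["C4", "D4", "E4"], ["E4", "D4", "C4"]],   # the two alternatives of the | choice
    "Y": [["F4", "G4", "A4", "G4", "F4"]],
}

def expand(symbol):
    if symbol not in GRAMMAR:                 # terminal: emit as-is
        return [symbol]
    choice = random.choice(GRAMMAR[symbol])   # random pick; BP3 can also weight the choices
    return [tok for sym in choice for tok in expand(sym)]

print(" ".join(expand("S")))   # a different melody on (almost) every run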


Transfer to Music: What Needs Adaptation

The differences between language and music are often presented as incompatibilities that would invalidate the grammatical approach in music. In reality, they are differences of degree — and recognizing them only helps to better calibrate the necessary formal tools.

Compositionality: Propositional vs. Perceptual

In linguistics, meaning composes: “big black cat” = meaning(big) + meaning(black) + meaning(cat). This is the principle of semantic compositionality (attributed to Frege, German logician, 1848-1925).

In music, composition also exists, but it operates on a different level:

  • A cadence V → I (dominant → tonic) composes tension + resolution = rest
  • A crescendo on a dissonance composes intensity + instability = anticipation
  • A transposed motif composes the original motif + a shift = development

The difference: linguistic compositionality is propositional (it constructs statements about the world, which can be true or false). Musical compositionality is perceptual and affective (it constructs experiences of tension, movement, resolution, surprise). It’s a shift in register, not an absence.

Functional Categories: Fixed vs. Contextual

In linguistics, words are classified into functional categories at the same level of abstraction: noun, verb, adjective, adverb, preposition… In music, comparable functional categories exist at the same level (that of the note):

Linguistic categories (word level)   Musical categories (note level)
Noun → can be the subject            Tonic note → point of rest
Verb → carries the action            Passing note → movement between two poles
Adjective → qualifies                Appoggiatura → expressive ornament (a note that creates tension before resolving)
Adverb → modifies                    Neighbor tone → decoration of a main note by step

The main difference: in linguistics, “cat” is a noun in (almost) all contexts. In music, a D is “tonic” in D major but a “passing note” in C major. Musical categories are more contextual.

But even in linguistics, context plays a role: “marche” (walk) is a noun (“la marche” – the walk) or a verb (“il marche” – he walks) depending on the sentence. “Run” in English is a noun or a verb. So categories are not absolutely fixed either — the difference is, again, one of degree.

Beware of the trap: functional categories (all at the same level) should not be confused with hierarchical levels (note → motif → phrase → section). This confusion would be equivalent, in linguistics, to comparing “noun, verb, adjective” with “word, phrase, clause, sentence” — these are two distinct axes of analysis.

And the Question of Musical Meaning?

The signifier-signified relationship in music — the connection between the musical sign and what it “designates” — is a vast subject that goes beyond the scope of this article. Let’s simply note that the musical sign IS arbitrary in the sense of Saussure (Swiss linguist, 1857-1913): the same pitch is called “do” in solfège, “C” in Anglo-Saxon notation, and “sa” in Indian music — just as “arbre”, “tree”, and “Baum” designate the same concept. The parallel with linguistics is stronger here than often believed, and an in-depth study of the informational content of musical structures would probably reveal more similarities than differences with language.

What These Differences Imply for Musical Grammars

These two shifts — perceptual compositionality and contextual categories — do not invalidate the grammatical approach. They explain why musical grammars need adaptations compared to linguistic grammars:

Shift                          Necessary adaptation                                 Examples
Perceptual compositionality    Semantics of tension/resolution rather than truth    GTTM: prolongational reduction; BP3: weighted PCFGs (B1)
Contextual categories          Preference rules rather than rigid categories        GTTM: GPRs, MPRs; BP3: contextual flags (B4)

This is exactly what GTTM (1983) and BP3 did — and that’s why they work.


The Pioneers: From 1957 to 1983

Early Attempts (1960s-70s)

Before GTTM, several researchers attempted to apply Chomsky’s ideas to music. These attempts met with mixed success but paved the way.

Leonard Meyer (American musicologist, 1918-2007), in Emotion and Meaning in Music (1956, even before Chomsky!), had already proposed that our musical perception relies on expectations and their resolution. When a melody creates tension on the dominant (the 5th degree of the scale, for example G in C major), we “expect” a resolution on the tonic (the 1st degree, C). This intuition would be formalized later.

Sundberg and Lindblom (1976) created one of the first formal musical grammars to generate Swedish children’s songs. Their grammar looked like this:

 

Song → Verse Verse
Verse → Line Line
Line → Motif Motif | Motif Variant
Motif → Note Note Note Note

 

The result was functional but rigid: the generated melodies were grammatically correct but lacked “life”.

Heinrich Schenker, long before the computer era (1920s-30s), had developed a theory of musical reduction. For Schenker, any tonal piece could be reduced to a fundamental structure (Ursatz, literally “original structure” in German — the underlying harmonic and melodic skeleton). This idea of hierarchical level analysis would directly influence GTTM.

Why These Early Efforts Failed

Purely Chomskyan grammars applied to music had a problem: they did not capture gradation. In linguistics, a sentence is either grammatical or not — “The cat sleeps” is correct, “Cat the sleeps” is not.

In music, things are more nuanced. A melodic sequence can be:

  • Perfectly idiomatic (very “musical”)
  • Acceptable but unusual
  • Technically possible but strange
  • Impossible to play

This gradation required a new type of rules: preference rules, a major innovation of GTTM.


GTTM: The 1983 Synthesis

Lerdahl and Jackendoff

In 1983, Fred Lerdahl (American composer and theorist, born in 1943) and Ray Jackendoff (American linguist, born in 1945, former student of Chomsky) published A Generative Theory of Tonal Music, abbreviated GTTM. Their goal: to formalize the musical intuition of a competent listener of Western tonal music.

GTTM in one sentence

GTTM is a theory that describes how our brain automatically structures tonal music into groups, metrical levels, and tension/resolution relationships.

The Four Components

GTTM proposes that our musical perception constructs four parallel structures:

  1. Grouping structure: How notes group into motifs, phrases, sections
  2. Metrical structure: The alternation of strong and weak beats
  3. Time-span reduction: Which notes are more important than others
  4. Prolongational reduction: The relationships of tension and relaxation

Preference Rules

GTTM’s major innovation is the concept of preference rules. Unlike Chomsky’s strict rules (a sentence is grammatical or not), GTTM’s rules indicate tendencies:

  • GPR 2a: A group tends to end when there is a silence
  • GPR 3a: A group tends to end when there is a large interval

What do these acronyms mean?

  • GPR = Grouping Preference Rule
  • MPR = Metrical Preference Rule
  • TSRPR = Time-Span Reduction Preference Rule
  • PRPR = Prolongational Reduction Preference Rule

Each rule is numbered: GPR 2a is sub-rule “a” of the 2nd grouping rule.

These rules can conflict. The final interpretation results from their weighting, which explains why different listeners can perceive the same piece differently.
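
To give a feel for how preference rules differ from strict rules, here is a deliberately simplified sketch in Python: each gap between consecutive notes receives a boundary score inspired by (but much cruder than) GPR 2a and GPR 3a. The weights (2.0 and 1.0) and the leap threshold are arbitrary assumptions.

# Deliberately simplified sketch of "preference" rather than strict rules.
# The weights and thresholds below are arbitrary assumptions, far looser than GTTM's rules.

def boundary_scores(notes):
    """notes: list of (midi_pitch, rest_after) pairs. Returns one score per gap
    between consecutive notes; higher means 'more likely a group boundary'."""
    scores = []
    for (pitch1, rest1), (pitch2, _) in zip(notes, notes[1:]):
        score = 0.0
        if rest1 > 0:                    # GPR 2a-like: a rest after a note suggests a boundary
            score += 2.0
        if abs(pitch2 - pitch1) >= 5:    # GPR 3a-like: a large leap suggests a boundary
            score += 1.0
        scores.append(score)
    return scores

# "Frère Jacques, Frère Jacques" with a breath (short rest) after the second "Jacques"
melody = [(60, 0), (62, 0), (64, 0), (60, 0),
          (60, 0), (62, 0), (64, 0), (60, 0.5),
          (64, 0), (65, 0), (67, 0)]

print(boundary_scores(melody))           # the only non-zero score falls where we breathe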

A Concrete Example: “Frère Jacques”

Let’s apply GTTM to a melody everyone knows. Sing it mentally:

 

Frè - re   Jac - ques, Frè - re   Jac - ques,
C     D   E     C     C     D   E     C

Dor - mez  vous?  Dor - mez  vous?
E     F   G      E     F   G

 

Grouping structure: Our perception naturally segments into 4 groups of 2 measures. Why?

  • GPR 2a: implicit silences between phrases (we take a breath after “Jacques”)
  • GPR 6 (parallelism): the repetition of the “Frère Jacques” motif creates parallel groups

Metrical structure: Strong beats on “Frè”, “Jac”, “Dor”, “vous”. The melody coincides with the strong beats (MPR 5: important notes fall on strong beats).

Time-span reduction: The notes C-E-G form the harmonic framework (the C major chord). The other notes (D, F) are ornamental “passages” — if only C-E-G were kept, the melody would remain recognizable.

This formal analysis corresponds to our listener’s intuition — that is exactly GTTM’s objective.
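
As a rough illustration of the idea (this is not GTTM’s actual reduction algorithm), one can filter the melody down to its chord tones and observe that the skeleton stays recognizable:

# Rough illustration of the idea behind time-span reduction (NOT GTTM's algorithm):
# keep only the notes that belong to the underlying chord, drop the ornaments.

C_MAJOR_TRIAD = {"C", "E", "G"}

def crude_reduction(melody):
    """Keep chord tones, drop the rest: a caricature of 'important vs ornamental'."""
    return [note for note in melody if note in C_MAJOR_TRIAD]

frere_jacques = ["C", "D", "E", "C", "C", "D", "E", "C", "E", "F", "G", "E", "F", "G"]
print(crude_reduction(frere_jacques))
# ['C', 'E', 'C', 'C', 'E', 'C', 'E', 'G', 'E', 'G'] -- still recognizably the song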

Another example: “Au clair de la lune”

Au   clair  de   la   lu - ne,  mon   a - mi   Pier - rot
C    C      C    D    E    D    C     E   D    D     C

Grouping: two groups (pause after “lune” = GPR 2a)
Reduction: C-E-C = framework (broken C major chord)


BP3: Heir to This Tradition

Bernard Bel developed BP3 (Bol Processor 3) to represent Indian musical structures, particularly tabla compositions. His contribution was to create a formalism that:

  1. Uses generative grammars to describe rhythmic patterns
  2. Manages polymetry (superposition of multiple simultaneous meters; see M5, upcoming)
  3. Allows contextual rules (similar to Chomsky’s context-sensitive grammars — see L1)

Example of a BP3 grammar for a tabla pattern:

 

S → Theka Theka Tihai
Theka → dha dhin dhin dha | dha dhin dhin dha dha tin tin ta
Tihai → X X X
X → dha ti dha ge na dha ti dha ge na dha

 

This grammar captures the hierarchical structure of the repertoire: a tihai (final cadence) repeats a motif three times, and that motif can itself contain sub-patterns.
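
The same toy expansion used earlier works on this grammar too. Again, this is a sketch of the rewriting principle, not BP3’s real syntax; the bols are written as plain strings:

import random

# The tabla grammar above as data; bols (dha, dhin, ...) are terminal strings.
GRAMMAR = {
    "S":     [["Theka", "Theka", "Tihai"]],
    "Theka": [["dha", "dhin", "dhin", "dha"],
              ["dha", "dhin", "dhin", "dha", "dha", "tin", "tin", "ta"]],
    "Tihai": [["X", "X", "X"]],                     # the threefold repetition
    "X":     [["dha", "ti", "dha", "ge", "na", "dha", "ti", "dha", "ge", "na", "dha"]],
}

def expand(symbol):
    """Rewrite a symbol until only bols remain, choosing alternatives at random."""
    if symbol not in GRAMMAR:
        return [symbol]
    return [tok for sym in random.choice(GRAMMAR[symbol]) for tok in expand(sym)]

print(" ".join(expand("S")))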

Why BP3 Goes Further Than GTTM

Aspect      GTTM                               BP3
Direction   Analysis (understanding a piece)   Generation (creating a piece)
Tradition   Western tonal music                Any music (Indian, African…)
Rules       Preference rules (tendencies)      Production rules + weighting
Tempo       Single tempo                       Native polymetry (see M5, coming soon)
Output      Structural trees                   Playable music (MIDI, Musical Instrument Digital Interface; see M1)

BP3 does not seek to model human perception (like GTTM), but to provide a flexible composition tool. This difference in objective explains different design choices.


Summary Timeline

To place the ideas in time:

Year        Event                                                  Contribution
1920s-30s   Schenker, Free Composition (published posthumously)    Analysis by hierarchical reduction
1956        Meyer, Emotion and Meaning in Music                    Expectation and resolution
1957        Chomsky, Syntactic Structures                          Generative grammars
1976        Sundberg & Lindblom                                    First formal musical grammar
1983        Lerdahl & Jackendoff, GTTM                             Preference rules
1990s       Bel, Bol Processor 3                                   Grammars for generation
2000s       Computational implementations                          Automated GTTM

This intellectual lineage shows how an idea born in linguistics was progressively adapted, criticized, and enriched to apply to music.


Key Takeaways

  • Chomsky (1957) showed that language obeys finite generative rules producing infinite creativity.
  • What transfers to music: hierarchy (notes → motifs → phrases → sections), recursion, production rules.
  • What needs adaptation: compositionality is perceptual (tension/resolution) rather than propositional (true/false), functional categories are contextual rather than fixed. These differences are a matter of degree, not nature — and the signifier-signified parallel between language and music is stronger than often believed.
  • GTTM (1983) adapts the generative approach to music with “preference rules” that model our listener intuitions.
  • BP3 applies these principles to concrete musical generation, particularly for Indian music.

Further Reading

  • Chomsky, N. (1957). Syntactic Structures. Mouton. — The foundational work.
  • Lerdahl, F. & Jackendoff, R. (1983). A Generative Theory of Tonal Music. MIT Press. — The reference in musical grammar.
  • Bel, B. (1998). “Migrating Musical Concepts: An Overview of the Bol Processor”. Computer Music Journal, 22(2). — On BP3.
  • Patel, A. D. (2008). Music, Language, and the Brain. Oxford University Press. — The neuroscience of the music/language parallel.

Glossary

  • Behaviorism: A school of psychology (early 20th century) that explains learning solely through stimulus-response, without reference to innate mental structures.
  • BP3 (Bol Processor 3): Musical grammar software developed by Bernard Bel (1990s), designed for Indian and African music.
  • Chomsky, Noam: American linguist (born in 1928), founder of the theory of generative grammars.
  • Neighbor tone: An ornamental note adjacent to a main note, which moves away by one step and then returns (e.g., C-D-C).
  • Cadence: A sequence of chords that punctuates the end of a musical phrase (e.g., V → I = dominant → tonic).
  • Semantic compositionality: The principle (attributed to Frege) that the meaning of an expression is constructed from the meaning of its parts (“big cat” = meaning(big) + meaning(cat)).
  • Dominant: The 5th degree of the scale (e.g., G in C major), which creates tension calling for resolution to the tonic.
  • Frege, Gottlob: German logician (1848-1925), founder of modern logic, to whom the principle of compositionality is attributed.
  • GPR (Grouping Preference Rule): In GTTM, a preference rule for grouping notes into phrases.
  • Generative grammar: A set of formal rules capable of producing all valid structures of a language.
  • Universal grammar: Chomsky’s hypothesis that all humans are born with innate linguistic principles.
  • GTTM (A Generative Theory of Tonal Music): Lerdahl and Jackendoff’s theory (1983) formalizing the perception of tonal music.
  • Hierarchy: Organization in nested levels (tree), where each element belongs to a higher-level element.
  • MPR (Metrical Preference Rule): In GTTM, a preference rule for metrical structure (strong/weak beats).
  • Passing note: A note that connects two structural notes by conjunct motion (e.g., the D between C and E).
  • Recursion: The property of a rule that can apply to its own result, allowing for nested structures.
  • Production rule: A rule of the form A → B indicating how to rewrite a symbol.
  • Preference rule: In GTTM, a rule that indicates a tendency rather than a strict obligation.
  • Saussure, Ferdinand de: Swiss linguist (1857-1913), founder of structural linguistics, known for the concept of the arbitrariness of the sign.
  • Referential semantics: The ability of language to refer to things in the world (the word “cat” refers to an animal).
  • Grouping structure: Hierarchical organization of musical events into units (motifs, phrases, sections).
  • Tihai: In Indian music, a concluding cadence where a motif is repeated three times, leading to the sam (the first beat of the rhythmic cycle, a point of convergence).
  • Tonic: The resting note or chord of a tonality (e.g., C in C major), the point of resolution for tensions.
  • Ursatz: Schenker’s term (German, “original structure”) referring to the fundamental harmonic and melodic skeleton of a tonal piece.

Links

  • L1 — The Chomsky Hierarchy — the formal framework of grammars
  • M3 — The three paradigms of musical representation
  • M5 — Polymetry — superposition of meters
  • M6 — Hierarchical structure in music: GTTM in depth
  • I2 — Bol Processor — presentation of BP3
  • M1 — MIDI under the formal microscope
  • B1 — BP3 Probabilistic Grammars — first entry in the B series

Prerequisites: L1 — The Chomsky Hierarchy, M3 — The Three Paradigms
Reading time: 12 min
Tags: #chomsky #gttm #grammars #linguistics #musicology #bp3


Next article: M5 — Polymetry