M6) Hierarchical Structure in Music

GTTM Demystified

When you listen to a symphony, how do you know that a passage is a “conclusion”? How do you perceive that a theme “returns”? GTTM attempts to formalize these intuitions we all have.

Where does this article fit in?

This article expands on the ideas from M4 by detailing GTTM, one of the most developed theories for formalizing musical perception. It is also a theoretical foundation for bp2sc, the transpiler (source-to-source code translator) connecting I2 (Bol Processor, version 3) to I3 (see B7).

Callout: What is GTTM?

GTTM stands for Generative Theory of Tonal Music. It is a theory developed by Fred Lerdahl and Ray Jackendoff in 1983 that attempts to formally describe how a listener perceives and mentally organizes Western tonal music. The term “generative” refers to Chomsky’s generative linguistics: just as a grammar can “generate” all possible sentences of a language, GTTM proposes a set of rules that can “generate” all valid perceptual structures of a musical piece.

Why is it important?

Imagine you had to program software capable of automatically analyzing a musical piece: identifying phrases, themes, moments of tension and resolution. Where would you start?

In 1983, Fred Lerdahl (composer) and Ray Jackendoff (linguist) published A Generative Theory of Tonal Music (GTTM), an ambitious attempt to formalize how we perceive musical structure. Their theory profoundly influenced computational musicology, automatic musical analysis, and even computer-assisted composition.

Understanding GTTM means understanding why music seems to have “meaning” to us, even without words.

The Idea in One Sentence

GTTM models our perception of tonal music through four parallel hierarchical structures: grouping, meter, time-span reduction, and prolongational reduction.

Let’s Explain Step by Step

Example 1: Reading a Sentence

Before addressing music, let’s consider reading. When you read:

“The small grey cat sleeps on the red sofa.”

Your brain doesn’t process this sentence word by word. It automatically builds a structure:

                   Sentence
                      |
        ┌─────────────┼─────────────┐
     Subject         Verb       Complement
        |             |             |
 "The small        "sleeps"    "on the red
  grey cat"                       sofa"

You know that “small” modifies “cat,” not “sleeps.” You know that “red” describes the “sofa,” not the “cat.” This hierarchical structure is implicit — you don’t consciously calculate it.

Example 2: Hearing a Melody

The same thing happens in music. Take “Au clair de la lune”:

Au clair de la lu-ne, mon a-mi Pier-rot
|__________________|  |_______________|
    Phrase A              Phrase B

Prê-te-moi ta plu-me pour é-crire un mot
|__________________|  |_______________|
    Phrase C              Phrase D

You naturally perceive these four phrases, even if no one told you. And you perceive that A+B form a larger unit (the “question”) which opposes C+D (the “answer”).

It is this hierarchical structure that GTTM attempts to formalize.

The Four Components of GTTM

1. Grouping Structure

Fundamental Question: How do notes group together into motives, phrases, sections?

The grouping structure is the most intuitive component of GTTM. It answers a simple question: when you listen to music, how do you know where a “phrase” begins and where it ends?

Think of punctuation in a text. Without commas or periods, a long string of words would be difficult to understand. In music, there is no visible punctuation, but our brain naturally “hears” separations. The grouping structure models these perceived boundaries.

It organizes music into nested units, like Russian dolls:

Complete piece
├── Section A
│   ├── Phrase 1
│   │   ├── Motive a
│   │   └── Motive b
│   └── Phrase 2
│       ├── Motive a'
│       └── Motive c
└── Section B
    └── ...
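The nesting above maps directly onto a recursive data structure. Here is a minimal Python sketch; the `Group` class and its method names are illustrative assumptions, not part of GTTM or of any existing library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """A node of a grouping structure: a labeled unit containing sub-groups."""
    label: str
    children: list[Group] = field(default_factory=list)

    def depth(self) -> int:
        """Number of nesting levels, counting this group itself."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def outline(self, indent: int = 0) -> list[str]:
        """Return the hierarchy as indented lines, one per group."""
        lines = ["  " * indent + self.label]
        for child in self.children:
            lines.extend(child.outline(indent + 1))
        return lines


piece = Group("Complete piece", [
    Group("Section A", [
        Group("Phrase 1", [Group("Motive a"), Group("Motive b")]),
        Group("Phrase 2", [Group("Motive a'"), Group("Motive c")]),
    ]),
    Group("Section B"),
])

print("\n".join(piece.outline()))
print(piece.depth())  # 4 levels: piece, section, phrase, motive
```

Each group only knows its children, which is enough to print or measure the hierarchy from the top down.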

Grouping Preference Rules (GPR):

GTTM proposes rules that describe our perceptual tendencies. These rules are not absolute laws but preferences: they indicate what we tend to perceive, not what we are obliged to perceive.

Callout: Why “Preference Rules”?

Unlike strict grammar rules (“a sentence must have a verb”), preference rules express perceptual tendencies rather than requirements. They can contradict each other! For example, GPR 2a might suggest a boundary in one place, while GPR 3a suggests another. In this case, GTTM proposes that the rules “combine”: the perceived boundary is the one supported by the most converging cues. Visual perception works the same way: multiple cues can reinforce or contradict each other.

The main GPRs:

GPR 2a (Temporal Proximity, Rest): A silence between the end of one note and the start of the next suggests a group boundary. Example: in “Frère Jacques” (Brother John), the pause after “dor-mez-vous” (are you sleeping) creates a clear separation.

GPR 2b (Temporal Proximity, Attack Point): A longer stretch of time between two note attacks than between the attacks around them suggests a boundary, even without a silence. Example: a long held note followed by a run of short notes.

GPR 3a (Register): A melodic interval that is larger than the intervals around it suggests a boundary. Example: if a melody ascends C-D-E then jumps to high C, this jump creates a perceptual break.

GPR 3b (Dynamics): A sudden change in volume (piano, soft, to forte, loud, or vice versa) suggests a boundary. Example: the entrance of the tutti (full orchestra) after a solo passage.

GPR 3c (Articulation): A change in articulation (legato, meaning connected playing, to staccato, meaning detached notes) suggests a boundary. Example: a sung phrase followed by staccato notes.

Example with “Au clair de la lune”:

Au clair de la lu- ne | mon a- mi Pier- rot    (pause = GPR 2a)
C  C     C  D  E   D  | C   E  D  D     C      (long notes + change of direction)

The long notes on “lu-ne” and the slight pause that follows them create a group boundary before “mon a-mi.”
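To make the idea of combining cues concrete, here is a small Python sketch that scores every transition between two notes using two cues: a rest (GPR 2a) and a relatively large interval (GPR 3a). The numeric weights are hypothetical, since GTTM assigns no numbers to its rules, and the half-beat breath after “ne” is an assumption added to the encoding for illustration.

```python
# Each note: (syllable, MIDI pitch, onset in beats, duration in beats).
# Assumption: "ne" is shortened to 1.5 beats to model a half-beat breath.
NOTES = [
    ("Au", 60, 0, 1), ("clair", 60, 1, 1), ("de", 60, 2, 1), ("la", 62, 3, 1),
    ("lu-", 64, 4, 2), ("ne", 62, 6, 1.5),
    ("mon", 60, 8, 1), ("a-", 64, 9, 1), ("mi", 62, 10, 1),
    ("Pier-", 62, 11, 1), ("rot", 60, 12, 4),
]

# Hypothetical weights (not from GTTM): a rest counts more than a leap.
WEIGHT_REST = 2.0  # GPR 2a: silence between two notes
WEIGHT_LEAP = 1.0  # GPR 3a: interval larger than at both neighboring transitions


def boundary_scores(notes):
    """Score each transition between note i and note i + 1."""
    pairs = list(zip(notes, notes[1:]))
    rests = [b[2] - (a[2] + a[3]) for a, b in pairs]
    leaps = [abs(b[1] - a[1]) for a, b in pairs]
    scores = []
    for i in range(len(pairs)):
        score = WEIGHT_REST if rests[i] > 0 else 0.0
        has_neighbors = 0 < i < len(pairs) - 1
        if has_neighbors and leaps[i] > leaps[i - 1] and leaps[i] > leaps[i + 1]:
            score += WEIGHT_LEAP
        scores.append(score)
    return scores


scores = boundary_scores(NOTES)
best = max(range(len(scores)), key=scores.__getitem__)
print(f'Strongest boundary: after "{NOTES[best][0]}"')  # after "ne"
```

With this encoding the two cues disagree: the rest points to the transition after “ne”, while the leap from C up to E points to the transition after “mon”. The weights settle the conflict in favor of the rest, which is one possible way to model rules that “combine.”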

2. Metrical Structure

Fundamental Question: Which beats are “strong” and which are “weak”?

The metrical structure models our perception of the “beat” of music. Note: this is not the time signature written on the score (4/4, 3/4…), but the hierarchy of accents that we mentally perceive.

Imagine tapping your foot while listening to music. You don’t tap on every note, but on certain regular points. And among these points, some seem more “important” than others (the “one” of each measure, for example). It is this hierarchy that the metrical structure captures.

Level 1 (measure)      : .                       .
Level 2 (half measure) : .           .           .           .
Level 3 (beat)         : .     .     .     .     .     .     .     .
Level 4 (eighth notes) : .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
Syllables              : Au    clair de    la    lu-         ne

A “strong” beat is a beat present at multiple levels of the hierarchy. The first beat of each measure is the strongest because it appears at all levels.
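This definition of strength is easy to compute: count the levels on which a given position carries a beat. The sketch below assumes a 4/4 measure measured in eighth notes; the function name and the list of periods are illustrative choices.

```python
def beat_strength(position: int, periods: list[int]) -> int:
    """Count the metrical levels on which `position` (in eighth notes) carries a beat."""
    return sum(1 for period in periods if position % period == 0)


# Periods in eighth notes for one 4/4 measure:
# eighth note, beat (quarter note), half measure, measure.
PERIODS_4_4 = [1, 2, 4, 8]

strengths = [beat_strength(p, PERIODS_4_4) for p in range(8)]
print(strengths)  # [4, 1, 2, 1, 3, 1, 2, 1]
```

The first position scores 4 because it belongs to every level, the middle of the measure scores 3, and the off-beat eighth notes score only 1.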

Metrical Preference Rules (MPR):

These rules describe how we infer the metrical structure from musical events:

MPR 3 (Event): The metrical grid tends to be placed so that its beats coincide with musical events (note attacks). If you hear a note, your brain assumes it falls on a beat.

MPR 5 (Length): Long notes tend to fall on strong beats. Example: in “Happy Birthday,” the “birth-” of “birthday” is long AND on a strong beat.

MPR 5f (Harmony) and MPR 7 (Cadence): Important chord changes, especially cadences (harmonic formulas that conclude a musical phrase), tend to fall on strong beats. Example: the final chord of a perfect cadence (dominant-tonic progression, V-I) almost always falls on a strong beat.
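As a rough illustration of how a rule such as MPR 5 (Length) can guide the choice of a grid, the Python sketch below tries every possible position of the downbeat and keeps the one where the most note duration starts on a downbeat. It assumes the measure length (4 beats) is already known and uses a simplified rhythm for the first line of “Au clair de la lune”; the scoring function is an assumption, not GTTM's formulation.

```python
# (onset, duration) in quarter-note beats: "Au clair de la lu-ne, mon a-mi Pier-rot".
NOTES = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (6, 2),
         (8, 1), (9, 1), (10, 1), (11, 1), (12, 4)]


def downbeat_scores(notes, period):
    """For each candidate phase, sum the durations of notes that start on a downbeat."""
    return {
        phase: sum(dur for onset, dur in notes if onset % period == phase)
        for phase in range(period)
    }


scores = downbeat_scores(NOTES, period=4)
best_phase = max(scores, key=scores.get)
print(scores)      # {0: 8, 1: 2, 2: 4, 3: 2}
print(best_phase)  # 0
```

Phase 0 wins because the long notes on “lu-” and “rot” both start there, which matches the intuition that long notes attract strong beats.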

3. Time-Span Reduction

Fundamental Question: Among the notes of a group, which one is the most “important”?

Time-span reduction answers a classic musicological question: if you had to “summarize” a melody, which notes would you keep?

This component builds a reductional tree: each group has a “head” (structurally important note), and the other notes are elaborations of this head. An elaboration is a note that “decorates” or “ornaments” the head without changing the structural meaning.

Callout: What is a Reductional Tree?

A reductional tree is a hierarchical structure where:

At the lowest level, you have all the notes of the piece

At each higher level, the least important notes are “eliminated”

At the top, only the most structural note(s) remain (often the tonic or dominant)

It’s the musical equivalent of a text summary: the essential is kept, details are eliminated.

Simplified example:

Notes:      C    D    E    D    C
             \   /    |    \   /
              C       E      C
                \     |     /
                   C (final)

In this example:

The two Ds are passing notes that connect C and E by step, so they are eliminated. C is the head of each outer group.
E is the melodic peak, but harmonically less stable than C.
The final C prevails: it is the tonic (resting note), in a cadential position (end of phrase). It dominates the whole, and E is reduced to an elaboration of C.

Time-Span Reduction Preference Rules (TSRPR):

TSRPR 1 (Metrical Position): The head of a group is preferably on a metrically strong beat. Example: between an eighth note on the beat and an eighth note off the beat, the one on the beat will be preferred as the head.

TSRPR 2 (Harmonic Stability): The head of a group is preferably harmonically stable (consonant, i.e., perceived as stable and “in agreement” with the harmony). Example: if a C major arpeggio contains C-E-G, the C (root) will be preferred as the head.

TSRPR 3 (Registral Extremes): There is a weak preference for a head with a higher melodic pitch (or a lower bass pitch). Example: in C-D-E-D-C, this rule favors the E (climax) as the head, while the framing C's win as soon as metrical position and harmonic stability outweigh it.
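The reduction of C-D-E-D-C can be sketched in a few lines of Python by scoring each note on metrical position (TSRPR 1) and harmonic stability (TSRPR 2), then keeping the best note of each group. Everything numeric here is an assumption made for illustration: the stability scores, the equal weighting of the two rules, and the metrical placement (C-D-E-D as a run of eighth notes leading to a final C on a downbeat).

```python
PERIODS = [1, 2, 4, 8]                 # eighth, beat, half measure, measure (4/4)
STABILITY = {"C": 3, "E": 2, "G": 2}   # hypothetical scores in C major; other notes get 1


def score(note):
    """Metrical strength (TSRPR 1) plus harmonic stability (TSRPR 2), equally weighted."""
    name, position = note
    metrical = sum(1 for period in PERIODS if position % period == 0)
    return metrical + STABILITY.get(name, 1)


def head(group):
    """Pick the best-scoring note of a group as its head."""
    return max(group, key=score)


# (note name, position in eighth notes); the final C lands on a downbeat.
melody = [("C", 4), ("D", 5), ("E", 6), ("D", 7), ("C", 8)]
groups = [melody[0:2], melody[2:3], melody[3:5]]   # C-D, E, D-C

level_1 = [head(group) for group in groups]
top = head(level_1)
print([name for name, _ in level_1], "->", top[0])  # ['C', 'E', 'C'] -> C
```

The two Ds drop out at the first level, and the final C wins at the top because it is both the most stable note and the one on the strongest beat.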

4. Prolongational Reduction

Fundamental Question: What are the relationships of tension and release between events?

Prolongational reduction is the most abstract, yet also the most musically significant, component of GTTM. It captures our feeling that music “goes somewhere” and then “arrives” — what musicians call tension and resolution.

Imagine a story with a beginning, a development that builds suspense, and a final resolution. Music works similarly: some passages create anticipation, others resolve it. Prolongational reduction models these relationships.

Callout: Prolongation vs. Progression

The key distinction is between prolonging (staying in the same harmonic state) and progressing (changing harmonic state):

Prolongation: C major → a few passing notes → C major. We stay “in” C from beginning to end — the harmony doesn’t really move.

Progression: G7 → C major. We change harmony: G7 (dominant, unstable) creates tension that resolves upon arriving at C major (tonic, stable). It’s a movement, not a sustain.

Three types of connections:

Strong prolongation: One event directly prolongs another (same harmony). Example: C major — a few melody notes — C major. The second C major is a prolongation of the first.

Weak prolongation: One event repeats another in an altered form: same harmonic root, but a different bass note or melody note. Example: C major in root position followed by C major in first inversion (E in the bass). The harmony is still C, only its presentation changes.

Progression: One event creates tension towards another. Example: G7 to C major. The G7 is not a prolongation of C; it progresses towards it, creating harmonic motion.
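A simple way to tell the three connections apart is to compare two events on their root, bass note, and melody note. The Python sketch below assumes the criteria as GTTM states them (identical events, same root in an altered form, different roots); the `Event` class and its three fields are a deliberate simplification introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    root: str     # harmonic root, e.g. "C"
    bass: str     # lowest sounding note
    melody: str   # highest sounding note


def connection(a: Event, b: Event) -> str:
    """Classify the prolongational connection between two events."""
    if a.root != b.root:
        return "progression"
    if a.bass == b.bass and a.melody == b.melody:
        return "strong prolongation"
    return "weak prolongation"


c_root_position = Event(root="C", bass="C", melody="E")
c_first_inversion = Event(root="C", bass="E", melody="G")
g7 = Event(root="G", bass="G", melody="F")

print(connection(c_root_position, c_root_position))    # strong prolongation
print(connection(c_root_position, c_first_inversion))  # weak prolongation
print(connection(g7, c_root_position))                 # progression
```

A real analysis also has to decide which events are connected in the first place, which is the job of the prolongational preference rules; this sketch only labels a connection once the pair is chosen.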

Example: Perfect cadence V → I

     I (stable)
    / \
   V   I
   |   |
(tension) → (resolution)

The V (dominant) chord creates tension that resolves to I (tonic). This relationship is represented in the prolongational tree.

Why Trees?

The Power of Tree Representation

Callout: What is a Tree in Computer Science?

A tree (in computer science and mathematics) is a data structure that represents a hierarchy. Visually, it’s like an inverted family tree:

The root is at the top (the common ancestor)

The nodes are the intermediate elements

The leaves are the final elements, with no descendants

Each node (except the root) has exactly one parent

Each node can have zero, one, or more children

In a GTTM musical tree, the root represents the entire piece, the leaves are the individual notes, and the intermediate nodes are the groups, phrases, and sections.

A tree naturally captures:

Hierarchy: A parent node dominates its children
Inclusion: Children are “contained” within the parent
Relationships: One can trace back from any node to the root

For music, this allows us to answer questions like:

“Which phrase does this note belong to?” → Ascend the grouping tree
“Is this passage stable or tense?” → Consult the prolongational tree
“What is the structural note of this section?” → Find the head in the reduction
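The first of these questions, “ascend the grouping tree,” takes only a parent pointer per node. A minimal Python sketch follows; the `Node` class, the labels, and the `enclosing` helper are illustrative assumptions.

```python
class Node:
    """A tree node that remembers its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent

    def ancestors(self):
        """Walk from this node up to the root, yielding each enclosing node."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def enclosing(node, prefix):
    """Return the nearest ancestor whose label starts with `prefix`, or None."""
    return next((a for a in node.ancestors() if a.label.startswith(prefix)), None)


piece = Node("Piece")
section_a = Node("Section A", piece)
phrase_1 = Node("Phrase 1", section_a)
motive_a = Node("Motive a", phrase_1)
note = Node("E4", motive_a)

print(enclosing(note, "Phrase").label)      # Phrase 1
print([a.label for a in note.ancestors()])  # ['Motive a', 'Phrase 1', 'Section A', 'Piece']
```

Answering the same question from a flat list of notes would require storing the phrase boundaries separately, which is exactly the information the tree carries for free.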

Comparison with a Flat List

Without a tree structure, a musical piece would just be a sequence of notes:

List: C, D, E, G, F, E, D, C

With a tree:

                Phrase
               /      \
         Ascent        Descent
        /  |  \     /  /  |  \  \
       C   D   E   G  F   E   D  C

The tree captures the fact that C-D-E forms a unit (the ascent), that this unit opposes the descent G-F-E-D-C, and that the two together form one phrase.

Analysis vs. Generation

GTTM for Analysis

GTTM was designed for analysis — taking a piece and inferring its structure. The preference rules guide this analysis:

Input: score of "Au clair de la lune"
Process: apply the GPR, MPR, TSRPR, and PRPR rules
Output: four hierarchical structures describing how the piece is perceived

(Note: PRPR = Prolongational Reduction Preference Rules, the preference rules for prolongational reduction.)

GTTM for Generation

Can the process be reversed? Start from an abstract structure and generate a piece?

It’s more difficult, because preference rules are descriptive (they describe what is perceived) and not prescriptive (they don’t tell you what to compose).

However, several researchers have adapted GTTM for generation:

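Hamanaka et al. implemented GTTM analysis computationally and used the resulting time-span trees to produce new melodies, for example by morphing between two existing ones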
Lerdahl himself proposed extensions in Tonal Pitch Space (2001) that are better suited for generation

GTTM and BP3: Two Opposing Directions

GTTM and BP3 (see I2) share the principle of hierarchical structure in music, but their theoretical origins and directions are independent:

GTTM comes from cognitive linguistics (Jackendoff) applied to musical perception
BP3 comes from formal language theory (Chomsky, Panini) applied to musical generation

They are complementary, not derived from each other:

Aspect               GTTM                                  BP3
Direction            ↑ Ascending (analysis)                ↓ Descending (generation)
Theoretical lineage  Cognitive linguistics (Jackendoff)    Formal languages (Chomsky, Panini)
Input                Musical surface (notes)               Grammar (production rules)
Output               Structural trees                      Musical sequences
Rules                Preference (perceptual tendencies)    Production (deterministic or weighted)
Application          Western tonal music                   Any music (Indian, Western…)

In terms of abstraction levels: GTTM ascends from events to structure, BP3 descends from grammar to events. A system combining both would achieve a complete cycle: analyze a piece (GTTM ↑), extract a structure from it, then generate variations (BP3 ↓).
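The descending direction can be illustrated with a toy rewriting system in Python: start from an abstract symbol and expand it until only notes remain. The rules below are hypothetical and written as a plain Python dictionary. This is not Bol Processor syntax, only a sketch of top-down generation from production rules.

```python
# Hypothetical production rules: each symbol rewrites to a sequence of symbols.
# Symbols without a rule are terminals (notes).
RULES = {
    "Phrase": ["Ascent", "Descent"],
    "Ascent": ["C", "D", "E"],
    "Descent": ["G", "F", "E", "D", "C"],
}


def generate(symbol):
    """Expand a symbol top-down until only terminals (notes) remain."""
    if symbol not in RULES:
        return [symbol]
    notes = []
    for child in RULES[symbol]:
        notes.extend(generate(child))
    return notes


print(generate("Phrase"))  # ['C', 'D', 'E', 'G', 'F', 'E', 'D', 'C']
```

An analysis in the GTTM sense runs the other way: it starts from the flat note list and has to recover the groups that these rules make explicit.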

Limitations of GTTM

Focused on Western Tonal Music

GTTM was developed for tonal music (Bach, Mozart, Beethoven…). Its rules do not directly apply to:

Atonal music (music without a tonal center, like Schoenberg, Webern)
Non-Western music (Indian ragas, Indonesian gamelan)
Electronic music (no discrete “notes”)

Formalization Remains Incomplete

Preference rules are often formulated qualitatively (“a large interval suggests a boundary”). But how many semitones make a “large” interval? GTTM doesn’t always specify this.

A Single Idealized Listener

GTTM models the perception of an “idealized competent listener.” But different listeners can perceive the same piece differently. This variability is not well captured.

Key Takeaways

GTTM proposes four parallel structures to model our musical perception:

1. Grouping (segmentation into units)
2. Meter (hierarchy of strong/weak beats)
3. Time-span reduction (important notes vs. ornaments)
4. Prolongational reduction (tension and release)

Preference rules describe our perceptual tendencies (silences = boundaries, long notes = strong beats…).
Tree representation naturally captures musical hierarchy and relationships.
GTTM is designed for analysis, but its principles can be adapted for generation.
Limitations: focused on Western tonal music, sometimes vague formalization, a single type of listener.

Glossary

Tree (computer science): A hierarchical data structure with a root, intermediate nodes, and leaves. Each node (except the root) has a unique parent.
Reductional tree: A hierarchical structure where each level simplifies the lower level by retaining only the structurally important elements.
Neighboring tone (embellishment): An ornamental note that leaves a structural note by conjunct motion and returns to it. Example: C-D-C.
Cadence: A harmonic formula that concludes a musical phrase. The perfect cadence (V-I) is the most conclusive.
Consonance/Dissonance: Consonance is the quality of a stable and “pleasant” sound (octave, fifth, major third). Dissonance is unstable and calls for resolution.
Dominant (V): The fifth degree of the scale, a chord of tension that calls for the tonic.
Elaboration: A note or passage that “decorates” a structural note without changing its meaning.
GPR (Grouping Preference Rules): Preference rules for grouping. They describe how we perceive boundaries between musical groups.
GTTM (Generative Theory of Tonal Music): A theory of how listeners perceive and mentally organize tonal music, developed by Lerdahl and Jackendoff (1983).
Meter: Hierarchical organization of strong and weak beats, distinct from rhythm (note durations).
Conjunct motion: Melodic movement by successive whole or half steps (C-D-E), as opposed to disjunct motion (leaps, like C-G).
MPR (Metrical Preference Rules): Preference rules for metrical structure. They describe how we infer the beat grid.
PRPR (Prolongational Reduction Preference Rules): Preference rules for prolongational reduction. They describe how we perceive tension/resolution relationships.
Progression (harmonic): Movement from one chord to another that creates tension and a sense of direction.
Prolongation: A relationship where one musical event extends or elaborates another without creating a new harmonic direction.
Preference rule: A rule that indicates a perceptual tendency, not an obligation. It can conflict with other rules.
Head: The structurally most important note of a group. The other notes in the group are elaborations of the head.
Tonic (I): The first degree of the scale, a point of rest and harmonic stability.
TSRPR (Time-Span Reduction Preference Rules): Rules for identifying group heads in time-span reduction.

Prerequisite: M4
Reading time: 11 min
Tags: #gttm #musical-analysis #hierarchy #lerdahl #jackendoff #cognition

Next article: B1 — PCFG: when grammars roll the dice