M3) Musical Representation Paradigms

Six Levels of Abstraction, from Wave to Meaning

Six ways to think about music digitally — and why they stack in layers, like an OSI model.

Where Does This Article Fit In?

In M1 and M2, we analyzed two musical formats — MIDI and MusicXML — as formal languages located at precise levels of Chomsky’s hierarchy (L1). But why are these two formats so different? Why does converting from one to the other imply a loss?

The answer lies in one word: abstraction. MIDI and MusicXML do not operate at the same level of musical description. And there are many others. This article takes a step back to map six paradigms of musical representation and show that they are organized into a stack of layers of increasing abstraction — from raw signal to musical meaning.


Why Is This Important?

A composer in a DAW (Digital Audio Workstation — music production software like Ableton Live or Logic Pro) uses MIDI without thinking about it. A score engraver uses MusicXML. A live-coder (a musician who programs in real time in front of an audience) uses TidalCycles. Each feels that their tool represents “music.” In reality, each tool operates at a different level of abstraction — and therefore encodes different aspects of the same musical reality.

In 2003, Hugues Vinet (IRCAM) proposed a foundational framework: four Representation Levels (physical, signal, symbolic, knowledge) linked by problems of inter-level conversion. Vinet showed that most research problems in music technology — automatic transcription, synthesis, musical analysis — are problems of moving from one level to another. But his framework was too coarse: the “symbolic” level aggregates very different realities. MIDI (event stream), MusicXML (structured score), BP3 (generative grammar), and GTTM (Generative Theory of Tonal Music — Lerdahl and Jackendoff’s theory that models the perception of musical structures) are all “symbolic” for Vinet, even though they operate at very different levels of abstraction.

Vinet himself acknowledged this in his conclusion: there is a syntactic level gap — musical patterns (motifs, chords, characteristic sequences) do not have a dedicated layer in his model.

This article proposes a finer stack of six levels, which resolves this gap.

The Idea in One Sentence

Musical representation paradigms are not merely equivalent perspectives: they are organized into levels of increasing abstraction, from raw signal to musical meaning — and the most powerful systems are those that span multiple levels.


The Stack of Six Paradigms

 

graph TD
    subgraph _sg0["Abstraction levels"]
        direction TB
        F["6. Functional / Pattern
TidalCycles, Euterpea, SC Patterns
How to compose transformations?"]
        G["5. Generative
BP3, musical grammars
How to produce music?"]
        N["4. Notational
MusicXML, MEI, Lilypond
How to write it?"]
        E["3. Event-based
MIDI, OSC
When to play what?"]
        D["2. Dataflow
Max/MSP, Pure Data
How to route flows?"]
        S["1. Signal
Csound, Faust, SynthDefs
How does it sound?"]
    end
    F --> G
    G --> N
    N --> E
    E --> D
    D --> S
    style F fill:#8b5cf6,color:#fff
    style G fill:#a855f7,color:#fff
    style N fill:#3b82f6,color:#fff
    style E fill:#06b6d4,color:#fff
    style D fill:#14b8a6,color:#fff
    style S fill:#f97316,color:#fff

 

Each downward arrow represents a concretization: moving from an abstract description to one closer to the physical sound. Each upward arrow is an abstraction: extracting meaning from raw data.

| # | Paradigm | Central Question | Examples | Data Type |
|---|----------|------------------|----------|-----------|
| 1 | Signal | How does it sound? | Csound, Faust, SC SynthDefs | $f(t) \to \mathbb{R}$ |
| 2 | Dataflow | How to route flows? | Max/MSP, Pure Data, Reaktor | Connection graph |
| 3 | Event-based | When to play what? | MIDI, OSC | Timestamped messages |
| 4 | Notational | How to write it? | MusicXML, MEI, Lilypond | Structured document |
| 5 | Generative | How to produce it? | BP3, musical grammars | Rewrite rules |
| 6 | Functional | How to compose? | TidalCycles, Euterpea | Functions and patterns |

The crucial point: moving up a level means gaining in abstraction but losing in detail. A MIDI file (level 3) does not contain timbre information (level 1). A BP3 grammar (level 5) does not contain the exact score (level 4). Each conversion between levels implies either a loss (upwards: summarizing), or an addition of information (downwards: specifying missing details).


The Six Levels in Detail

Level 1: Signal — “How does it sound?”

Music is an audio signal — a function $f(t)$ from time to amplitude. This paradigm operates at the lowest level: that of the sound wave itself.

 

; Csound: sine wave synthesis
instr 1
  aOut  poscil  0.5, 440  ; 440 Hz oscillator
  outs  aOut, aOut
endin

 

poscil: sine wave oscillator. 0.5, 440: amplitude 0.5, frequency 440 Hz (A4). outs: stereo output.

Characteristics: no concept of “note” — only $\approx$ 44,100 samples per second. Models timbre (the “color” of the sound), not just pitch and rhythm. Allows for the synthesis of sounds impossible on acoustic instruments. Faust formalizes this paradigm: each Faust program is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ — one of the few musical DSLs (Domain-Specific Language — language dedicated to a domain) with a published formal semantics.

Examples: Csound (1986), Faust (2002), SuperCollider SynthDefs, Max/MSP gen~
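
To see what “only samples” means in practice, here is a minimal sketch (Python with numpy, an assumed dependency; not taken from any of the tools above): one second of a 440 Hz sine at 44,100 samples per second. Nothing in it knows about notes, bars, or timbre: it is just $f(t)$.

import numpy as np

# Level 1 in its rawest form: an array of amplitude values, nothing else.
sr = 44100                                    # samples per second
t = np.arange(sr) / sr                        # one second of time points
signal = 0.5 * np.sin(2 * np.pi * 440 * t)    # amplitude 0.5, frequency 440 Hz (A4)

print(signal.shape)                           # (44100,) -- no "note" object anywhere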

Level 2: Dataflow — “How to route flows?”

Music is a graph of transformations: boxes (objects) connected by cables (data flows).

 

[osc~ 440] → [*~ 0.5] → [dac~]

 

A 440 Hz oscillator, multiplied by 0.5 (50% volume), sent to the audio output (dac~ = digital-to-analog converter).

What distinguishes this paradigm: programming is done via patching (connecting graphical blocks). Visual and modular. Two flows coexist: audio (tilde ~ — processed sample by sample) and control (messages — processed event by event). Dataflow is at the hinge between signal and event: it routes both flows simultaneously. Dataflow analysis is a well-formalized domain in computer science since Kildall’s algorithm (1973).

Examples: Max/MSP (1988), Pure Data (1996), Reaktor, VCV Rack
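
As a rough illustration (a Python sketch, not Max/MSP or Pure Data code), the patch [osc~ 440] → [*~ 0.5] → [dac~] can be modelled as three boxes computing one block of samples at a time, with the “cables” reduced to function composition:

import math

SR, BLOCK = 44100, 64                         # sample rate and block size

def osc(freq, start):                         # the [osc~ 440] box
    return [math.sin(2 * math.pi * freq * (start + n) / SR) for n in range(BLOCK)]

def gain(block, g):                           # the [*~ 0.5] box
    return [s * g for s in block]

def dac(block):                               # stand-in for the [dac~] output box
    return block

out = dac(gain(osc(440, 0), 0.5))             # one block flows through the graph
print(out[:4])                                # first few samples of the attenuated sine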

Level 3: Event-based — “When to play what?”

Music is a sequence of timestamped events: “at time $t$, trigger action $a$.”

 

t=0ms     Note On  C4 velocity=80
t=500ms   Note Off C4
t=500ms   Note On  D4 velocity=75
t=1000ms  Note Off D4

 

Characteristics: the main axis is time. Close to what an instrument physically does (press/release). No musical structure (no phrase, motif, harmony). Optimized for real-time performance.

| Strengths | Limitations |
|-----------|-------------|
| Real-time transmission | No musical structure |
| Universal (any synthesizer) | No C$\sharp$/D$\flat$ distinction |
| Compact ($\sim$ bytes/note) | 128 velocity levels, 16 channels |

Formally, MIDI is a Type 3 (regular) language in Chomsky’s hierarchy (L1) — recognizable by a finite automaton. See M1.
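
The event list above can be generated with a few lines of code. A minimal sketch using Python and the mido library (an external dependency, assumed installed; not something the MIDI specification itself provides): at the default 120 BPM with 480 ticks per quarter note, the 500 ms deltas become 480 ticks.

import mido

# Write the C4 / D4 example as a Standard MIDI File: pure timestamped events.
mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

for note, vel in ((60, 80), (62, 75)):        # C4 then D4, as in the listing above
    track.append(mido.Message('note_on', note=note, velocity=vel, time=0))
    track.append(mido.Message('note_off', note=note, velocity=0, time=480))

mid.save('two_notes.mid')                     # hypothetical output filename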

Level 4: Notational — “How to write it?”

Music is a document structured according to notation conventions: staves, measures, clefs, key signatures (sharps or flats at the beginning of the staff indicating the key).

 

<measure number="1">
  <attributes>
    <key><fifths>0</fifths></key>
    <time><beats>4</beats><beat-type>4</beat-type></time>
  </attributes>
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <type>quarter</type>
  </note>
</measure>

 

Characteristics: the main axis is the page. Encodes the composer’s intention. Includes expression markings (dynamics, phrasing). Preserves the tradition of the musical score.

| Strengths | Limitations |
|-----------|-------------|
| Notational completeness | $\sim$10x larger than MIDI |
| Interoperability (Finale, MuseScore) | Western bias (standard notation) |
| Human-readable | Static (1 file = 1 fixed score) |

Formally, MusicXML is a Type 2 (context-free) language in Chomsky’s hierarchy (L1). See M2.

Vinet’s observation: notation formats actually overlap two levels. Tempo (quarter note = 120) is event-symbolic information, but the indication “con fuoco” is high-level semantic information. This overlap reveals that notation is a composite format, not a “pure” layer.
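
Because the notational level is an ordinary structured document, standard XML tooling can traverse it. A small sketch with Python’s xml.etree (reading a fragment like the one above) pulls out the written pitches and durations: no audio, no millisecond timing, just the score’s structure.

import xml.etree.ElementTree as ET

fragment = """
<measure number="1">
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <type>quarter</type>
  </note>
</measure>
"""

measure = ET.fromstring(fragment)
for note in measure.findall("note"):
    step = note.find("pitch/step").text
    octave = note.find("pitch/octave").text
    print(step + octave, note.find("type").text)   # -> C4 quarter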

Level 5: Generative — “How to produce it?”

Music is a process defined by rules: grammars, probabilities, algorithms. The file does not contain “the music” but “how to produce music.”

 

// BP3: generative grammar
gram#1[1] S → _vel(80) C4 _vel(60) {D4,E4} F4 G4
gram#1[2] S → A B
gram#1[3] A → C4 C4 | D4 D4
gram#1[4] B → E4 F4 G4

 

S → ...: $S$ rewrites to the sequence. _vel(80): velocity 80. {D4,E4}: chord. |: random choice.

Each execution can produce a different sequence depending on stochastic (weighted random) choices.
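
A toy illustration of this non-determinism (a Python sketch mimicking rules [2] to [4] above; this is neither BP3 syntax nor BP3’s engine): each call to derive("S") makes weighted random choices and can return a different note sequence.

import random

rules = {
    "S": [(["A", "B"], 1.0)],
    "A": [(["C4", "C4"], 0.5), (["D4", "D4"], 0.5)],   # the '|' choice of rule [3]
    "B": [(["E4", "F4", "G4"], 1.0)],
}

def derive(symbol):
    if symbol not in rules:                            # terminal: a note name
        return [symbol]
    options = rules[symbol]
    rhs = random.choices([r for r, _ in options],
                         weights=[w for _, w in options])[0]
    return [note for s in rhs for note in derive(s)]

print(derive("S"))   # e.g. ['C4', 'C4', 'E4', 'F4', 'G4'] or ['D4', 'D4', 'E4', 'F4', 'G4']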

Characteristics: the main axis is transformation. Describes families of pieces, not a unique piece. Non-deterministic. Explicitly encodes hierarchy (themes, motifs, phrases).

| Strengths | Limitations |
|-----------|-------------|
| One grammar = infinite realizations | Requires programming |
| Explicit structure (hierarchies) | No standard exchange format |
| Semantic compactness | Impossible to archive “the” version |

Formally, BP3 (I2) is a PCFG (Probabilistic Context-Free Grammar, B1) with flags (conditional variables, B4) that push it towards the mildly context-sensitive class (L9).

This is exactly the gap identified by Vinet: musical patterns and structures occupy a “missing position” between the symbolic and knowledge, which his 4-level model did not capture.

Level 6: Functional / Pattern — “How to compose transformations?”

Music is a composition of functions: patterns (repetitive motifs) transformed by algebraic operations (transposition, inversion, superposition).

 

-- TidalCycles: polyrhythm in one line
d1 $ stack [s "bd sd" # speed 1.2, s "hh*4" # gain 0.8]

 

stack: superimpose two layers (parallel composition). s "bd sd": bass drum + snare drum pattern. s "hh*4": four hi-hats per cycle.

What distinguishes this paradigm: patterns are first-class values — they can be combined, transformed, and passed as arguments. It inherits from functional programming (Haskell): composition, lazy evaluation (values are only computed when needed), immutability. It stays close to musical algebra: transposition = translation, inversion = symmetry. Ideal for live coding (performative programming in real time in front of an audience).

Examples: TidalCycles (McLean, 2014), Euterpea (Hudak, 2014), SuperCollider Patterns, Extempore
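
The key idea (patterns as values, transformations as composable functions) can be sketched outside any specific live-coding language. In this hypothetical Python miniature, a pattern is a plain list of MIDI note numbers and transformations are ordinary functions that can be combined and passed around:

# A "pattern" is just data; transformations are first-class functions.
def transpose(interval):
    return lambda pattern: [n + interval for n in pattern]

def retrograde(pattern):
    return list(reversed(pattern))

def stack(*patterns):
    return list(zip(*patterns))          # play patterns in parallel, step by step

motif = [60, 64, 67]                     # C-E-G
up_a_fifth = transpose(7)

print(up_a_fifth(motif))                 # [67, 71, 74]
print(retrograde(up_a_fifth(motif)))     # [74, 71, 67] -- transformations compose
print(stack(motif, up_a_fifth(motif)))   # [(60, 67), (64, 71), (67, 74)]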


Conversion Between Levels: The Central Mechanism

Each transition from one level to another is a conversion problem with formalizable properties:

 

graph LR
    S["Signal"] -->|"transcription
(loss)"| E["Event"]
    E -->|"analysis
(structuring)"| N["Notation"]
    N -->|"induction
(little explored)"| G["Generative"]
    G -->|"derivation
(choices)"| N2["Notation"]
    N2 -->|"playback
(quantization)"| E2["Event"]
    E2 -->|"synthesis
(add timbre)"| S2["Signal"]
    style S fill:#f97316,color:#fff
    style E fill:#06b6d4,color:#fff
    style N fill:#3b82f6,color:#fff
    style G fill:#a855f7,color:#fff
    style S2 fill:#f97316,color:#fff
    style E2 fill:#06b6d4,color:#fff
    style N2 fill:#3b82f6,color:#fff

 

| Interface | Down $\downarrow$ | Up $\uparrow$ | Loss? |
|-----------|-------------------|---------------|-------|
| Signal $\leftrightarrow$ Event | Synthesis (add timbre) | Automatic transcription | Yes ($\uparrow$): timbre lost |
| Event $\leftrightarrow$ Notation | Playback (quantization) | Score following | Partial: expression lost |
| Notation $\leftrightarrow$ Generative | Derivation (random choices) | Grammar induction | Yes ($\uparrow$): rules lost |
| Generative $\leftrightarrow$ Functional | Pattern instantiation | Abstraction | Yes ($\uparrow$): transformations lost |

Vinet’s insight: most research problems in music technology are problems of inter-level conversion. Automatic transcription? Signal → Event. Synthesis? Event → Signal. Algorithmic composition? Generative → Notation → Event → Signal. And each conversion has a cost.
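
One of these conversions in miniature (a Python sketch, not any particular tool): moving from the event level towards notation requires quantizing performed onsets to a rhythmic grid, and the residual deviations (the performer’s timing) are exactly the information that does not survive the trip.

BPM = 120
sixteenth_ms = 60000 / BPM / 4            # 125 ms per sixteenth note at 120 BPM

onsets_ms = [0, 132, 247, 381]            # slightly "human" performed onsets
quantized = [round(t / sixteenth_ms) * sixteenth_ms for t in onsets_ms]
deviation = [t - q for t, q in zip(onsets_ms, quantized)]

print(quantized)    # [0.0, 125.0, 250.0, 375.0] -- what notation keeps
print(deviation)    # [0.0, 7.0, -3.0, 6.0]      -- what is lost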


Multi-Level Tools

The power of a musical tool is often measured by the number of levels it spans. Here is a map of the main tools positioned on our stack:

Tool Signal Dataflow Event-based Notational Generative Functional
Csound $\bullet$ $\bullet$
Faust $\bullet$
Max/MSP $\bullet$ $\bullet$ $\bullet$
Pure Data $\bullet$ $\bullet$ $\bullet$
SuperCollider $\bullet$ $\bullet$ $\bullet$ $\bullet$
Sonic Pi $\bullet$ $\bullet$ $\bullet$
Extempore $\bullet$ $\bullet$ $\bullet$
MIDI (format) $\bullet$
MusicXML $\bullet$
Lilypond $\bullet$
music21 $\bullet$ $\bullet$
OpenMusic $\bullet$ $\bullet$ $\bullet$
BP3 $\bullet$ $\bullet$
TidalCycles (via SC) $\bullet$ $\bullet$
Euterpea $\bullet$ $\bullet$
GTTM (theory) $\bullet$

Observations:

  • Tools that touch only one level (MIDI, MusicXML, Faust) are specialized and very good in their domain, but limited.
  • Tools that span 3-4 levels (SuperCollider, Sonic Pi, Extempore, Max/MSP, OpenMusic) are the most versatile. They pay for this versatility with a steeper learning curve.
  • No single tool covers all six levels. There is no universal system for musical representation.
  • SuperCollider and OpenMusic stand out for their coverage of higher levels (generative/functional) in addition to the signal — but they are not the only ones.

Direct Comparison: The Same Motif Across Four Levels

Let’s take a simple motif: C-E-G (arpeggio — notes of a chord played successively — of C major).

Level 3 — MIDI (Event-based)

 

Delta=0    Note On  60 (C4)  vel=80
Delta=480  Note Off 60
Delta=0    Note On  64 (E4)  vel=80
Delta=480  Note Off 64
Delta=0    Note On  67 (G4)  vel=80
Delta=480  Note Off 67

 

Delta = time elapsed since the previous event, in ticks (typically 480 ticks per quarter note).

Captures: exact pitches, timings, velocities. Ignores: it’s an arpeggio, it’s a C major chord, it’s a tonic degree.
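
The tick-to-milliseconds arithmetic is tempo-dependent; a quick sanity check (Python, assuming 120 BPM and 480 ticks per quarter note as above):

def ticks_to_ms(delta_ticks, bpm=120, ppq=480):
    # one quarter note lasts 60000 / bpm milliseconds, divided into ppq ticks
    return delta_ticks * (60000 / bpm) / ppq

print(ticks_to_ms(480))   # 500.0 -- one quarter note, matching the earlier event list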

Level 4 — MusicXML (Notational)

 

<measure number="1">
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
  <note>
    <pitch><step>E</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
  <note>
    <pitch><step>G</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
</measure>

 

Captures: the key (C major), the time signature, the note names. Ignores: it’s an arpeggio (typical figure), it’s a tonic function.

Level 5 — BP3 (Generative)

 

gram#1[1] ARPEGGIO → _vel(80) C4 E4 G4
gram#1[2] ARPEGGIO → _vel(80) C4 G4 E4      // inversion
gram#1[3] ARPEGGIO → _vel(80) G4 E4 C4      // descending
gram#1[4] S → ARPEGGIO CADENCE

 

Captures: the notion of arpeggio as a structural unit, possible variations. Ignores: the exact realization chosen at each execution.

Multi-level — SuperCollider (3 $\to$ 5)

 

Pbind(
  \instrument, \default,
  \note, Pseq([0, 4, 7]),    // C-E-G in chromatic degrees
  \dur, 0.5,
  \amp, 0.8
).play;

 

Pbind: binding of musical parameters (functional paradigm). Pseq([0, 4, 7]): ordered sequence ($0 = \text{C4}$, $4 = \text{E4}$, $7 = \text{G4}$). \instrument, \default: default SynthDef (signal paradigm).

Captures: the structure (Pbind), the timbre (SynthDef), the timing (dur). Ignores: traditional notation.


The Fundamental Compromise

No format can be optimal on all axes. It’s an irreducible compromise:

 

graph TD
    EX["Exhaustiveness
(encode everything)"]
    CO["Compactness
(little data)"]
    GE["Generativity
(produce variations)"]
    EX --- CO
    CO --- GE
    GE --- EX
    MIDI((MIDI)) -.-> CO
    XML((MusicXML)) -.-> EX
    BP3((BP3)) -.-> GE
    style MIDI fill:#06b6d4,color:#fff
    style XML fill:#3b82f6,color:#fff
    style BP3 fill:#a855f7,color:#fff

 

Imagine we want to describe a forest:

  • Exhaustiveness: describe every tree, every leaf $\to$ enormous amount of data
  • Compactness: say “an oak forest” $\to$ very short, but details are lost
  • Generativity: give growth rules $\to$ can produce infinitely many forests, but none is “the” original forest

| Format | Exhaustiveness | Compactness | Generativity |
|--------|----------------|-------------|--------------|
| MIDI | Medium | High | None |
| MusicXML | High | Low | None |
| BP3 | Low | Very High | High |
| SuperCollider | Variable | Medium | High |
| TidalCycles | Low | High | High |
| Csound | High (signal) | Low | Low |

Why BP2SC Crosses Layers

The BP2SC project transpiles (converts source code from one language to another) BP3 grammars into SuperCollider patterns. This is not a horizontal conversion between formats of the same level — it’s a vertical traversal:

 

graph TD
    L5["Level 5: Generative
BP3 grammar"] -->|"derivation"| L4["Level 4–5: Structure
derivation tree"]
    L4 -->|"linearization"| L3["Level 3: Event-based
note sequence (Pbind)"]
    L3 -->|"SC execution"| L1["Level 1: Signal
audio (SynthDefs)"]
    style L5 fill:#a855f7,color:#fff
    style L4 fill:#7c3aed,color:#fff
    style L3 fill:#06b6d4,color:#fff
    style L1 fill:#f97316,color:#fff

 

BP2SC skips the notational level (no intermediate score) — which is only possible because SuperCollider operates at the generative (Patterns), event-based (OSC), and signal (SynthDefs) levels simultaneously.

Why not simply export to MIDI?

  1. MIDI is deterministic: a MIDI file = a fixed sequence. BP3 describes families of sequences.
  2. MIDI has no rules: you cannot encode “choose between A and B with 70/30 probability” in MIDI.
  3. SC preserves generativity: SC patterns — Pbind, Pwrand (weighted choice), Pseq (sequence) — express the same concepts as BP3.

 

// SuperCollider equivalent of a weighted BP3 choice
Pwrand(
  [Pseq([60, 64, 67]), Pseq([60, 67, 64])],
  [0.7, 0.3],  // 70% ascending, 30% inversion
  inf
)

 

This conversion preserves the generative intention — something a MIDI export could never do.
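
As a toy illustration of what such a transpilation step looks like (a hypothetical Python helper, not BP2SC’s actual code), a weighted two-way choice can be mapped mechanically onto a Pwrand expression string:

def pwrand_from_choice(alternatives, weights):
    """Map a weighted choice between note sequences to a SuperCollider Pwrand string."""
    seqs = ", ".join(
        "Pseq([{}])".format(", ".join(str(n) for n in alt)) for alt in alternatives
    )
    ws = ", ".join(str(w) for w in weights)
    return "Pwrand([{}], [{}], inf)".format(seqs, ws)

print(pwrand_from_choice([[60, 64, 67], [60, 67, 64]], [0.7, 0.3]))
# -> Pwrand([Pseq([60, 64, 67]), Pseq([60, 67, 64])], [0.7, 0.3], inf)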


Towards an “OSI Model” for Music?

The analogy with the OSI model (Open Systems Interconnection — the 7-layer model that structures network protocols) is not accidental. The OSI model succeeds because it formalizes:

  • Layers with clearly separated responsibilities
  • Interfaces: each layer communicates only with its neighbors
  • Encapsulation: data from an upper layer is “wrapped” by the lower layer

For music, such a model would allow us to:

  • Predict which conversions are easy and which are lossy
  • Characterize tools by the layers they span
  • Identify gaps: which standards are missing at which levels? (Vinet already identified a “missing control format” between signal and event in 2003)

This formalization effort goes beyond the scope of this article — but it underpins the entire M series. The map we have drawn here is a first step towards this model.


Key Takeaways

  1. Six paradigms, six levels of abstraction: Signal (1) $\to$ Dataflow (2) $\to$ Event-based (3) $\to$ Notational (4) $\to$ Generative (5) $\to$ Functional (6).
  2. Moving up = abstracting, moving down = concretizing. Each conversion between levels has a cost (loss or addition of information).
  3. Multi-level tools (SuperCollider, Sonic Pi, Extempore, Max/MSP, OpenMusic) are the most powerful — but none covers all six levels.
  4. The impossibility triangle: exhaustiveness, compactness, and generativity cannot be maximized simultaneously.
  5. BP2SC is a vertical traversal: from generative (BP3) to signal (SC), skipping notation.
  6. Vinet’s framework (2003) provides the foundations; the gap he identifies at the syntactic/patterns level is exactly the space where BP3 and TidalCycles operate.

To Go Further

  • Vinet (2003): “The Representation Levels of Music Information” — the foundational 4-level framework, CMMR 2003, LNCS 2771. DOI:10.1007/978-3-540-39900-1_17
  • IEEE 1599: Baratè, Haus & Ludovico (2019) — 6-layer multilayer standard. DOI:10.1109/MMRP.2019.8665381
  • Symbolic Formats: Selfridge-Field (1997) Beyond MIDI: The Handbook of Musical Codes
  • Musical DSLs: McLean & Dean (2018) The Oxford Handbook of Algorithmic Music
  • Faust: Orlarey, Fober & Letz (2004) “Syntactical and Semantical Aspects of Faust”
  • TidalCycles: McLean (2014) Making Programming Languages to Dance to
  • MIDI: midi.org | MusicXML: w3.org/2021/06/musicxml40
  • BP3: bolprocessor.org | SuperCollider: supercollider.github.io

Glossary

  • Arpeggio: notes of a chord played successively rather than simultaneously
  • BP3: Bol Processor 3 — musical grammar software developed by Bernard Bel (see I2)
  • DAW: Digital Audio Workstation — music production software (Ableton Live, Logic Pro, FL Studio)
  • Dataflow: paradigm where computation is modeled as a graph of data flows between processing nodes
  • Deterministic: a system that always produces the same result with the same inputs
  • DSL: Domain-Specific Language — a programming language dedicated to a particular domain
  • GTTM: Generative Theory of Tonal Music — Lerdahl and Jackendoff’s theory modeling musical perception
  • IEEE 1599: multilayer standard for musical representation with 6 synchronized XML layers
  • Live coding: a performative programming practice where code is written and modified in real time in front of an audience
  • OSI Model: Open Systems Interconnection — a 7-layer model structuring network protocols
  • Pattern: in SuperCollider or TidalCycles, an object that generates sequences of values according to rules
  • PCFG: Probabilistic Context-Free Grammar — context-free grammar enriched with probabilities (see B1)
  • Stochastic: weighted random — each choice has an associated probability
  • SynthDef: in SuperCollider, a synthesizer definition as a graph of audio generators
  • Transpile: to convert code from one language to another at the same level of abstraction
  • Velocity: in MIDI, the intensity of a note (0-127), often interpreted as volume

Links in the Series

  • M1 — MIDI under the Formal Microscope — the Event-based Paradigm (Level 3)
  • M2 — MusicXML under the Formal Microscope — the Notational Paradigm (Level 4)
  • M4 — Grammars and Music — Deep Dive into the Generative Paradigm (Level 5)
  • I2 — Bol Processor — BP3, Central Generative System
  • I3 — SuperCollider — Multi-Level Tool (1-3-5-6)
  • I4 — Introduction to MIDI — the Event-based Protocol
  • I5 — Introduction to MusicXML — the Notational Format
  • L1 — Chomsky Hierarchy — Formal Framework for Grammars
  • L5 — The Three Semantics — Complementary Perspectives on Programs
  • L11 — Beyond the Three Semantics — The Network of Semantics
  • B1 — Probabilistic Grammars — Formal Foundations of BP3
  • B7 — The BP2SC Transpiler — Vertical Traversal of Levels

Prerequisites: M1, M2, L5
Reading time: 18 min
Tags: #paradigms #musical-representation #abstraction-levels #midi #musicxml #generative #supercollider #signal #dataflow #functional #vinet


Next article: M4 — Grammars and Music