M3) Musical Representation Paradigms

Six Levels of Abstraction, from Wave to Meaning

Six ways to think about music digitally — and why they stack in layers, like an OSI model.

Where Does This Article Fit In?

In M1 and M2, we analyzed two musical formats — MIDI and MusicXML — as formal languages located at precise levels of Chomsky’s hierarchy (L1). But why are these two formats so different? Why does converting from one to the other imply a loss?

The answer lies in one word: abstraction. MIDI and MusicXML do not operate at the same level of musical description. And there are many others. This article takes a step back to map six paradigms of musical representation and show that they are organized into a stack of layers of increasing abstraction — from raw signal to musical meaning.


Why Is This Important?

A composer in a DAW (Digital Audio Workstation — music production software like Ableton Live or Logic Pro) uses MIDI without thinking about it. A score engraver uses MusicXML. A live-coder (a musician who programs in real time in front of an audience) uses TidalCycles. Each feels that their tool represents “music.” In reality, each tool operates at a different level of abstraction — and therefore encodes different aspects of the same musical reality.

In 2003, Hugues Vinet (IRCAM) proposed a foundational framework: four Representation Levels (physical, signal, symbolic, knowledge) linked by problems of inter-level conversion. Vinet showed that most research problems in music technology — automatic transcription, synthesis, musical analysis — are problems of moving from one level to another. But his framework was too coarse: the “symbolic” level aggregates very different realities. MIDI (event stream), MusicXML (structured score), BP3 (generative grammar), and GTTM (Generative Theory of Tonal Music — Lerdahl and Jackendoff’s theory that models the perception of musical structures) are all “symbolic” for Vinet, even though they operate at very different levels of abstraction.

Vinet himself acknowledged this in his conclusion: there is a syntactic level gap — musical patterns (motifs, chords, characteristic sequences) do not have a dedicated layer in his model.

This article proposes a finer stack of six levels, which resolves this gap.

The Idea in One Sentence

Musical representation paradigms are not merely equivalent perspectives: they are organized into levels of increasing abstraction, from raw signal to musical meaning — and the most powerful systems are those that span multiple levels.


The Stack of Six Paradigms

 

graph TD
    subgraph _sg0["Abstraction levels"]
        direction TB
        F["6. Functional / Pattern
TidalCycles, Euterpea, SC Patterns
How to compose transformations?"]
        G["5. Generative
BP3, musical grammars
How to produce music?"]
        N["4. Notational
MusicXML, MEI, Lilypond
How to write it?"]
        E["3. Event-based
MIDI, OSC
When to play what?"]
        D["2. Dataflow
Max/MSP, Pure Data
How to route flows?"]
        S["1. Signal
Csound, Faust, SynthDefs
How does it sound?"]
    end
    F --> G
    G --> N
    N --> E
    E --> D
    D --> S
    style F fill:#8b5cf6,color:#fff
    style G fill:#a855f7,color:#fff
    style N fill:#3b82f6,color:#fff
    style E fill:#06b6d4,color:#fff
    style D fill:#14b8a6,color:#fff
    style S fill:#f97316,color:#fff

 

Each downward arrow represents a concretization: moving from an abstract description to one closer to the physical sound. Each upward arrow is an abstraction: extracting meaning from raw data.

| # | Paradigm | Central Question | Examples | Data Type |
|---|----------|------------------|----------|-----------|
| 1 | Signal | How does it sound? | Csound, Faust, SC SynthDefs | $f(t) \to \mathbb{R}$ |
| 2 | Dataflow | How to route flows? | Max/MSP, Pure Data, Reaktor | Connection graph |
| 3 | Event-based | When to play what? | MIDI, OSC | Timestamped messages |
| 4 | Notational | How to write it? | MusicXML, MEI, Lilypond | Structured document |
| 5 | Generative | How to produce it? | BP3, musical grammars | Rewrite rules |
| 6 | Functional | How to compose? | TidalCycles, Euterpea | Functions and patterns |

The crucial point: moving up a level means gaining in abstraction but losing in detail. A MIDI file (level 3) does not contain timbre information (level 1). A BP3 grammar (level 5) does not contain the exact score (level 4). Each conversion between levels implies either a loss (upwards: summarizing), or an addition of information (downwards: specifying missing details).


The Six Levels in Detail

Level 1: Signal — “How does it sound?”

Music is an audio signal — a function $f(t)$ from time to amplitude. This paradigm operates at the lowest level: that of the sound wave itself.

 

; Csound: sine wave synthesis
instr 1
  aOut  poscil  0.5, 440  ; 440 Hz oscillator
  outs  aOut, aOut
endin

 

poscil: sine wave oscillator. 0.5, 440: amplitude 0.5, frequency 440 Hz (A4). outs: stereo output.

Characteristics: no concept of “note” — only $\approx$ 44,100 samples per second. Models timbre (the “color” of the sound), not just pitch and rhythm. Allows for the synthesis of sounds impossible on acoustic instruments. Faust formalizes this paradigm: each Faust program is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ — one of the few musical DSLs (Domain-Specific Language — language dedicated to a domain) with a published formal semantics.

Examples: Csound (1986), Faust (2002), SuperCollider SynthDefs, Max/MSP gen~
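
To see what “only samples” means in practice, here is a minimal sketch (Python with numpy, an assumed dependency; not taken from any of the tools above): one second of a 440 Hz sine at 44,100 samples per second. Nothing in it knows about notes, bars, or timbre: it is just $f(t)$.

import numpy as np

# Level 1 in its rawest form: an array of amplitude values, nothing else.
sr = 44100                                    # samples per second
t = np.arange(sr) / sr                        # one second of time points
signal = 0.5 * np.sin(2 * np.pi * 440 * t)    # amplitude 0.5, frequency 440 Hz (A4)

print(signal.shape)                           # (44100,) -- no "note" object anywhere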

Level 2: Dataflow — “How to route flows?”

Music is a graph of transformations: boxes (objects) connected by cables (data flows).

 

[osc~ 440] → [*~ 0.5] → [dac~]

 

A 440 Hz oscillator, multiplied by 0.5 (50% volume), sent to the audio output (dac~ = digital-to-analog converter).

What distinguishes this paradigm: programming is done via patching (connecting graphical blocks). Visual and modular. Two flows coexist: audio (tilde ~ — processed sample by sample) and control (messages — processed event by event). Dataflow is at the hinge between signal and event: it routes both flows simultaneously. Dataflow analysis is a well-formalized domain in computer science since Kildall’s algorithm (1973).

Examples: Max/MSP (1988), Pure Data (1996), Reaktor, VCV Rack
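
As a rough illustration (a Python sketch, not Max/MSP or Pure Data code), the patch [osc~ 440] → [*~ 0.5] → [dac~] can be modelled as three boxes computing one block of samples at a time, with the “cables” reduced to function composition:

import math

SR, BLOCK = 44100, 64                         # sample rate and block size

def osc(freq, start):                         # the [osc~ 440] box
    return [math.sin(2 * math.pi * freq * (start + n) / SR) for n in range(BLOCK)]

def gain(block, g):                           # the [*~ 0.5] box
    return [s * g for s in block]

def dac(block):                               # stand-in for the [dac~] output box
    return block

out = dac(gain(osc(440, 0), 0.5))             # one block flows through the graph
print(out[:4])                                # first few samples of the attenuated sine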

Level 3: Event-based — “When to play what?”

Music is a sequence of timestamped events: “at time $t$, trigger action $a$.”

 

t=0ms     Note On  C4 velocity=80
t=500ms   Note Off C4
t=500ms   Note On  D4 velocity=75
t=1000ms  Note Off D4

 

Characteristics: the main axis is time. Close to what an instrument physically does (press/release). No musical structure (no phrase, motif, harmony). Optimized for real-time performance.

| Strengths | Limitations |
|-----------|-------------|
| Real-time transmission | No musical structure |
| Universal (any synthesizer) | No C$\sharp$/D$\flat$ distinction |
| Compact ($\sim$ bytes/note) | 128 velocity levels, 16 channels |

Formally, MIDI is a Type 3 (regular) language in Chomsky’s hierarchy (L1) — recognizable by a finite automaton. See M1.
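
The event list above can be generated with a few lines of code. A minimal sketch using Python and the mido library (an external dependency, assumed installed; not something the MIDI specification itself provides): at the default 120 BPM with 480 ticks per quarter note, the 500 ms deltas become 480 ticks.

import mido

# Write the C4 / D4 example as a Standard MIDI File: pure timestamped events.
mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

for note, vel in ((60, 80), (62, 75)):        # C4 then D4, as in the listing above
    track.append(mido.Message('note_on', note=note, velocity=vel, time=0))
    track.append(mido.Message('note_off', note=note, velocity=0, time=480))

mid.save('two_notes.mid')                     # hypothetical output filename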

Level 4: Notational — “How to write it?”

Music is a document structured according to notation conventions: staves, measures, clefs, key signatures (sharps or flats at the beginning of the staff indicating the key).

 

<measure number="1">
  <attributes>
    <key><fifths>0</fifths></key>
    <time><beats>4</beats><beat-type>4</beat-type></time>
  </attributes>
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <type>quarter</type>
  </note>
</measure>

 

Characteristics: the main axis is the page. Encodes the composer’s intention. Includes expression markings (dynamics, phrasing). Preserves the tradition of the musical score.

| Strengths | Limitations |
|-----------|-------------|
| Notational completeness | $\sim$10x larger than MIDI |
| Interoperability (Finale, MuseScore) | Western bias (standard notation) |
| Human-readable | Static (1 file = 1 fixed score) |

Formally, MusicXML is a Type 2 (context-free) language in Chomsky’s hierarchy (L1). See M2.

Vinet’s observation: notation formats actually overlap two levels. Tempo (quarter note = 120) is event-symbolic information, but the indication “con fuoco” is high-level semantic information. This overlap reveals that notation is a composite format, not a “pure” layer.
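
Because the notational level is an ordinary structured document, standard XML tooling can traverse it. A small sketch with Python’s xml.etree (reading a fragment like the one above) pulls out the written pitches and durations: no audio, no millisecond timing, just the score’s structure.

import xml.etree.ElementTree as ET

fragment = """
<measure number="1">
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <type>quarter</type>
  </note>
</measure>
"""

measure = ET.fromstring(fragment)
for note in measure.findall("note"):
    step = note.find("pitch/step").text
    octave = note.find("pitch/octave").text
    print(step + octave, note.find("type").text)   # -> C4 quarter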

Level 5: Generative — “How to produce it?”

Music is a process defined by rules: grammars, probabilities, algorithms. The file does not contain “the music” but “how to produce music.”

 

// BP3: generative grammar
gram#1[1] S → _vel(80) C4 _vel(60) {D4,E4} F4 G4
gram#1[2] S → A B
gram#1[3] A → C4 C4 | D4 D4
gram#1[4] B → E4 F4 G4

 

S → ...: $S$ rewrites to the sequence. _vel(80): velocity 80. {D4,E4}: chord. |: random choice.

Each execution can produce a different sequence depending on stochastic (weighted random) choices.
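
A toy illustration of this non-determinism (a Python sketch mimicking rules [2] to [4] above; this is neither BP3 syntax nor BP3’s engine): each call to derive("S") makes weighted random choices and can return a different note sequence.

import random

rules = {
    "S": [(["A", "B"], 1.0)],
    "A": [(["C4", "C4"], 0.5), (["D4", "D4"], 0.5)],   # the '|' choice of rule [3]
    "B": [(["E4", "F4", "G4"], 1.0)],
}

def derive(symbol):
    if symbol not in rules:                            # terminal: a note name
        return [symbol]
    options = rules[symbol]
    rhs = random.choices([r for r, _ in options],
                         weights=[w for _, w in options])[0]
    return [note for s in rhs for note in derive(s)]

print(derive("S"))   # e.g. ['C4', 'C4', 'E4', 'F4', 'G4'] or ['D4', 'D4', 'E4', 'F4', 'G4']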

Characteristics: the main axis is transformation. Describes families of pieces, not a unique piece. Non-deterministic. Explicitly encodes hierarchy (themes, motifs, phrases).

| Strengths | Limitations |
|-----------|-------------|
| One grammar = infinite realizations | Requires programming |
| Explicit structure (hierarchies) | No standard exchange format |
| Semantic compactness | Impossible to archive “the” version |

Formally, BP3 (I2) is a PCFG (Probabilistic Context-Free Grammar, B1) with flags (conditional variables, B4) that push it towards the mildly context-sensitive class (L9).

This is exactly the gap identified by Vinet: musical patterns and structures occupy a “missing position” between the symbolic and knowledge, which his 4-level model did not capture.

Level 6: Functional / Pattern — “How to compose transformations?”

Music is a composition of functions: patterns (repetitive motifs) transformed by algebraic operations (transposition, inversion, superposition).

 

-- TidalCycles: polyrhythm in one line
d1 $ stack [s "bd sd" # speed 1.2, s "hh*4" # gain 0.8]

 

stack: superimpose two layers (parallel composition). s "bd sd": bass drum + snare drum pattern. s "hh*4": four hi-hats per cycle.

What distinguishes this paradigm: patterns are first-class values — they can be combined, transformed, and passed as arguments. It inherits from functional programming (Haskell): composition, lazy evaluation (values are only computed when needed), immutability. It stays close to musical algebra: transposition = translation, inversion = symmetry. Ideal for live coding (performative programming in real time in front of an audience).

Examples: TidalCycles (McLean, 2014), Euterpea (Hudak, 2014), SuperCollider Patterns, Extempore
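
The key idea (patterns as values, transformations as composable functions) can be sketched outside any specific live-coding language. In this hypothetical Python miniature, a pattern is a plain list of MIDI note numbers and transformations are ordinary functions that can be combined and passed around:

# A "pattern" is just data; transformations are first-class functions.
def transpose(interval):
    return lambda pattern: [n + interval for n in pattern]

def retrograde(pattern):
    return list(reversed(pattern))

def stack(*patterns):
    return list(zip(*patterns))          # play patterns in parallel, step by step

motif = [60, 64, 67]                     # C-E-G
up_a_fifth = transpose(7)

print(up_a_fifth(motif))                 # [67, 71, 74]
print(retrograde(up_a_fifth(motif)))     # [74, 71, 67] -- transformations compose
print(stack(motif, up_a_fifth(motif)))   # [(60, 67), (64, 71), (67, 74)]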


Conversion Between Levels: The Central Mechanism

Each transition from one level to another is a conversion problem with formalizable properties:

 

graph LR
    S["Signal"] -->|"transcription
(loss)"| E["Event"]
    E -->|"analysis
(structuring)"| N["Notation"]
    N -->|"induction
(little explored)"| G["Generative"]
    G -->|"derivation
(choices)"| N2["Notation"]
    N2 -->|"playback
(quantization)"| E2["Event"]
    E2 -->|"synthesis
(add timbre)"| S2["Signal"]
    style S fill:#f97316,color:#fff
    style E fill:#06b6d4,color:#fff
    style N fill:#3b82f6,color:#fff
    style G fill:#a855f7,color:#fff
    style S2 fill:#f97316,color:#fff
    style E2 fill:#06b6d4,color:#fff
    style N2 fill:#3b82f6,color:#fff

 

| Interface | Down $\downarrow$ | Up $\uparrow$ | Loss? |
|-----------|-------------------|---------------|-------|
| Signal $\leftrightarrow$ Event | Synthesis (add timbre) | Automatic transcription | Yes ($\uparrow$): timbre lost |
| Event $\leftrightarrow$ Notation | Playback (quantization) | Score following | Partial: expression lost |
| Notation $\leftrightarrow$ Generative | Derivation (random choices) | Grammar induction | Yes ($\uparrow$): rules lost |
| Generative $\leftrightarrow$ Functional | Pattern instantiation | Abstraction | Yes ($\uparrow$): transformations lost |

Vinet’s insight: most research problems in music technology are problems of inter-level conversion. Automatic transcription? Signal → Event. Synthesis? Event → Signal. Algorithmic composition? Generative → Notation → Event → Signal. And each conversion has a cost.
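
One of these conversions in miniature (a Python sketch, not any particular tool): moving from the event level towards notation requires quantizing performed onsets to a rhythmic grid, and the residual deviations (the performer’s timing) are exactly the information that does not survive the trip.

BPM = 120
sixteenth_ms = 60000 / BPM / 4            # 125 ms per sixteenth note at 120 BPM

onsets_ms = [0, 132, 247, 381]            # slightly "human" performed onsets
quantized = [round(t / sixteenth_ms) * sixteenth_ms for t in onsets_ms]
deviation = [t - q for t, q in zip(onsets_ms, quantized)]

print(quantized)    # [0.0, 125.0, 250.0, 375.0] -- what notation keeps
print(deviation)    # [0.0, 7.0, -3.0, 6.0]      -- what is lost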


Multi-Level Tools

The power of a musical tool is often measured by the number of levels it spans. Here is a map of the main tools positioned on our stack:

Tool Signal Dataflow Event-based Notational Generative Functional
Csound $\bullet$ $\bullet$
Faust $\bullet$
Max/MSP $\bullet$ $\bullet$ $\bullet$
Pure Data $\bullet$ $\bullet$ $\bullet$
SuperCollider $\bullet$ $\bullet$ $\bullet$ $\bullet$
Sonic Pi $\bullet$ $\bullet$ $\bullet$
Extempore $\bullet$ $\bullet$ $\bullet$
MIDI (format) $\bullet$
MusicXML $\bullet$
Lilypond $\bullet$
music21 $\bullet$ $\bullet$
OpenMusic $\bullet$ $\bullet$ $\bullet$
BP3 $\bullet$ $\bullet$
TidalCycles (via SC) $\bullet$ $\bullet$
Euterpea $\bullet$ $\bullet$
GTTM (theory) $\bullet$

Observations:

  • Tools that touch only one level (MIDI, MusicXML, Faust) are specialized and very good in their domain, but limited.
  • Tools that span 3-4 levels (SuperCollider, Sonic Pi, Extempore, Max/MSP, OpenMusic) are the most versatile. They pay for this versatility with a steeper learning curve.
  • No single tool covers all six levels. There is no universal system for musical representation.
  • SuperCollider and OpenMusic stand out for their coverage of higher levels (generative/functional) in addition to the signal — but they are not the only ones.

Direct Comparison: The Same Motif Across Four Levels

Let’s take a simple motif: C-E-G (arpeggio — notes of a chord played successively — of C major).

Level 3 — MIDI (Event-based)

 

Delta=0    Note On  60 (C4)  vel=80
Delta=480  Note Off 60
Delta=0    Note On  64 (E4)  vel=80
Delta=480  Note Off 64
Delta=0    Note On  67 (G4)  vel=80
Delta=480  Note Off 67

 

Delta = time elapsed since the previous event, in ticks (typically 480 ticks per quarter note).

Captures: exact pitches, timings, velocities. Ignores: it’s an arpeggio, it’s a C major chord, it’s a tonic degree.
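
The tick-to-milliseconds arithmetic is tempo-dependent; a quick sanity check (Python, assuming 120 BPM and 480 ticks per quarter note as above):

def ticks_to_ms(delta_ticks, bpm=120, ppq=480):
    # one quarter note lasts 60000 / bpm milliseconds, divided into ppq ticks
    return delta_ticks * (60000 / bpm) / ppq

print(ticks_to_ms(480))   # 500.0 -- one quarter note, matching the earlier event list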

Level 4 — MusicXML (Notational)

 

<measure number="1">
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
  <note>
    <pitch><step>E</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
  <note>
    <pitch><step>G</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
</measure>

 

Captures: the key (C major), the time signature, the note names. Ignores: it’s an arpeggio (typical figure), it’s a tonic function.

Level 5 — BP3 (Generative)

 

gram#1[1] ARPEGGIO → _vel(80) C4 E4 G4
gram#1[2] ARPEGGIO → _vel(80) C4 G4 E4      // inversion
gram#1[3] ARPEGGIO → _vel(80) G4 E4 C4      // descending
gram#1[4] S → ARPEGGIO CADENCE

 

Captures: the notion of arpeggio as a structural unit, possible variations. Ignores: the exact realization chosen at each execution.

Multi-level — SuperCollider (3 $\to$ 5)

 

Pbind(
  \instrument, \default,
  \note, Pseq([0, 4, 7]),    // C-E-G in chromatic degrees
  \dur, 0.5,
  \amp, 0.8
).play;

 

Pbind: binding of musical parameters (functional paradigm). Pseq([0, 4, 7]): ordered sequence ($0 = \text{C4}$, $4 = \text{E4}$, $7 = \text{G4}$). \instrument, \default: default SynthDef (signal paradigm).

Captures: the structure (Pbind), the timbre (SynthDef), the timing (dur). Ignores: traditional notation.


The Fundamental Compromise

No format can be optimal on all axes. It’s an irreducible compromise:

 

graph TD
    EX["Exhaustiveness
(encode everything)"]
    CO["Compactness
(little data)"]
    GE["Generativity
(produce variations)"]
    EX --- CO
    CO --- GE
    GE --- EX
    MIDI((MIDI)) -.-> CO
    XML((MusicXML)) -.-> EX
    BP3((BP3)) -.-> GE
    style MIDI fill:#06b6d4,color:#fff
    style XML fill:#3b82f6,color:#fff
    style BP3 fill:#a855f7,color:#fff

 

Imagine we want to describe a forest:

  • Exhaustiveness: describe every tree, every leaf $\to$ enormous amount of data
  • Compactness: say “an oak forest” $\to$ very short, but details are lost
  • Generativity: give growth rules $\to$ can produce infinitely many forests, but none is “the” original forest

| Format | Exhaustiveness | Compactness | Generativity |
|--------|----------------|-------------|--------------|
| MIDI | Medium | High | None |
| MusicXML | High | Low | None |
| BP3 | Low | Very High | High |
| SuperCollider | Variable | Medium | High |
| TidalCycles | Low | High | High |
| Csound | High (signal) | Low | Low |

Why BP2SC Crosses Layers

The BP2SC project transpiles (converts source code from one language to another) BP3 grammars into SuperCollider patterns. This is not a horizontal conversion between formats of the same level — it’s a vertical traversal:

 

graph TD
    L5["Level 5: Generative
BP3 grammar"] -->|"derivation"| L4["Level 4–5: Structure
derivation tree"]
    L4 -->|"linearization"| L3["Level 3: Event-based
note sequence (Pbind)"]
    L3 -->|"SC execution"| L1["Level 1: Signal
audio (SynthDefs)"]
    style L5 fill:#a855f7,color:#fff
    style L4 fill:#7c3aed,color:#fff
    style L3 fill:#06b6d4,color:#fff
    style L1 fill:#f97316,color:#fff

 

BP2SC skips the notational level (no intermediate score) — which is only possible because SuperCollider operates at the generative (Patterns), event-based (OSC), and signal (SynthDefs) levels simultaneously.

Why not simply export to MIDI?

  1. MIDI is deterministic: a MIDI file = a fixed sequence. BP3 describes families of sequences.
  2. MIDI has no rules: you cannot encode “choose between A and B with 70/30 probability” in MIDI.
  3. SC preserves generativity: SC patterns — Pbind, Pwrand (weighted choice), Pseq (sequence) — express the same concepts as BP3.

 

// SuperCollider equivalent of a weighted BP3 choice
Pwrand(
  [Pseq([60, 64, 67]), Pseq([60, 67, 64])],
  [0.7, 0.3],  // 70% ascending, 30% inversion
  inf
)

 

This conversion preserves the generative intention — something a MIDI export could never do.
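
As a toy illustration of what such a transpilation step looks like (a hypothetical Python helper, not BP2SC’s actual code), a weighted two-way choice can be mapped mechanically onto a Pwrand expression string:

def pwrand_from_choice(alternatives, weights):
    """Map a weighted choice between note sequences to a SuperCollider Pwrand string."""
    seqs = ", ".join(
        "Pseq([{}])".format(", ".join(str(n) for n in alt)) for alt in alternatives
    )
    ws = ", ".join(str(w) for w in weights)
    return "Pwrand([{}], [{}], inf)".format(seqs, ws)

print(pwrand_from_choice([[60, 64, 67], [60, 67, 64]], [0.7, 0.3]))
# -> Pwrand([Pseq([60, 64, 67]), Pseq([60, 67, 64])], [0.7, 0.3], inf)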


Towards an “OSI Model” for Music?

The analogy with the OSI model (Open Systems Interconnection — the 7-layer model that structures network protocols) is not accidental. The OSI model succeeds because it formalizes:

  • Layers with clearly separated responsibilities
  • Interfaces: each layer communicates only with its neighbors
  • Encapsulation: data from an upper layer is “wrapped” by the lower layer

For music, such a model would allow us to:

  • Predict which conversions are easy and which are lossy
  • Characterize tools by the layers they span
  • Identify gaps: which standards are missing at which levels? (Vinet already identified a “missing control format” between signal and event in 2003)

This formalization effort goes beyond the scope of this article — but it underpins the entire M series. The map we have drawn here is a first step towards this model.


Key Takeaways

  1. Six paradigms, six levels of abstraction: Signal (1) $\to$ Dataflow (2) $\to$ Event-based (3) $\to$ Notational (4) $\to$ Generative (5) $\to$ Functional (6).
  2. Moving up = abstracting, moving down = concretizing. Each conversion between levels has a cost (loss or addition of information).
  3. Multi-level tools (SuperCollider, Sonic Pi, Extempore, Max/MSP, OpenMusic) are the most powerful — but none covers all six levels.
  4. The impossibility triangle: exhaustiveness, compactness, and generativity cannot be maximized simultaneously.
  5. BP2SC is a vertical traversal: from generative (BP3) to signal (SC), skipping notation.
  6. Vinet’s framework (2003) provides the foundations; the gap he identifies at the syntactic/patterns level is exactly the space where BP3 and TidalCycles operate.

To Go Further

  • Vinet (2003): “The Representation Levels of Music Information” — the foundational 4-level framework, CMMR 2003, LNCS 2771. DOI:10.1007/978-3-540-39900-1_17
  • IEEE 1599: Baratè, Haus & Ludovico (2019) — 6-layer multilayer standard. DOI:10.1109/MMRP.2019.8665381
  • Symbolic Formats: Selfridge-Field (1997) Beyond MIDI: The Handbook of Musical Codes
  • Musical DSLs: McLean & Dean (2018) The Oxford Handbook of Algorithmic Music
  • Faust: Orlarey, Fober & Letz (2004) “Syntactical and Semantical Aspects of Faust”
  • TidalCycles: McLean (2014) Making Programming Languages to Dance to
  • MIDI: midi.org | MusicXML: w3.org/2021/06/musicxml40
  • BP3: bolprocessor.org | SuperCollider: supercollider.github.io

Glossary

  • Arpeggio: notes of a chord played successively rather than simultaneously
  • BP3: Bol Processor 3 — musical grammar software developed by Bernard Bel (see I2)
  • DAW: Digital Audio Workstation — music production software (Ableton Live, Logic Pro, FL Studio)
  • Dataflow: paradigm where computation is modeled as a graph of data flows between processing nodes
  • Deterministic: a system that always produces the same result with the same inputs
  • DSL: Domain-Specific Language — a programming language dedicated to a particular domain
  • GTTM: Generative Theory of Tonal Music — Lerdahl and Jackendoff’s theory modeling musical perception
  • IEEE 1599: multilayer standard for musical representation with 6 synchronized XML layers
  • Live coding: a performative programming practice where code is written and modified in real time in front of an audience
  • OSI Model: Open Systems Interconnection — a 7-layer model structuring network protocols
  • Pattern: in SuperCollider or TidalCycles, an object that generates sequences of values according to rules
  • PCFG: Probabilistic Context-Free Grammar — context-free grammar enriched with probabilities (see B1)
  • Stochastic: weighted random — each choice has an associated probability
  • SynthDef: in SuperCollider, a synthesizer definition as a graph of audio generators
  • Transpile: to convert code from one language to another at the same level of abstraction
  • Velocity: in MIDI, the intensity of a note (0-127), often interpreted as volume

Links in the Series

  • M1 — MIDI under the Formal Microscope — the Event-based Paradigm (Level 3)
  • M2 — MusicXML under the Formal Microscope — the Notational Paradigm (Level 4)
  • M4 — Grammars and Music — Deep Dive into the Generative Paradigm (Level 5)
  • I2 — Bol Processor — BP3, Central Generative System
  • I3 — SuperCollider — Multi-Level Tool (1-3-5-6)
  • I4 — Introduction to MIDI — the Event-based Protocol
  • I5 — Introduction to MusicXML — the Notational Format
  • L1 — Chomsky Hierarchy — Formal Framework for Grammars
  • L5 — The Three Semantics — Complementary Perspectives on Programs
  • L11 — Beyond the Three Semantics — The Network of Semantics
  • B1 — Probabilistic Grammars — Formal Foundations of BP3
  • B7 — The BP2SC Transpiler — Vertical Traversal of Levels

Prerequisites: M1, M2, L5
Reading time: 18 min
Tags: #paradigms #musical-representation #abstraction-levels #midi #musicxml #generative #supercollider #signal #dataflow #functional #vinet


Next article: M4 — Grammars and Music