M1) MIDI under the Formal Microscope
A Protocol at the Foot of Chomsky’s Hierarchy
What does a formal analysis of the world’s most widely deployed musical standard reveal?
Where does this article fit in?
This article opens the Music (M) series. If you have read I4, you know how MIDI works — messages, channels, velocity, timing. Here, we change perspective: we no longer ask “how does it work” but “what is it, formally?”
To do this, we use the tools of formal languages introduced in L1: Chomsky’s hierarchy, automata, grammars. The goal is to place MIDI within this hierarchy, understand what this position implies, and discover a surprising fact: after 40 years of existence and billions of files produced, MIDI has never received a complete formal specification — only scattered fragments.
MIDI through the Lens of Formal Languages
A Protocol = A Language
Any communication protocol can be analyzed as a formal language. It has:
- An alphabet: the set of possible symbols. For MIDI, these are the 256 values of a byte (0x00 to 0xFF).
- A syntax: the rules that determine which sequences of bytes are valid. For MIDI, a Note On message must have a status byte followed by two data bytes.
- A semantics: the meaning of the messages. For MIDI, status byte 0x90 means “note start on channel 1”.
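To see these three layers on one concrete message, here is a minimal sketch in Python (the byte values follow the MIDI 1.0 specification; the variable names are purely illustrative):

```python
# Alphabet: three symbols drawn from the 256 possible byte values.
note_on = bytes([0x90, 0x3C, 0x64])

# Syntax: one status byte (MSB set) followed by exactly two data bytes (MSB clear).
assert note_on[0] & 0x80 and not (note_on[1] & 0x80) and not (note_on[2] & 0x80)

# Semantics: 0x90 = Note On on channel 1; 0x3C = middle C; 0x64 = velocity 100.
status, note, velocity = note_on
channel = (status & 0x0F) + 1          # the low nibble encodes the channel (1-16)
print(f"Note On, channel {channel}, note {note}, velocity {velocity}")
```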
The question is: what type of language is MIDI? Can we write a grammar that describes it? If so, at what level of Chomsky’s hierarchy (L1) does this grammar reside?
Two Levels of Abstraction
Before diving in, let’s note that MIDI can be analyzed at two levels of syntactic abstraction:
- The binary level: the raw bytes on the cable — status bytes, data bytes, VLQ encodings. This is the format level.
- The command level: structured messages — Note On(channel, note, velocity), Control Change(channel, controller, value), Program Change(channel, program). This is the protocol level.
The MIDI specification (MMA) defines the correspondence between the two: a given byte pattern corresponds to a given command. We can write a grammar at each level — but the questions asked are different.
At the binary level, the problem is well understood: status bytes, data bytes, running status. This is the subject of this article, and the conclusion is clear: regular language (Type 3).
At the command level, the situation is richer and more open. Each individual command has a finite vocabulary and a fixed arity — so far, it’s regular. But the protocol also imposes sequencing constraints between commands:
- A Note On should (eventually) be followed by a corresponding Note Off (same channel, same note) — a matching problem.
- NRPN sequences require a precise order: CC 99 → CC 98 → CC 6 (→ CC 38).
- A bank change requires CC 0 → CC 32 → Program Change.
- SysEx messages encapsulate proprietary sub-protocols with their own internal structures.
Do these sequencing constraints form a regular, context-free, or more complex language? Note On/Off matching, for example, remains technically regular (128 notes × 16 channels = finite state), but the automaton would be immense. For SysEx sub-protocols, the question is open. No one has formally studied MIDI at this level of abstraction — nor asked about its position in the hierarchy, nor proposed a grammar. This is an aspect of the gap we identify later in this article.
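To see why Note On/Off matching stays finite-state in principle yet unwieldy in practice, here is a small sketch (Python; the tuple event format is hypothetical) that tracks the set of currently sounding notes. The state is finite, but expanding it into an explicit automaton would require one state per subset of the 16 × 128 = 2048 note slots:

```python
def unmatched_notes(events):
    """Track sounding notes across a stream of (kind, channel, note) command tuples.

    The running state is a set of (channel, note) pairs: finite (at most 2048 members),
    so the check is regular in principle, but an explicit automaton would need one
    state per subset of those slots, on the order of 2**2048 states.
    """
    sounding = set()
    for kind, channel, note in events:
        if kind == "note_on":
            sounding.add((channel, note))
        elif kind == "note_off":
            sounding.discard((channel, note))
    return sounding  # anything left here never received its Note Off

# Two notes start, only the first is released: the E (note 64) stays unmatched.
events = [("note_on", 1, 60), ("note_on", 1, 64), ("note_off", 1, 60)]
print(unmatched_notes(events))   # {(1, 64)}
```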
The MIDI Alphabet: Status Bytes and Data Bytes
MIDI makes an elegant binary distinction between two types of bytes:
- Status bytes: most significant bit is 1 (1xxxxxxx, values 128-255). They identify the message type and channel.
- Data bytes: most significant bit is 0 (0xxxxxxx, values 0-127). They carry the message parameters.
A single bit is enough to distinguish “I am a command” from “I am data”. This is a remarkably simple structure — and it is this simplicity that will determine MIDI’s position in the hierarchy.
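That one-bit test is short enough to write down. A minimal sketch in Python (the function name is illustrative):

```python
def classify(stream: bytes) -> list[str]:
    """Classify each byte of a raw MIDI stream by its most significant bit."""
    return ["status" if b & 0x80 else "data" for b in stream]

# 0x90 0x3C 0x64 = Note On; 0xB0 0x40 0x7F = Control Change 64 (sustain pedal on)
print(classify(bytes([0x90, 0x3C, 0x64, 0xB0, 0x40, 0x7F])))
# ['status', 'data', 'data', 'status', 'data', 'data']
```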
Three Levels of Analysis
MIDI can be analyzed at three increasing levels of abstraction, each corresponding to a different question:
- Messages — the byte stream on the cable. What type of formal language does this stream form?
- The SMF file — the structure that stores these messages on disk. What does it say about Chomsky’s hierarchy?
- Musical content — what these messages musically represent. What expressive power results?
Let’s explore these three levels.
Level 1: Messages — A Regular Language
A Finite Automaton Suffices
Let’s take an individual MIDI message, for example, a Note On. Its structure is:
[Status byte: 1001cccc (type + channel)] [Data byte 1: 0nnnnnnn (note, 0-127)] [Data byte 2: 0vvvvvvv (velocity, 0-127)]
This is a fixed-length sequence, determined by the status byte. There is no recursion, no nesting, no complex context. A finite automaton (the simplest machine in Chomsky’s hierarchy) can recognize any MIDI message.
Reminder (L1): A finite automaton is a machine with a finite number of states that reads an input symbol by symbol and decides if the sequence is valid. No external memory, no stack, no tape — just states and transitions.
Running Status: A State Machine
Running status is a MIDI protocol optimization (I4): if the status byte is identical to the previous one, it can be omitted. The parser must therefore “remember” the last received status.
This adds an internal state to the automaton. But since there are only a finite number of possible status bytes (128 values in all: 7 channel message types × 16 channels, plus the system messages), this state can be encoded directly into the automaton’s states. The resulting automaton is larger (about 1000 states instead of 10), but it remains finite.
State: IDLE
→ Receives status byte 0x90 → State: NOTE_ON_CH1_WAIT_NOTE
→ Receives data byte (active running status) → uses last memorized status
State: NOTE_ON_CH1_WAIT_NOTE
→ Receives data byte (note) → State: NOTE_ON_CH1_WAIT_VELOCITY
State: NOTE_ON_CH1_WAIT_VELOCITY
→ Receives data byte (velocity) → State: COMPLETE_MESSAGE → return IDLE
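A minimal sketch of such a parser in Python (the names are illustrative, not from the specification). Its only memory is the last status byte, which is exactly what keeps it a finite automaton; for brevity it handles only the channel messages that carry two data bytes:

```python
TWO_DATA_BYTE_STATUSES = {0x80, 0x90, 0xA0, 0xB0, 0xE0}  # Note Off/On, Poly AT, CC, Pitch Bend

def parse_stream(stream: bytes):
    """Parse a raw byte stream into (status, data1, data2) messages, honoring running status."""
    messages = []
    last_status = None
    pending = []                       # data bytes collected for the current message
    for b in stream:
        if b & 0x80:                   # status byte: start a new message
            last_status = b
            pending = []
        else:                          # data byte: running status reuses last_status
            if last_status is None or (last_status & 0xF0) not in TWO_DATA_BYTE_STATUSES:
                continue               # stray or unsupported data byte, skip it
            pending.append(b)
            if len(pending) == 2:
                messages.append((last_status, pending[0], pending[1]))
                pending = []           # keep last_status: that is running status at work

    return messages

# 0x90 0x3C 0x64, then 0x3E 0x64 with the status byte omitted (running status):
print(parse_stream(bytes([0x90, 0x3C, 0x64, 0x3E, 0x64])))
# [(144, 60, 100), (144, 62, 100)]
```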
Conclusion: The language of MIDI messages is a regular language — Type 3 in Chomsky’s hierarchy. This is the lowest level, that of regular expressions and finite automata.
Level 2: The SMF File — A Context-Free Structure?
The Block Structure
A Standard MIDI File (SMF) adds a structural layer:
File = Header + Track 1 + Track 2 + ... + Track n
Header : "MThd" + length + format + number of tracks + division
Track : "MTrk" + length + sequence of events
This nested structure (a file contains tracks, each track contains events) resembles a context-free grammar (Type 2, L1). The SMF 1.1 specification also provides a pseudo-BNF (Backus-Naur Form):
<midi-file> = <header-chunk> <track-chunk>+
<header-chunk> = "MThd" <length=6> <format> <ntrks> <division>
<track-chunk> = "MTrk" <length> <track-event>+
Variable-Length Quantities: A Complicating Detail
Delta-times (intervals between events) use a Variable-Length Quantity (VLQ) encoding: the most significant bit of each byte indicates whether the next byte is part of the same value. It’s like a system where “1” at the beginning of a byte means “keep reading” and “0” means “this is the last one”.
Strictly speaking, the shape of a VLQ (any number of continuation bytes followed by one final byte) can still be recognized by a finite automaton; what strains the model is using the decoded values, notably the chunk lengths, whose declared byte count must match the bytes that follow. A pushdown automaton (the model associated with context-free grammars) handles this kind of counting without problem.
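A decoder for this encoding takes only a few lines. A sketch of the rule described above, in Python (the function name is illustrative):

```python
def read_vlq(data: bytes, offset: int = 0):
    """Decode one Variable-Length Quantity starting at `offset`.
    Each byte contributes 7 bits; a set MSB means 'more bytes follow'."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):          # MSB clear: this was the last byte
            return value, offset

print(read_vlq(bytes([0x00])))         # (0, 1)
print(read_vlq(bytes([0x81, 0x48])))   # (200, 2): 0x81 0x48 -> 1*128 + 72 = 200
```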
No Real Recursion
However, the SMF structure has no recursion: a track does not contain other tracks, an event does not contain other events. It is a flat structure of fixed depth (file → tracks → events), not an arbitrarily deep tree.
Conclusion: The SMF format is at most context-free (Type 2), but it is a very weak Type 2 — analyzable by a trivial LL(1) parser, without the recursive power that makes context-free grammars interesting.
Level 3: Musical Content — Zero Generative Power
Here is the crucial point for our project: MIDI as a musical representation has no generative power.
A MIDI file describes a specific sequence of events: “at t=0, play C; at t=480, play E; at t=960, play G”. It cannot say:
- “Play this motif 3 times” (you have to copy-paste the events)
- “Randomly choose between two variations” (no stochasticity)
- “Nest these two temporal flows in a 3:4 ratio” (no polymetry)
In terms of formal languages, a MIDI file is a word of the language, not a grammar. It describes a result, not a process that generates results.
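To make the copy-paste point concrete, here is a sketch (Python, with a hypothetical event format) of what “play this motif 3 times” has to become before it can live in a MIDI file. The repetition exists only in the generating code, never in the file itself:

```python
# One motif as (delta_time_ticks, note, velocity) Note On events, one beat apart.
motif = [(480, 60, 100), (480, 64, 100), (480, 67, 100)]

# The *generator* may loop, but the file can only hold the unrolled result:
events = [event for _ in range(3) for event in motif]
print(len(events))   # 9 events: the "3 times" is gone, only its output remains
```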
The Contrast with BP3
| Aspect | MIDI (Type 3) | BP3 (Type 2 → Turing) |
|---|---|---|
| Nature | Word (finite sequence) | Grammar (set of rules) |
| Recursion | None | Yes (recursive rules) |
| Stochasticity | None | Yes (weights, RND/SUB modes) |
| Polymetry | Impossible | Native (ratio + parallel) |
| Structure | Flat (sequence of events) | Hierarchical (derivation tree) |
| Power | Description | Generation |
| Hierarchy | Type 3 (regular) | Type 2 → Turing-complete (L1, L9) |
The BP3 → MIDI transition is a projection: the derivation tree is flattened into a linear sequence. Two things are lost:
- Structural information: which rule produced which note, which motifs are recurrent, how temporal flows articulate — all of this disappears.
- Expressive power: a MIDI file is one particular realization. Stochasticity (weights, RND mode), controlled recursion (SUB mode), probabilistic alternatives — everything that makes BP3 a space of possible music is reduced to a single point in that space. Replaying the MIDI file always gives the same result; re-running the BP3 grammar can give a different one.
It’s like going from a Python program (which contains loops, functions, conditions, and random draws) to a particular execution trace. From the trace, you can neither reconstruct the program nor reproduce its variability.
The State of Formalizations: An Almost Untouched Field
The SMF Pseudo-BNF: The Only Official Formal Fragment
The SMF 1.1 specification (MMA, 1999) contains a few lines of pseudo-BNF — this is the only attempt at formalization in the official specification. It covers the file structure, but not the syntax of MIDI messages themselves, and certainly not their semantics.
Kaitai Struct: The Closest to a Formal Spec
The Kaitai Struct project offers a complete declarative description of the SMF format in YAML. This is probably the most rigorous specification that exists: it covers the entire binary format and can generate parsers in 11 languages. But Kaitai describes the binary structure, not the language in Chomsky’s sense.
Euterpea/Haskore: Algebraic Formalization in Haskell
The Euterpea project (Hudak, 2014) defines music as a recursive algebraic data type in Haskell, with a sequential operator (:+:) and a parallel operator (:=:). Export to MIDI is a function that flattens this recursive structure into MIDI events. Euterpea does not formalize MIDI — it formalizes an abstraction above MIDI. This is exactly the BP3 approach.
MIDI Linked Data: 10 Billion RDF Triples, But Not the Protocol
The MIDI-LD project (Meroño-Peñuela & Hoekstra, 2017) converted 170,000 MIDI files into 10.2 billion RDF triples — subject–predicate–object triplets (e.g., <Note42> <hasPitch> <C4>) that form the basic building block of the Semantic Web. This is an ontological formalization (what does a MIDI event mean?) but not syntactic (how is a MIDI message structured?).
RFC 6295: An ABNF Grammar… Serving the Network
RFC 6295 (Lazzaro & Wawrzynek, 2011) specifies the encapsulation of MIDI in network packets (RTP). To do this, it includes an ABNF (Augmented BNF) grammar of MIDI messages — the most rigorous syntactic description that exists. But this grammar was not written to formalize MIDI: it is a byproduct of a network specification. It covers the structure of messages, not the semantics of the protocol (what does a Control Change 64 mean?), and it is buried in an RFC that no one in the music community consults.
In other words: a fragment of a formal specification exists, but it is partial, decontextualized, and has never been recognized or adopted as a reference specification for MIDI itself.
Why This Gap?
Four reasons converge:
- Industrial origin: MIDI comes from the music industry, not academia. No tradition of formalization.
- Syntactic simplicity: A regular language is so simple that a formal grammar seems superfluous.
- Binary format: Classic formalisms (BNF, EBNF) are designed for textual languages, not binary ones.
- Lack of criticality: A MIDI parsing error produces a wrong note, not a security flaw. No motivation for formal verification.
What It Reveals
Simplicity = Strength AND Limitation
MIDI’s position at the foot of Chomsky’s hierarchy explains both its universality and its limitations:
- Universality: A regular language is trivial to parse. Any 8-bit microcontroller from 1983 can read MIDI. This is why the protocol has been adopted everywhere, from synthesizers to DAWs to lighting controllers and art installations.
- Limitation: A regular language cannot express hierarchy. MIDI cannot represent a “musical phrase” composed of “motifs” themselves composed of “notes” — it only sees individual notes in time. This is why systems that seek to analyze or generate structured music (GTTM, BP3, TidalCycles) operate above MIDI.
This is not a design flaw — it is a structural consequence of choosing a regular language. A deliberate, pragmatic choice, brilliantly adapted to the constraint of 1983: making synthesizers communicate with an 8-bit microprocessor and a serial cable.
To Go Further in the Series
- M2 (coming soon) — MusicXML under the formal microscope: a context-free format (Type 2) with a complete grammar
- M5 — Polymetry: the concept that MIDI cannot structurally express
- M7 — The landscape of musical DSLs: how other systems circumvent MIDI’s limitations
- B1 — BP3: a generative system that operates beyond Type 2
Key Takeaways
- MIDI is a regular language (Type 3): its messages can be recognized by a finite automaton, the simplest machine in Chomsky’s hierarchy.
- The SMF format is at most context-free (Type 2): the chunk structure adds a level, but without real recursion.
- MIDI has no generative power: it describes sequences, it does not generate them. It is a word, not a grammar.
- MIDI has never been formally specified as such: despite 40 years of existence, there is no complete grammar or formal semantics of the protocol. Fragments exist (SMF pseudo-BNF, RFC 6295 ABNF), but none were designed as a reference specification for MIDI itself. This is an identified research gap.
- MIDI’s simplicity is both its strength and its limitation: a regular language is universal and trivial to parse, but it cannot express hierarchical structure.
- BP3 operates at a fundamentally higher level: where MIDI describes, BP3 generates. The BP3 → MIDI export is a projection that loses structural information.
Glossary
- Finite automaton: a machine with a finite number of states that reads an input symbol by symbol. Recognizes exactly regular languages (Type 3). See L1.
- BNF / EBNF (Backus-Naur Form / Extended BNF): notation for writing formal grammars. Used to specify the syntax of programming languages. See L3.
- Context-free: a type of grammar (Type 2) where each rule replaces a single symbol, independently of the context. Allows recursion and nesting. See L2.
- Data byte: a MIDI byte whose most significant bit is 0 (values 0-127). Carries message parameters (note, velocity, CC value…).
- Grammar: a set of rewrite rules that define a language. Not to be confused with a word of the language.
- Chomsky hierarchy: classification of formal languages into 4 types (Type 3 → Type 0) according to the increasing expressive power of the necessary grammars. See L1.
- Regular language: a language recognizable by a finite automaton (Type 3). Individual MIDI messages form a regular language.
- Running status: a MIDI optimization where the status byte can be omitted if it is identical to the previous one. Requires the parser to remember the last status.
- SMF (Standard MIDI File): MIDI file format (.mid). Chunk structure (header + tracks).
- Status byte: a MIDI byte whose most significant bit is 1 (values 128-255). Identifies the message type and channel.
- VLQ (Variable-Length Quantity): variable-length integer encoding used in the SMF format for delta-times.
Links in the Series
Introduction Series (prerequisites):
- I4 — Introduction to MIDI: the protocol that makes instruments talk
Formal Languages Series (theoretical tools):
- L1 — Chomsky Hierarchy: the classification framework used here
- L2 — Context-Free Grammars: the level MIDI doesn’t quite reach
- L3 — EBNF: the notation MIDI never had
Continuation of the Music Series:
- M2 (coming soon) — MusicXML under the formal microscope
- M3 — The three paradigms of musical representation
- M5 — Polymetry: what MIDI cannot express
BP3 Series:
- B1 — Probabilistic Grammars: BP3 starts at Type 2
Glossary:
- Glossaire — General Glossary of the series
Prerequisites: I4, L1
Reading time: 15 min
Tags: #midi #formal-languages #chomsky #automaton #grammar #formalization
Next article: M2 — MusicXML under the formal microscope