I4) Introduction to MIDI
The Protocol That Makes Instruments Speak
When you press a synthesizer key, what really happens? And why does MIDI, after 40 years, remain as essential as it is limited?
Where This Article Fits In
This article concludes the Introduction (I) series. After presenting the project (I1), the Bol Processor (I2) and SuperCollider (I3), we now explore the protocol that underpins all digital music. MIDI is one of the reference formats for computer music representation — understanding its principles and limitations is a prerequisite for appreciating the alternative representation choices we will see in the Music (M) series.
For a formal analysis of MIDI — its position in the Chomsky hierarchy (regular language, L1), existing formalization attempts, and the contrast with BP3 — see M1 (coming soon).
Why It’s Important
MIDI (Musical Instrument Digital Interface) is everywhere. Your USB keyboard, your DAW, your VST plugins — they all speak MIDI. Created in 1983 to connect synthesizers to each other, this protocol has become the lingua franca of electronic music.
Terms to know:
– DAW (Digital Audio Workstation): music production software like Ableton Live, Logic Pro, FL Studio, or Reaper. It’s your “virtual studio” for recording, editing, and mixing.
– VST Plugin (Virtual Studio Technology): a software extension that adds virtual instruments or effects to your DAW. A piano plugin, for example, receives MIDI messages and generates the corresponding sound.
– Protocol: a set of rules defining how two systems communicate. MIDI is a communication protocol between instruments.
But MIDI was designed for hardware constraints from 40 years ago. Understanding its strengths and limitations is essential for anyone working with digital music — whether for producing, programming, or evaluating the representation choices of other systems.
The Idea in One Sentence
MIDI does not transmit sound, but instructions: “play this note, at this strength, now.”
How Does MIDI Work?
The Fundamental Principle: Messages, Not Sound
Unlike an audio file (WAV, MP3) which contains the sound’s waveform, a MIDI file only contains instructions. It’s like the difference between:
- A musical score (instructions) → MIDI
- An audio recording (actual sound) → WAV/MP3
A synthesizer receives MIDI messages and generates the corresponding sound. The same MIDI file can sound completely different depending on the instrument playing it.
Essential Messages
Note On / Note Off
The heart of MIDI: when to play and when to stop a note.
Note On : channel=1, note=60 (middle C), velocity=100
Note Off : channel=1, note=60, velocity=0
Analogy with a pianist:
– Note On = the finger presses the key (the note begins)
– Note Off = the finger releases the key (the note stops)
– Velocity = the force with which the finger strikes (pianissimo to fortissimo)
Without the Note Off message, the note would continue indefinitely (as if you held the key down).
Anatomy of a Note On message:
- Channel (1-16): a logical identifier allowing control of 16 different instruments on the same cable. Like 16 independent “phone lines.”
- Note (0-127): the MIDI number of the note. 60 = middle C (C4 in Anglo-Saxon notation). Each semitone adds 1: 61 = C#, 62 = D, etc.
- Velocity (0-127): the intensity of the strike. 0 = silence (equivalent to Note Off), 64 = mezzo-forte, 127 = fortissimo.
Why “velocity” and not “volume”?
On a real piano, it’s the speed (velocity) of the hammer that determines both volume AND timbre. A fast strike produces a louder and brighter sound. Synthesizers reproduce this behavior: velocity can affect volume, timbre, or both.
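To make this concrete, here is a minimal Python sketch that builds the two messages shown above as raw bytes, using the status values defined by the MIDI specification (no MIDI library involved; the helper names note_on and note_off are ours, chosen for illustration):

```python
def note_on(channel, note, velocity):
    """Build a raw 3-byte Note On message (status byte 0x90 + channel)."""
    # Channels are numbered 1-16 for humans but 0-15 on the wire.
    return bytes([0x90 | (channel - 1), note, velocity])

def note_off(channel, note, velocity=0):
    """Build a raw 3-byte Note Off message (status byte 0x80 + channel)."""
    return bytes([0x80 | (channel - 1), note, velocity])

# Middle C on channel 1, velocity 100, then its release.
print(note_on(1, 60, 100).hex(" "))   # 90 3c 64
print(note_off(1, 60).hex(" "))       # 80 3c 00
```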
Control Change (CC)
All other controls: volume, pan, sustain, modulation…
CC : channel=1, controller=7 (volume), value=100
CC : channel=1, controller=64 (sustain pedal), value=127 (pressed)
What is a Control Change?
A CC message modifies a continuous parameter of the instrument, distinct from the notes themselves. Imagine the knobs and pedals on a synthesizer: each controls an aspect of the sound (volume, brightness, reverberation…).
Each controller has a number (0-127) and a value (0-127). The number identifies WHICH parameter, the value defines its position.
Some common CCs:
| CC# | Function | Explanation |
|-----|----------|-------------|
| 1 | Modulation wheel | Adds vibrato or other expressive effects |
| 7 | Volume | Overall sound level of the channel |
| 10 | Pan | Left (0) / center (64) / right (127) position |
| 64 | Sustain pedal | Like the piano’s damper pedal (0-63 = up, 64-127 = down) |
| 91 | Reverb | Amount of reverberation effect |
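Following the same raw-bytes convention as the Note On sketch above, the two CC examples from the beginning of this section would be encoded like this (the control_change helper is again ours, for illustration):

```python
def control_change(channel, controller, value):
    """Build a raw 3-byte Control Change message (status byte 0xB0 + channel)."""
    return bytes([0xB0 | (channel - 1), controller, value])

# Channel volume (CC 7) at 100, then sustain pedal (CC 64) pressed.
print(control_change(1, 7, 100).hex(" "))    # b0 07 64
print(control_change(1, 64, 127).hex(" "))   # b0 40 7f
```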
Program Change
Changes the instrument (the “program” or “patch”).
Program Change : channel=1, program=25 (acoustic guitar in General MIDI)
Vocabulary:
– Program / Patch / Preset: these terms are synonymous and refer to a predefined instrument sound (piano, guitar, strings…).
– General MIDI (GM): a standard defining 128 numbered sounds that are identical across all compatible devices. Program 1 = acoustic piano, 25 = acoustic guitar, 41 = violin, etc. This ensures that a MIDI file sounds “roughly the same” on different devices.
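A small sketch of the corresponding message, with one subtlety worth noting: General MIDI documentation numbers programs from 1 to 128, but the byte actually transmitted runs from 0 to 127, hence the -1 below (the program_change helper is ours):

```python
def program_change(channel, gm_program):
    """Build a 2-byte Program Change message (status byte 0xC0 + channel).

    General MIDI lists programs as 1-128, but the transmitted byte is 0-127.
    """
    return bytes([0xC0 | (channel - 1), gm_program - 1])

# GM program 25 (acoustic guitar) on channel 1.
print(program_change(1, 25).hex(" "))   # c0 18
```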
Pitch Bend
Varies the pitch of the note (vibrato effect, glissando — continuous slide from one note to another).
Pitch Bend : channel=1, value=8192 (neutral), range=0 to 16383 (14 bits)
What is Pitch Bend?
It’s the equivalent of the “pitch wheel” on a synthesizer, or a guitar bend: you pull the note up or down to create an expressive effect.
– Neutral value (8192): no pitch modification
– Values < 8192: lower note (down to 0 = maximum down)
– Values > 8192: higher note (up to 16383 = maximum up)
The pitch bend range (how many semitones maximum) is configurable on the instrument, typically +/- 2 semitones.
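Because 0-16383 does not fit in a single 7-bit data byte, the pitch bend value is split into two 7-bit halves (LSB sent first, then MSB). A minimal sketch, still using our illustrative raw-bytes convention:

```python
def pitch_bend(channel, value):
    """Build a 3-byte Pitch Bend message (status byte 0xE0 + channel).

    `value` is the absolute 14-bit position: 0-16383, with 8192 = neutral.
    """
    lsb = value & 0x7F           # lowest 7 bits, sent first
    msb = (value >> 7) & 0x7F    # highest 7 bits
    return bytes([0xE0 | (channel - 1), lsb, msb])

# Neutral position, then maximum upward bend.
print(pitch_bend(1, 8192).hex(" "))    # e0 00 40
print(pitch_bend(1, 16383).hex(" "))   # e0 7f 7f
```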
Resolution: The 7-Bit Problem
MIDI was designed when processors were slow and memory was scarce. Result: almost all values are encoded using 7 bits (0-127).
What is a bit?
A bit (binary digit) is the smallest unit of information: 0 or 1.
– 7 bits can represent 2^7 = 128 different values (0 to 127)
– 8 bits (1 byte) can represent 2^8 = 256 values
– 14 bits can represent 2^14 = 16384 values (used for Pitch Bend)
The more bits you have, the more precision you get. 128 velocity levels seem like a lot, but for a volume fader that you move slowly, it’s not very smooth.
Practical Consequences
| Parameter | MIDI Range | Limitation |
|-----------|------------|------------|
| Velocity | 0-127 | 128 levels of nuance |
| Controllers | 0-127 | 128 positions per CC |
| Notes | 0-127 | ~10.5 octaves (sufficient) |
Is 128 levels a lot or a little?
For velocity: often sufficient. The difference between 100 and 101 is imperceptible.
For a volume fader: problematic. When you slowly raise a fader, you hear “stair steps” instead of a smooth curve. This is why modern controllers often use 14-bit CCs (CC 0-31 combined with CC 32-63).
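How does a 14-bit CC pair produce a finer value? The coarse CC (0-31) carries the high 7 bits and its companion (same number + 32) carries the low 7 bits; combining them is a single shift-and-or. A small illustrative sketch:

```python
def combine_14bit(msb, lsb):
    """Combine a coarse CC value (e.g. CC 7) with its fine pair (CC 7 + 32 = 39)."""
    return (msb << 7) | lsb   # msb * 128 + lsb, giving a 0-16383 range

# CC 7 = 100 (coarse volume) and CC 39 = 50 (fine volume):
print(combine_14bit(100, 50))   # 12850 out of 16383 steps, instead of 128 steps
```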
Timing and Synchronization
The Timing Problem in MIDI Files
A Standard MIDI File (SMF) stores events with timestamps in “ticks.” But how many ticks per quarter note?
What is a tick?
A tick is the smallest unit of time in a MIDI file. It’s like a “clock pulse” that divides musical time. The more ticks per quarter note, the more precise the timing.
- PPQ (Pulses Per Quarter note): the number of ticks ("pulses") per quarter note, i.e. the file's temporal resolution. Typical values are 96, 480, or 960.
- Tempo: a meta-event that defines the duration of a quarter note in microseconds.
PPQ = 480
Tempo = 500000 µs/quarter note = 120 BPM (Beats Per Minute)
One quarter note = 480 ticks
One eighth note = 240 ticks
One sixteenth note = 120 ticks
Where does the calculation 500000 µs = 120 BPM come from?
– BPM = beats (quarter notes) per minute
– At 120 BPM, there are 120 quarter notes in 60 seconds
– So 1 quarter note = 60/120 = 0.5 seconds = 500000 microseconds
– Formula: Tempo (µs/quarter note) = 60000000 / BPM
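These two formulas translate directly into a few lines of Python (the values are chosen to match the example above):

```python
PPQ = 480           # ticks per quarter note
TEMPO = 500_000     # microseconds per quarter note (i.e. 120 BPM)

def ticks_to_seconds(ticks, ppq=PPQ, tempo=TEMPO):
    """Duration in seconds of a span of `ticks` at the given PPQ and tempo."""
    return ticks * tempo / ppq / 1_000_000

def bpm_to_tempo(bpm):
    """Tempo meta-event value (µs per quarter note) for a given BPM."""
    return round(60_000_000 / bpm)

print(ticks_to_seconds(480))   # 0.5   -> one quarter note lasts half a second
print(ticks_to_seconds(120))   # 0.125 -> one sixteenth note
print(bpm_to_tempo(120))       # 500000
```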
Real-time Timing
In a direct connection (keyboard to synth), MIDI transmits at 31.25 kbit/s (kilobits per second). A Note On message takes 3 bytes = 960 microseconds, or about 1 millisecond.
Where does this calculation come from?
– 1 MIDI byte = 10 bits transmitted (8 data bits + 1 start bit + 1 stop bit)
– 3 bytes = 30 bits
– At 31250 bits/second: 30 bits / 31250 = 0.00096 seconds = 960 µs
Consequence: If you play a 4-note chord “simultaneously,” the notes actually arrive offset by about 1 ms each, i.e. roughly 3 ms between the first and the last note. In practice, this is imperceptible to the human ear (which distinguishes timing differences starting from about 20-30 ms), but theoretically, MIDI cannot represent true simultaneity.
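The same arithmetic, as a tiny sketch you can adapt to other message sizes or chord widths:

```python
BAUD_RATE = 31_250    # bits per second on a classic DIN MIDI cable
BITS_PER_BYTE = 10    # 8 data bits + 1 start bit + 1 stop bit

def transmission_ms(message_bytes):
    """Serial transmission time, in milliseconds, of a message of that many bytes."""
    return message_bytes * BITS_PER_BYTE / BAUD_RATE * 1000

print(transmission_ms(3))       # 0.96  -> one 3-byte Note On
print(4 * transmission_ms(3))   # 3.84  -> a whole 4-note chord
```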
The 16 Channels: Strength and Limitation
The Original Architecture
One MIDI cable = 16 channels = maximum 16 instruments.
This was revolutionary in 1983 for connecting multiple synths. But today:
- A symphony orchestra has far more than 16 parts
- Each channel can only have one active pitch bend (a problem for polyphonic instruments — capable of playing multiple notes simultaneously, like a piano — which require individual expression per note)
Modern Workaround: MPE
MPE (MIDI Polyphonic Expression) uses one channel per note to allow individual pitch bend and pressure. A 4-note chord uses 4 channels — which quickly consumes the 16 available channels.
Why is MPE revolutionary?
In classic MIDI, pitch bend applies to ALL notes on a channel. It’s impossible to “bend” a single note of a chord up while the others remain stable.
With MPE, each note lives on its own channel. Controllers like Expressive E’s Osmose or Roger Linn’s LinnStrument use MPE to capture note-by-note expressive gestures: slides, vibrato, pressure… This is the closest a keyboard controller gets to the expressiveness of a violinist or a singer.
The cost: MPE monopolizes almost all 16 channels (1 global channel + up to 15 note channels). This leaves no free channels for other instruments on the same MIDI port.
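To illustrate the allocation principle only (not the real MPE configuration handshake), here is a toy Python sketch in which each new note grabs a free member channel; the MpeZone class is purely hypothetical:

```python
class MpeZone:
    """Toy allocator illustrating MPE's one-channel-per-note idea.

    Channel 1 is the global (master) channel; notes are spread over the
    member channels 2-16. This shows only the allocation principle, not
    the real MPE configuration handshake.
    """

    def __init__(self):
        self.free = list(range(2, 17))   # member channels 2-16
        self.active = {}                 # note number -> channel

    def note_on(self, note):
        channel = self.free.pop(0)
        self.active[note] = channel
        return channel   # pitch bend sent on this channel affects only this note

    def note_off(self, note):
        self.free.append(self.active.pop(note))

zone = MpeZone()
print([zone.note_on(n) for n in (60, 64, 67, 71)])   # [2, 3, 4, 5]
```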
What MIDI CANNOT Do
1. No Musical Structure
MIDI doesn’t know what a “phrase,” a “motif,” or a “theme” is. It only sees individual notes.
MIDI sees: Note 60, Note 64, Note 67
Human sees: C major chord, first degree, tonic function
2. No Native Microtonality (Intervals Smaller Than a Semitone)
The 128 MIDI notes correspond to 12-tone equal temperament. To play an Arabic or Indian scale with quarter-tone intervals, you need to:
- Use pitch bend (but per channel, not per note)
- Use synthesizers compatible with alternative tunings
What is equal temperament?
It’s the standard tuning system of modern Western music: the octave is divided into 12 equal semitones. Each semitone has a frequency ratio of 2^(1/12) (approximately 1.059).
But many musical traditions use other systems:
– Arabic music: quarter tones (24 divisions per octave)
– Indian music: shruti (22 divisions)
– Baroque music: unequal temperaments (Werckmeister, etc.)
MIDI cannot natively represent these micro-intervals because its 128 notes are “fixed” to equal temperament.
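The usual workaround, mentioned above, is pitch bend. As an illustration, here is a sketch that computes the bend value needed to detune a note by a given number of cents, assuming the instrument’s bend range is set to +/- 2 semitones; remember that in classic MIDI this bend affects every note on the channel.

```python
BEND_RANGE_SEMITONES = 2   # assumed pitch bend range of the receiving instrument

def bend_for_cents(cents, bend_range=BEND_RANGE_SEMITONES):
    """Pitch bend value (0-16383, 8192 = neutral) that detunes by `cents`.

    100 cents = 1 semitone. Only valid if the instrument's bend range
    really is +/- `bend_range` semitones.
    """
    return round(8192 + cents / (bend_range * 100) * 8192)

# A quarter tone (+50 cents) above the equal-tempered note:
print(bend_for_cents(50))   # 10240
```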
3. No Continuous Nuances
Velocity is fixed at the moment of attack. Velocity alone therefore cannot produce a crescendo on a sustained note; you need a continuous message such as CC 11 (Expression) or aftertouch instead.
What is aftertouch?
It’s the pressure exerted on a key after the initial attack. Some keyboards detect this pressure and send corresponding MIDI messages.
– Channel Aftertouch: a single pressure value for the entire channel
– Polyphonic Aftertouch: a value per note (more expressive, but rare)
Aftertouch can modulate volume, vibrato, or other parameters to add expression to sustained notes.
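In raw bytes (same illustrative convention as the earlier sketches), the two flavours look like this:

```python
def channel_aftertouch(channel, pressure):
    """2-byte Channel Aftertouch message (status 0xD0): one pressure for the whole channel."""
    return bytes([0xD0 | (channel - 1), pressure])

def poly_aftertouch(channel, note, pressure):
    """3-byte Polyphonic Aftertouch message (status 0xA0): one pressure per note."""
    return bytes([0xA0 | (channel - 1), note, pressure])

print(channel_aftertouch(1, 80).hex(" "))    # d0 50
print(poly_aftertouch(1, 60, 80).hex(" "))   # a0 3c 50
```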
4. No Notation
MIDI doesn’t know if you’re playing a C sharp or a D flat. No clef, no measure, no key signature. This is why MIDI to score conversion often yields strange results.
The problem of enharmonics
In MIDI, C# and D♭ are the same note (number 61). But on a score, it’s not the same:
– In D major, you write C#
– In E♭ major, you write D♭
MIDI to score transcription software must “guess” the correct spelling, which often leads to aberrations like D# in an F major key.
Key Takeaways
- MIDI = instructions, not audio: It’s a digital score, not a recording.
- Fundamental messages: Note On/Off, Control Change, Program Change, Pitch Bend.
- 7-bit resolution (0-127): Sufficient for notes, limited for fine controllers.
- 16 channels: Sufficient for many uses, but constrained for complex music.
- What’s missing: Musical structure, native microtonality, notation, continuous nuances.
- Still relevant: After 40 years, MIDI remains the de facto standard thanks to its simplicity and universality.
To Go Further
- Official Specification: MIDI Association
- MIDI 2.0: New version (2020) with 32-bit resolution and bidirectionality
Glossary
- Aftertouch: pressure exerted on a key after the initial attack. Allows adding expression to sustained notes (vibrato, crescendo…).
- Bit: elementary unit of information (0 or 1). 7 bits = 128 possible values (0-127).
- BPM: Beats Per Minute. Measure of tempo. 120 BPM = 120 quarter notes per minute.
- MIDI Channel: logical subdivision (1-16) allowing different instruments to be addressed on the same cable. Like 16 independent phone lines.
- Control Change (CC): message to modify continuous parameters (volume, pan, sustain…). Identified by a number (0-127) and a value (0-127).
- DAW: Digital Audio Workstation. Music production software (Ableton, Logic, FL Studio…).
- Enharmonic: in music, two notes of the same pitch but different spelling (C# = D♭). MIDI does not distinguish enharmonics.
- General MIDI (GM): standard defining 128 identically numbered sounds on all compatible devices.
- MPE: MIDI Polyphonic Expression. Extension allowing one channel per note for individual expression.
- Note On/Off: messages indicating the start (Note On) and end (Note Off) of a note.
- Patch / Program / Preset: predefined instrument sound. Program 1 = piano in General MIDI.
- Pitch Bend: message continuously varying the pitch of a note (glissando, vibrato).
- PPQ: Pulses Per Quarter note. Temporal resolution of a MIDI file. 480 PPQ = 480 ticks per quarter note.
- Protocol: set of rules defining how two systems communicate.
- SMF: Standard MIDI File. MIDI file format (.mid).
- Equal Temperament: tuning system dividing the octave into 12 equal semitones. Standard of modern Western music.
- Tick: smallest unit of time in a MIDI file. The number of ticks per quarter note is defined by the PPQ.
- Velocity: intensity of a note (0-127), corresponding to the “strike force.” Generally affects volume and/or timbre.
- VST: Virtual Studio Technology. Audio plugin format allowing virtual instruments and effects to be added to a DAW.
Links
- M1 (coming soon) — MIDI under the formal microscope — the formal analysis this article does not cover
- M2
- M5
Prerequisites: I1, I2, I3
Reading time: 12 min
Tags: #midi #protocol #audio #musical-representation
Next article: M1