I5) MusicXML

The Universal Digital Score

When you open a Finale score in MuseScore, how does the musical information survive the journey? And why do some details sometimes disappear along the way?

Where does this article fit in?

In I4, we saw that MIDI transmits performance instructions: “play this note, this hard, now.” But MIDI doesn’t know what a measure or a key signature is, nor the difference between a C♯ and a D♭. MusicXML fills this gap: it encodes what you see on the score. For a musician reading a score, this information is essential.

With I4 (MIDI) and this article (MusicXML), the Introduction series covers the two dominant musical representation formats. The articles in the Music (M) series will analyze these formats from a formal perspective: M1 (coming soon) for MIDI, M2 for MusicXML.

Why is this important?

MusicXML, whose version 1.0 was released in 2004, was designed to enable the exchange of complete scores between music notation software: applications specialized in writing and engraving scores, such as Finale, Sibelius, MuseScore, or Dorico. Today, it is the standard exchange format supported by these and dozens of other applications. Understanding its structure shows how “traditional” musical information is represented digitally.

The idea in one sentence

MusicXML encodes what you see on a score: notes, rests, measures, key signatures, expression marks — everything that allows the printed page to be reproduced.

How does MusicXML work?

The principle: XML for music

MusicXML is an XML (eXtensible Markup Language) format, meaning it’s a structured text format with tags — keywords enclosed by angle brackets < and >. If you’ve ever seen HTML (the language of web pages), you’ll recognize the syntax:

<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
  <type>quarter</type>
</note>

How to read this XML code?

<note> opens a “note” element and </note> closes it (the / indicates closure)

<pitch> contains pitch information

<step>C</step> indicates the note name as a letter from A to G: C = do, D = re, E = mi, etc.

<octave>4</octave> indicates the octave (middle C on the piano is in octave 4)

<duration>4</duration> indicates the duration in internal units (see “divisions” below)

<type>quarter</type> indicates the displayed rhythmic value: quarter = quarter note

This note is a C in octave 4, lasting a quarter note. Unlike MIDI, which would simply say “note 60, velocity 80,” MusicXML retains the complete notational information.
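
To make the contrast with MIDI concrete, here is a minimal Python sketch that maps a MusicXML pitch to a MIDI note number. The function name pitch_to_midi is hypothetical, and the sketch assumes the common convention that middle C (C4) is MIDI note 60. It shows that two different spellings collapse onto one MIDI number, which is exactly the information MusicXML keeps and MIDI loses.

```python
# Semitone offset of each note letter above C
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(step: str, octave: int, alter: int = 0) -> int:
    """Map a MusicXML <pitch> (step, octave, optional alter) to a MIDI note number.

    Assumption: middle C (C4) corresponds to MIDI note 60.
    """
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter


print(pitch_to_midi("C", 4))       # 60
print(pitch_to_midi("C", 4, 1))    # 61 (C sharp)
print(pitch_to_midi("D", 4, -1))   # 61 (D flat, same MIDI number as C sharp)
```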

The two organizations: partwise vs timewise

MusicXML offers two ways to organize data:

score-partwise (the most common)

The organization follows the way separate instrumental parts are laid out: first all measures for one instrument, then all measures for the next.

<score-partwise>
  <part id="P1">           <!-- Violin -->
    <measure number="1">...</measure>
    <measure number="2">...</measure>
  </part>
  <part id="P2">           <!-- Viola -->
    <measure number="1">...</measure>
    <measure number="2">...</measure>
  </part>
</score-partwise>

Advantage: Easy to extract an instrument’s part.

score-timewise

The organization follows the temporal flow: first measure 1 for all instruments, then measure 2, etc.

<score-timewise>
  <measure number="1">
    <part id="P1">...</part>  <!-- Violin, measure 1 -->
    <part id="P2">...</part>  <!-- Viola, measure 1 -->
  </measure>
  <measure number="2">
    <part id="P1">...</part>
    <part id="P2">...</part>
  </measure>
</score-timewise>

Advantage: Easy to analyze what happens at a given moment (vertical chords).

In practice, nearly all files (an estimated 95%) use score-partwise. The two formats are equivalent and can be converted into each other automatically with XSLT (eXtensible Stylesheet Language Transformations) stylesheets; XSLT is a language for transforming one XML document into another.
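
The equivalence of the two layouts can be sketched in a few lines of Python with the standard library. This is not the XSLT stylesheet approach mentioned above, only an illustration of the regrouping. The function name to_timewise is hypothetical, the `<note/>` placeholders stand in for real content, and the sketch assumes that measure numbers are unique within each part.

```python
import xml.etree.ElementTree as ET

PARTWISE = """<score-partwise>
  <part id="P1">
    <measure number="1"><note/></measure>
    <measure number="2"><note/></measure>
  </part>
  <part id="P2">
    <measure number="1"><note/></measure>
    <measure number="2"><note/></measure>
  </part>
</score-partwise>"""


def to_timewise(partwise_root):
    """Regroup <part>/<measure> into <measure>/<part> (structure only)."""
    timewise = ET.Element("score-timewise")
    measures = {}  # measure number -> new <measure> element
    for child in partwise_root:
        if child.tag != "part":
            timewise.append(child)  # header elements are copied over unchanged
            continue
        for measure in child.findall("measure"):
            number = measure.get("number")
            if number not in measures:
                measures[number] = ET.SubElement(timewise, "measure", measure.attrib)
            new_part = ET.SubElement(measures[number], "part", child.attrib)
            new_part.extend(list(measure))  # the measure's content goes under the part
    return timewise


timewise = to_timewise(ET.fromstring(PARTWISE))
print(ET.tostring(timewise, encoding="unicode"))
```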

Essential Elements

The Header: Work Metadata

<work>
  <work-title>Piano Sonata</work-title>
</work>
<identification>
  <creator type="composer">Mozart</creator>
  <encoding>
    <software>MuseScore 4.0</software>
    <encoding-date>2024-01-15</encoding-date>
  </encoding>
</identification>

Parts: Instrument Declaration

<part-list>
  <score-part id="P1">
    <part-name>Piano</part-name>
    <score-instrument id="P1-I1">
      <instrument-name>Piano</instrument-name>
    </score-instrument>
    <midi-instrument id="P1-I1">
      <midi-channel>1</midi-channel>
      <midi-program>1</midi-program>
    </midi-instrument>
  </score-part>
</part-list>

Note that MusicXML can include MIDI information for playback — it bridges the two worlds.
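
As an illustration, the declaration above can be read with Python's standard xml.etree.ElementTree module. The helper name describe_parts and the shape of the returned dictionary are choices made for this sketch, not part of MusicXML.

```python
import xml.etree.ElementTree as ET

PART_LIST = """<part-list>
  <score-part id="P1">
    <part-name>Piano</part-name>
    <score-instrument id="P1-I1">
      <instrument-name>Piano</instrument-name>
    </score-instrument>
    <midi-instrument id="P1-I1">
      <midi-channel>1</midi-channel>
      <midi-program>1</midi-program>
    </midi-instrument>
  </score-part>
</part-list>"""


def describe_parts(part_list):
    """Return {part id: {"name": ..., "midi_program": ...}} for each <score-part>."""
    parts = {}
    for score_part in part_list.findall("score-part"):
        # <midi-instrument> is optional, so the program may be missing
        program = score_part.findtext("midi-instrument/midi-program")
        parts[score_part.get("id")] = {
            "name": score_part.findtext("part-name"),
            "midi_program": int(program) if program is not None else None,
        }
    return parts


parts = describe_parts(ET.fromstring(PART_LIST))
print(parts)
```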

Measure Attributes: Key Signature, Time Signature, Clef

<attributes>
  <divisions>4</divisions>         <!-- subdivisions per quarter note -->
  <key>
    <fifths>-3</fifths>            <!-- 3 flats (♭) = E♭ major -->
    <mode>major</mode>
  </key>
  <time>
    <beats>4</beats>               <!-- numerator -->
    <beat-type>4</beat-type>       <!-- denominator -->
  </time>
  <clef>
    <sign>G</sign>                 <!-- G clef (treble clef) -->
    <line>2</line>                 <!-- on the 2nd line -->
  </clef>
</attributes>
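
The `<fifths>` value counts steps around the circle of fifths, so turning it into a key name is a simple table lookup. The sketch below is one possible way to do it in Python. The function name key_name is hypothetical, and only the major and minor modes are handled.

```python
# Tonics ordered along the circle of fifths, from 7 flats (index 0) to 7 sharps (index 14)
TONICS = {
    "major": ["C♭", "G♭", "D♭", "A♭", "E♭", "B♭", "F", "C",
              "G", "D", "A", "E", "B", "F♯", "C♯"],
    "minor": ["A♭", "E♭", "B♭", "F", "C", "G", "D", "A",
              "E", "B", "F♯", "C♯", "G♯", "D♯", "A♯"],
}


def key_name(fifths: int, mode: str = "major") -> str:
    """Translate <fifths> and <mode> into a readable key name."""
    if mode not in TONICS:
        raise ValueError(f"unsupported mode: {mode}")
    if not -7 <= fifths <= 7:
        raise ValueError("fifths must be between -7 and 7")
    return f"{TONICS[mode][fifths + 7]} {mode}"


print(key_name(-3, "major"))  # E♭ major, as in the example above
print(key_name(0, "minor"))   # A minor
```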

The concept of “divisions” — temporal resolution

<divisions>4</divisions> means that a quarter note is divided into 4 internal units. Thus:

A quarter note = 4 units

An eighth note = 2 units

A sixteenth note = 1 unit

Durations stay whole numbers, so no fractions are needed, and the precision is limited only by the chosen value: the higher the number, the finer the rhythms that can be expressed.
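
A way to see why the value matters: the smallest usable divisions is the least common multiple of the denominators of all note lengths, measured in quarter notes. The Python sketch below works under that assumption. The function name smallest_divisions is hypothetical, and the triplet is an added example that does not appear in the list above.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def smallest_divisions(quarter_lengths):
    """Smallest <divisions> value that turns every duration into a whole number."""
    denominators = [Fraction(q).denominator for q in quarter_lengths]
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)


# Quarter, eighth, sixteenth, expressed in quarter notes
binary = [Fraction(1), Fraction(1, 2), Fraction(1, 4)]
# Same values plus one note of an eighth-note triplet (a third of a quarter note)
with_triplet = binary + [Fraction(1, 3)]

d = smallest_divisions(binary)
print(d, [int(q * d) for q in binary])              # 4 [4, 2, 1]

d3 = smallest_divisions(with_triplet)
print(d3, [int(q * d3) for q in with_triplet])      # 12 [12, 6, 3, 4]
```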

Notes: The Heart of the System

<note>
  <pitch>
    <step>F</step>                 <!-- F -->
    <alter>1</alter>               <!-- sharp ♯ (+1) -->
    <octave>4</octave>
  </pitch>
  <duration>2</duration>           <!-- in divisions -->
  <voice>1</voice>
  <type>eighth</type>              <!-- eighth note -->
  <stem>up</stem>                  <!-- stem up -->
  <beam number="1">begin</beam>    <!-- beam start -->
</note>

Key points:

alter shifts the pitch in semitones: +1 for sharp ♯, -1 for flat ♭, 0 (or no alter element) for natural ♮
duration is the duration in divisions, type is the displayed rhythmic value
Engraving information — i.e., the visual formatting of the score (stem, beam, spacing) — is preserved

Rests

<note>
  <rest/>
  <duration>4</duration>
  <type>quarter</type>
</note>

A rest is modeled as a note without pitch.

Chords

For a chord, the first note is written normally; each subsequent note carries the <chord/> element:

<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <duration>4</duration>
  <type>quarter</type>
</note>
<note>
  <chord/>    <!-- This note is part of the previous chord -->
  <pitch><step>E</step><octave>4</octave></pitch>
  <duration>4</duration>
  <type>quarter</type>
</note>
<note>
  <chord/>
  <pitch><step>G</step><octave>4</octave></pitch>
  <duration>4</duration>
  <type>quarter</type>
</note>

Special Notation: <chord/>

The <chord/> tag (with a / at the end, indicating an empty element) signals that this note should be played simultaneously with the previous one, thus forming a chord. Here, C-E-G = C major chord.
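
When reading a file, the same rule runs in reverse: a note containing `<chord/>` is attached to the note before it. A minimal Python sketch of that grouping follows. The function name group_chords is hypothetical, a fourth note (D4) is added to show where the chord ends, and `<alter>` is ignored for brevity.

```python
import xml.etree.ElementTree as ET

MEASURE = """<measure number="1">
  <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration><type>quarter</type></note>
  <note><chord/><pitch><step>E</step><octave>4</octave></pitch><duration>4</duration><type>quarter</type></note>
  <note><chord/><pitch><step>G</step><octave>4</octave></pitch><duration>4</duration><type>quarter</type></note>
  <note><pitch><step>D</step><octave>4</octave></pitch><duration>4</duration><type>quarter</type></note>
</measure>"""


def group_chords(measure):
    """Group consecutive <note> elements into chords based on <chord/>."""
    groups = []
    for note in measure.findall("note"):
        pitch = note.find("pitch")
        if pitch is None:
            name = "rest"
        else:
            name = pitch.findtext("step") + pitch.findtext("octave")
        if note.find("chord") is not None and groups:
            groups[-1].append(name)   # sounds together with the previous note
        else:
            groups.append([name])     # starts a new note or chord
    return groups


print(group_chords(ET.fromstring(MEASURE)))  # [['C4', 'E4', 'G4'], ['D4']]
```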

Complete Example: A Simple Measure

Here is a measure in C major, 4/4, with four quarter notes (C-D-E-F):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 4.0 Partwise//EN"
  "http://www.musicxml.org/dtds/partwise.dtd">
<!-- DTD (Document Type Definition): schema that defines the valid structure of the document -->
<score-partwise version="4.0">
  <part-list>
    <score-part id="P1">
      <part-name>Piano</part-name>
    </score-part>
  </part-list>

  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>G</sign><line>2</line></clef>
      </attributes>

      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>1</duration><type>quarter</type>
      </note>
      <note>
        <pitch><step>D</step><octave>4</octave></pitch>
        <duration>1</duration><type>quarter</type>
      </note>
      <note>
        <pitch><step>E</step><octave>4</octave></pitch>
        <duration>1</duration><type>quarter</type>
      </note>
      <note>
        <pitch><step>F</step><octave>4</octave></pitch>
        <duration>1</duration><type>quarter</type>
      </note>
    </measure>
  </part>
</score-partwise>

Nearly 40 lines for 4 notes! This is the main criticism of MusicXML: its verbosity.
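
The explicit structure has a benefit: a file can be checked mechanically. As a sketch, the Python below verifies that the durations in a measure add up to its time signature. The function name measure_is_full is hypothetical, and the check assumes a single voice with no `<backup>` or `<forward>` elements and with `<attributes>` present in the measure.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

MEASURE = """<measure number="1">
  <attributes>
    <divisions>1</divisions>
    <time><beats>4</beats><beat-type>4</beat-type></time>
  </attributes>
  <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
  <note><pitch><step>D</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
  <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
  <note><pitch><step>F</step><octave>4</octave></pitch><duration>1</duration><type>quarter</type></note>
</measure>"""


def measure_is_full(measure):
    """True if the note durations fill the measure exactly."""
    divisions = int(measure.findtext("attributes/divisions"))
    beats = int(measure.findtext("attributes/time/beats"))
    beat_type = int(measure.findtext("attributes/time/beat-type"))
    expected = Fraction(beats * 4, beat_type)  # measure length in quarter notes
    actual = sum(
        Fraction(int(note.findtext("duration")), divisions)
        for note in measure.findall("note")
        if note.find("chord") is None  # extra chord tones add no time
    )
    return actual == expected


print(measure_is_full(ET.fromstring(MEASURE)))  # True
```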

Strengths and Limitations

Strengths

Aspect	Advantage
Interoperability	Exchange between Finale, Sibelius, MuseScore, Dorico…
Notational completeness	Key signatures, clefs, articulations, dynamics, lyrics
Readability	Text format, debuggable, versionable with Git (version control system)
Open standard	Public documentation, no proprietary license
Preservation	Encodes the composer’s intention, not just the notes

Limitations

Aspect	Limitation
Verbosity	~10x larger than MIDI for the same music
Microtonality	Supported but complex (fractional semitone offsets such as 0.5 for a quarter tone, non-standard accidentals)
Not generative	Describes a fixed score, not generation rules
Variability	Each software interprets certain tags differently
Layout	Engraving details are not always preserved

The problem of microtonality

Microtonality refers to the use of intervals smaller than the semitone of the Western equal-tempered scale. MusicXML can encode quarter tones with fractional alter values:

<pitch>
  <step>C</step>
  <alter>0.5</alter>  <!-- quarter tone above C -->
  <octave>4</octave>
</pitch>

However, few notation programs display these accidentals correctly, and microtonal symbols are not standardized. For world music (Arabic maqam, Indian ragas), MusicXML therefore remains an imperfect compromise.
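
To see what a fractional alter means acoustically, the following Python sketch converts a pitch to a frequency. It assumes twelve-tone equal temperament with A4 = 440 Hz, an assumption that maqam or raga tunings do not necessarily share. Both function names are hypothetical.

```python
# Semitone offset of each note letter above C
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def alter_to_cents(alter: float) -> float:
    """One semitone is 100 cents, so alter 0.5 is 50 cents."""
    return alter * 100


def pitch_to_frequency(step: str, octave: int, alter: float = 0.0, a4: float = 440.0) -> float:
    """Frequency in Hz, assuming equal temperament tuned to a4."""
    semitones_from_a4 = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter - 69
    return a4 * 2 ** (semitones_from_a4 / 12)


print(alter_to_cents(0.5))                         # 50.0
print(round(pitch_to_frequency("C", 4), 2))        # C4
print(round(pitch_to_frequency("C", 4, 0.5), 2))   # quarter tone above C4
print(round(pitch_to_frequency("C", 4, 1), 2))     # C sharp 4
```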

Musical Expression Elements

MusicXML goes far beyond notes. Here are some essential elements for a complete score:

Dynamics (intensity nuances)

<direction>
  <direction-type>
    <dynamics><ff/></dynamics>    <!-- fortissimo -->
  </direction-type>
</direction>

Classical dynamics are predefined. From softest to loudest, the most common are: pp (pianissimo, very soft), p (piano, soft), mp (mezzo-piano, moderately soft), mf (mezzo-forte, moderately loud), f (forte, loud), ff (fortissimo, very loud). Further markings such as ppp, fff, and sfz are predefined as well, and free text can also be used.

Articulations (way of playing notes)

Articulations indicate how to attack or connect notes — staccato, accented, slurred, etc.

<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <notations>
    <articulations>
      <staccato/>                 <!-- staccato: short and detached note -->
      <accent/>                   <!-- accent: note played louder -->
    </articulations>
  </notations>
</note>

Slurs and Legato (connected playing)

Legato (from Italian legare, to tie) is a playing style where notes are connected without interruption. On the score, it is indicated by a slur, an arc connecting several notes:

<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <notations>
    <slur type="start" number="1"/>
  </notations>
</note>
<!-- intermediate notes -->
<note>
  <pitch><step>E</step><octave>4</octave></pitch>
  <notations>
    <slur type="stop" number="1"/>
  </notations>
</note>

Lyrics (for vocal music)

<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <lyric number="1">
    <syllabic>begin</syllabic>
    <text>Hel</text>
  </lyric>
</note>
<note>
  <pitch><step>D</step><octave>4</octave></pitch>
  <lyric number="1">
    <syllabic>end</syllabic>
    <text>lo</text>
  </lyric>
</note>

The <syllabic> element indicates whether the syllable is at the beginning, middle, or end of a word, or if it is a complete word.
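
The `<syllabic>` values are what allow a program to rebuild whole words from syllables spread over several notes. A possible sketch in Python follows. The function name lyric_words is hypothetical, and a third note with a single-syllable word is added for illustration.

```python
import xml.etree.ElementTree as ET

MEASURE = """<measure number="1">
  <note>
    <pitch><step>C</step><octave>4</octave></pitch><duration>1</duration>
    <lyric number="1"><syllabic>begin</syllabic><text>Hel</text></lyric>
  </note>
  <note>
    <pitch><step>D</step><octave>4</octave></pitch><duration>1</duration>
    <lyric number="1"><syllabic>end</syllabic><text>lo</text></lyric>
  </note>
  <note>
    <pitch><step>E</step><octave>4</octave></pitch><duration>1</duration>
    <lyric number="1"><syllabic>single</syllabic><text>world</text></lyric>
  </note>
</measure>"""


def lyric_words(measure, verse="1"):
    """Join syllables into words for one verse, using <syllabic>."""
    words, current = [], ""
    for lyric in measure.findall("note/lyric"):
        if lyric.get("number", "1") != verse:
            continue
        current += lyric.findtext("text", default="")
        # "single" and "end" both close the current word
        if lyric.findtext("syllabic", default="single") in ("single", "end"):
            words.append(current)
            current = ""
    if current:
        words.append(current)  # keep an unfinished word rather than drop it
    return words


print(lyric_words(ET.fromstring(MEASURE)))  # ['Hello', 'world']
```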

Concrete Use Cases

Scenario 1: Exchanging a score between software

A composer works in Finale, but their publisher uses Sibelius. Solution: export to MusicXML.

What is well preserved: Notes, rhythms, key signatures, clefs, basic dynamics.
What can be problematic: Precise layout, custom fonts, certain rare articulations.

Scenario 2: Analyzing a score with code

A musicologist wants to count intervals in Beethoven’s sonatas. With music21 (a Python library for musical analysis developed at MIT):

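from music21 import converter

# Parse the MusicXML file into a music21 Score
score = converter.parse('sonate.musicxml')

# Keep single notes only: chords expose .pitches rather than .pitch
for n in score.flatten().getElementsByClass('Note'):
    print(n.pitch, n.duration.quarterLength)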

MusicXML offers programmatic access to the score that MIDI does not allow as easily.
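
The interval count itself can also be sketched without music21, using only the Python standard library. This is an illustrative simplification, not a replacement for a real analysis toolkit. The function name melodic_intervals is hypothetical, intervals are measured in semitones between consecutive notes of one part, and rests and extra chord tones are skipped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Semitone offset of each note letter above C
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

PART = """<part id="P1">
  <measure number="1">
    <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
    <note><pitch><step>D</step><octave>4</octave></pitch><duration>1</duration></note>
    <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration></note>
    <note><pitch><step>F</step><octave>4</octave></pitch><duration>1</duration></note>
  </measure>
</part>"""


def melodic_intervals(part):
    """Count intervals (in semitones) between consecutive pitched notes."""
    heights = []
    for note in part.iter("note"):
        pitch = note.find("pitch")
        if pitch is None or note.find("chord") is not None:
            continue  # skip rests and the extra notes of a chord
        alter = float(pitch.findtext("alter", default="0"))
        octave = int(pitch.findtext("octave"))
        heights.append(12 * (octave + 1) + STEP_TO_SEMITONE[pitch.findtext("step")] + alter)
    return Counter(b - a for a, b in zip(heights, heights[1:]))


intervals = melodic_intervals(ET.fromstring(PART))
print(intervals)  # two whole steps (2 semitones) and one half step (1 semitone)
```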

Scenario 3: Archiving a work for posterity

An orchestra wants to preserve its repertoire in an open digital format. MusicXML is ideal because:

Text format (unlike binary formats, which store raw data that is unreadable without dedicated software and are often proprietary)
Publicly documented standard
Readable in 50 years (unlike a proprietary format that could disappear)

MusicXML vs MIDI: Comparison

Let’s take the same musical phrase and compare:

In MIDI (conceptual)

Note On:  channel=1, note=60, velocity=80, delta=0
Note Off: channel=1, note=60, velocity=0,  delta=480
Note On:  channel=1, note=62, velocity=80, delta=0
Note Off: channel=1, note=62, velocity=0,  delta=480

(The “delta” is the time elapsed since the previous event, expressed in ticks — see I4 for details.)

These messages do not say whether the music is in 4/4 or in C major, or that the notes sit on a treble-clef staff.

In MusicXML

<measure number="1">
  <attributes>
    <key><fifths>0</fifths></key>
    <time><beats>4</beats><beat-type>4</beat-type></time>
    <clef><sign>G</sign><line>2</line></clef>
  </attributes>
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <duration>1</duration><type>quarter</type>
  </note>
  <!-- ... -->
</measure>

MusicXML preserves the complete musical context.
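
Going from MusicXML to MIDI is the easy direction, because it only discards information. The Python sketch below reproduces the conceptual MIDI listing from pairs of MIDI note number and duration. The function name to_midi_events is hypothetical, and the values of 480 ticks per quarter note and velocity 80 are assumptions taken from the listing above, since the MusicXML notes shown here carry no velocity. It handles consecutive notes of a single voice only.

```python
def to_midi_events(notes, divisions, ticks_per_quarter=480, velocity=80, channel=1):
    """Turn (midi_note, duration_in_divisions) pairs into conceptual MIDI events.

    Each event is (kind, channel, note, velocity, delta_ticks).
    """
    events = []
    for midi_note, duration in notes:
        ticks = duration * ticks_per_quarter // divisions
        events.append(("note_on", channel, midi_note, velocity, 0))
        events.append(("note_off", channel, midi_note, 0, ticks))
    return events


# C4 and D4 as quarter notes, with <divisions>1</divisions>
events = to_midi_events([(60, 1), (62, 1)], divisions=1)
for event in events:
    print(event)
```

The reverse conversion would have to guess the key, the meter, the clef, and the spelling of each note.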

In BP3 (generative grammar — see I2)

gram#1[1] S --> _tempo(120) C4 D4 E4 F4

Just one line. But the true power of BP3 lies elsewhere — you can express variants:

gram#1[1] S --> _tempo(120) Phrase
gram#1[2] <3> Phrase --> C4 D4 E4 F4
gram#1[3] <1> Phrase --> C4 E4 G4 C5

Here, Phrase will be replaced by the first option 3 out of 4 times, and by the second option 1 out of 4 times. The numbers in angle brackets (<3>, <1>) are weights that control the probability of each rule (see B1). MIDI and MusicXML describe a fixed score; BP3 describes a space of possible scores.
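
The effect of the weights can be imitated in Python with a weighted random choice. This simulates only the selection between the two Phrase rules as described above. It is not BP3 and says nothing about how the BP3 engine is implemented. The names RULES, probabilities, and expand_phrase are hypothetical.

```python
import random

# (weight, expansion) pairs mirroring the two Phrase rules
RULES = [(3, "C4 D4 E4 F4"), (1, "C4 E4 G4 C5")]


def probabilities():
    """Probability of each expansion: its weight divided by the total weight."""
    total = sum(weight for weight, _ in RULES)
    return {expansion: weight / total for weight, expansion in RULES}


def expand_phrase(rng):
    """Pick one expansion at random, respecting the weights."""
    weights = [weight for weight, _ in RULES]
    expansions = [expansion for _, expansion in RULES]
    return rng.choices(expansions, weights=weights, k=1)[0]


print(probabilities())  # {'C4 D4 E4 F4': 0.75, 'C4 E4 G4 C5': 0.25}
rng = random.Random()
print([expand_phrase(rng) for _ in range(4)])
```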

Criterion	MIDI	MusicXML	BP3
Key Signature	No	Yes	No (implicit)
Time Signature	No	Yes	No (implicit)
C♯ vs D♭	No	Yes	Yes
Articulations	Partial (velocity)	Yes (staccato, accent…)	Via functions
Lyrics	No	Yes	No
Layout	No	Partial	No (not its role)
File Size	Compact	Verbose	Very compact
Real-time	Yes	No	No
Varied generation	No	No	Yes (stochastic)
Polymetry	No	Limited	Yes (native)

Key Takeaways

MusicXML encodes the score, not the sound or performance instructions.
Readable XML format: hierarchical tags, debuggable, versionable.
Partwise (by instrument) or timewise (by moment) organization — both are equivalent.
Key elements: <note>, <pitch>, <duration>, <measure>, <attributes>.
Interoperability: the de facto standard for exchanging scores between software.
Limitations: verbose, complex microtonality, no algorithmic generation.

To go further

Official Specification: W3C MusicXML
Interactive Tutorial: MusicXML Tutorial
Python Library: music21 for parsing and generating MusicXML
Online Converter: Software like MuseScore allows free export/import

Glossary

Alter: Shift of a note’s pitch in MusicXML, in semitones (sharp ♯ = +1, flat ♭ = -1, natural ♮ = 0; fractional values encode microtones).
Articulation: Indication of how a note is to be played (staccato, accent, tenuto…).
Tag: An XML syntax element enclosed by < and >, such as <note> or </note>.
Divisions: Number of subdivisions per quarter note, defines the temporal resolution of the file.
DOCTYPE: Declaration at the beginning of an XML file that indicates the schema (structure rules) used.
Dynamic: Indication of sound intensity on the score (pp, p, mp, mf, f, ff — from softest to loudest).
Fifths: Number of accidentals in the key signature, counted on the circle of fifths (positive = sharps ♯, negative = flats ♭).
HTML: HyperText Markup Language — markup language for web pages, a cousin of XML.
Legato: A connected playing style, where notes follow each other without interruption. Indicated by a slur.
Part: A voice or an instrument in the score.
Partwise: Organization of the file by instrument then by measure (the most common).
Polyphony: Superposition of several independent melodic lines. In MusicXML, managed by the <voice> element.
Staccato: A detached playing style, where each note is played briefly. Indicated by a dot above or below the note.
Timewise: Organization of the file by measure then by instrument.
Voice: A melodic line within a part (for polyphony on a staff).
XML: eXtensible Markup Language — structured text format with nested tags.
XSLT: eXtensible Stylesheet Language Transformations — language for transforming XML into another format.