L1) Chomsky Hierarchy Explained Simply

Can all programming languages, human languages, and even musical notations be classified according to their complexity? Noam Chomsky proposed an answer in 1956.

Where does this article fit in?

This article opens the Formal Languages (L) series. It lays the theoretical foundations for the entire series by classifying languages by complexity according to Chomsky’s four classic levels. Articles L2 to L7 and the B series (BP3) build upon this classification. We will see in L9 that this four-level hierarchy is actually a simplification — between Type 2 and Type 1 lies a fascinating intermediate zone, which we call Type 2+ (mildly context-sensitive), crucial for music and natural language.


Why is this important?

Imagine you had to write a program capable of understanding text. Not just any text: sometimes a simple keyword search, sometimes Python code, sometimes spoken French. Intuitively, we sense that these three tasks are not equally difficult. But how do we quantify that difference?

Chomsky’s hierarchy provides a mathematical answer to this question. It classifies languages (in the broad sense: natural languages, programming languages, notations) into four levels according to the power required to recognize or generate them. This classification has direct practical consequences: it determines what type of tool to use to analyze a given language.

For the BP2SC (Bol Processor to SuperCollider) project — a transpiler (source-to-source code translator, see B7) for musical notation — this hierarchy helps us understand why certain BP3 (Bol Processor 3, see I2) constructions require more sophisticated analysis techniques than others.

The idea in one sentence

Chomsky’s hierarchy classifies languages into four levels (Type 0 to Type 3) according to the complexity of the rules needed to describe them.


The four levels, from simplest to most complex

Type 3: Regular Languages — “The Memory of a Goldfish”

What they can do: Recognize simple, repetitive patterns, without nesting.

The finite automaton: the machine that recognizes these languages

A finite automaton is a very simple abstract machine: it has a fixed number of states (like squares on a game board) and transitions between these states (like arrows indicating where to go depending on what is read). It reads an input character by character, changes state with each read, and at the end says “accepted” or “rejected” depending on the state it is in.

Analogy: the subway turnstile. A turnstile has two states: locked and unlocked. If you insert a ticket, it goes from locked to unlocked. If you push, it goes from unlocked to locked (and lets you pass). The turnstile doesn’t know how many people have passed before you — it only knows its current state. This is exactly how a finite automaton works: it reacts to each input by changing state, with no memory of the past.
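To make this concrete, here is a minimal sketch of the turnstile as a finite automaton in Python (the state and event names are ours, for illustration):

    # A finite automaton is just a transition table plus a current state.
    TRANSITIONS = {
        ("locked", "ticket"): "unlocked",
        ("locked", "push"): "locked",        # pushing a locked turnstile does nothing
        ("unlocked", "push"): "locked",      # passing through re-locks it
        ("unlocked", "ticket"): "unlocked",  # a second ticket is wasted
    }

    def run(events, state="locked"):
        for event in events:
            state = TRANSITIONS[(state, event)]
        return state

    print(run(["ticket", "push", "push"]))  # -> locked

Note that the machine keeps no history: only the current state and the next input decide what happens.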

Regular expressions: the practical tool

A regular expression (or regex) is a compact notation for describing a text pattern. It defines exactly what a finite automaton can recognize, but in a human-readable form.

Concrete example:

Validate a French phone number: 06 12 34 56 78
Regex: 0[67]( \d{2}){4}

Here, 0[67] means “0 followed by 6 or 7”, \d{2} means “two digits”, and ( \d{2}){4} means “a space followed by two digits, repeated 4 times”.
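In Python, this check is a few lines with the standard re module (the name PHONE is ours, for illustration):

    import re

    # fullmatch anchors the pattern to the whole string, so no ^...$ is needed.
    PHONE = re.compile(r"0[67]( \d{2}){4}")

    print(bool(PHONE.fullmatch("06 12 34 56 78")))  # True
    print(bool(PHONE.fullmatch("06 12 34 56")))     # False: one pair of digits short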

What regular languages can describe

  • Variable identifiers ([a-zA-Z_][a-zA-Z0-9_]*)
  • Simple date formats (\d{4}-\d{2}-\d{2})
  • Language keywords
  • Email addresses (simplified form)

Fundamental limitation: no memory = no counting

A regular language cannot “count”. It cannot verify that there are as many opening parentheses as closing ones, because that would require remembering the number of ( already seen.

What cannot be described:

  • aⁿbⁿ (n times ‘a’ followed by n times ‘b’) — would require counting the ‘a’s
  • Correctly matched parentheses — ((())) is valid, (() is not
  • Nested HTML tags — <div><p></p></div> requires knowing which tag to close

Type 2: Context-free Languages — “The Magic Stack”

What they can do: Handle nested structures, recursion, parentheses.

The pushdown automaton: adding memory

A pushdown automaton is a finite automaton augmented with a stack — a “last-in, first-out” (LIFO) memory. Unlike the finite automaton which only retains a state, the pushdown automaton can push symbols to “remember” what it has seen, then pop them when necessary.

Stack vs Queue: Two Ways to Manage Memory

Structure | Principle | Analogy | Acronym
Stack | Last In, First Out | A stack of plates: you always take the one on top | LIFO (Last In, First Out)
Queue | First In, First Out | A waiting line: the first to arrive is served first | FIFO (First In, First Out)

The pushdown automaton uses a stack because that’s what’s needed to manage nesting: when an opening parenthesis ( is encountered, it’s pushed onto the stack; when a closing ) is encountered, the last opened parenthesis is popped — not the first.
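In Python, both structures are one import away; a quick sketch of the difference (variable names are ours):

    from collections import deque

    stack = []                  # a list used as a stack: append and pop act on the same end
    stack.append("(")
    stack.append("[")
    print(stack.pop())          # '[' : last in, first out (LIFO)

    queue = deque()             # a deque used as a queue: append on one end, popleft on the other
    queue.append("first")
    queue.append("second")
    print(queue.popleft())      # 'first' : first in, first out (FIFO)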

Concrete example: To check that parentheses are balanced in ((a + b) * c) (a code sketch follows this list):

  • Read ( → push; stack: (
  • Read ( → push; stack: ((
  • Read a, +, b → no stack change
  • Read ) → pop; stack: (
  • Read *, c → no stack change
  • Read ) → pop; stack empty
  • End of input with an empty stack → accepted
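Here is a minimal sketch of this check in Python (the function name is ours); it handles only round parentheses and ignores every other character:

    def is_balanced(text):
        stack = []
        for char in text:
            if char == "(":
                stack.append(char)   # remember an opening parenthesis
            elif char == ")":
                if not stack:        # a closing parenthesis with nothing to match
                    return False
                stack.pop()          # match the most recent opening parenthesis
        return not stack             # accepted only if the stack ends up empty

    print(is_balanced("((a + b) * c)"))  # True
    print(is_balanced("(()"))            # False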

Context-free grammars (CFG): describing structure

A context-free grammar (CFG) is a set of rewrite rules where each rule has the form A → γ: a single symbol A (called a non-terminal) can be replaced by a sequence γ, independently of what surrounds it. This is what makes it “context-free”.

Example — Arithmetic expressions:

Grammar:
  E → E + T | T
  T → T * F | F
  F → ( E ) | number

Allows parsing: 2 + 3 * (4 + 5)

 

The | symbol means “or”: the rule E → E + T | T reads “E can become either E + T, or simply T”. It’s a compact notation for writing two rules as one.

The rule E → E + T simply says “an E can always be replaced by E + T”, regardless of what comes before or after it.

Parsers: transforming text into structure

A parser is a program that analyzes text according to a grammar and builds a structure (often a tree). There are two main families:

  • Top-down parsers: start from the start symbol and try to derive the text. Example: recursive descent parsers, where each grammar rule becomes a function that calls itself recursively (see the sketch after this list).
  • Bottom-up parsers: start from the text and try to ascend to the start symbol by reducing groups of symbols. Example: LR parsers (Left-to-right, Rightmost derivation).
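As an illustration, here is a minimal recursive descent parser in Python for the arithmetic grammar above (names like Parser and tokenize are ours). One caveat: recursive descent cannot handle the left-recursive rule E → E + T directly, so the sketch uses the standard rewriting E → T (+ T)*, which recognizes the same language:

    import re

    def tokenize(text):
        # Numbers, operators and parentheses; whitespace is simply skipped.
        return re.findall(r"\d+|[+*()]", text)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, expected):
            assert self.peek() == expected, f"expected {expected}, got {self.peek()}"
            self.pos += 1

        def expr(self):    # E -> T (+ T)*  : the left recursion, rewritten as a loop
            node = self.term()
            while self.peek() == "+":
                self.eat("+")
                node = ("+", node, self.term())
            return node

        def term(self):    # T -> F (* F)*
            node = self.factor()
            while self.peek() == "*":
                self.eat("*")
                node = ("*", node, self.factor())
            return node

        def factor(self):  # F -> ( E ) | number
            if self.peek() == "(":
                self.eat("(")
                node = self.expr()
                self.eat(")")
                return node
            number = self.peek()
            self.eat(number)
            return int(number)

    print(Parser(tokenize("2 + 3 * (4 + 5)")).expr())
    # -> ('+', 2, ('*', 3, ('+', 4, 5)))

The nested tuples are the syntax tree: the * correctly binds tighter than the +, because term() is called from inside expr().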

What context-free languages can describe

  • Most programming languages (syntax)
  • HTML (tag structure)
  • JSON
  • Mathematical expressions
  • The basic structure of sentences (subject-verb-complement)

What cannot be described

  • Verifying that a variable is declared before being used — this requires “remembering” all previous declarations
  • Complex grammatical agreements — in French, “les chats mangent” (the cats eat) but “le chat mange” (the cat eats): the agreement depends on a distant element
  • Cross-references — when an element in the text refers to another element defined elsewhere (like a footnote referring to a bibliographic reference, or a hyperlink to an anchor)

Type 1: Context-sensitive Languages — “Context Matters”

What they can do: Rewrite rules can depend on the context.

Formalism:
A rule like αAβ → αγβ means “A can become γ, but only if A is surrounded by α on the left and β on the right”.

Classic example:

L = { a^n b^n c^n | n ≥ 1 }
= { abc, aabbcc, aaabbbccc, ... }

 

This simple language (“as many a’s, then as many b’s, then as many c’s”) is impossible to describe with a context-free grammar: a single stack can keep two blocks in balance (as in aⁿbⁿ), but not three, because that would require maintaining two counts at the same time.
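With unrestricted memory, the check is trivial, which is precisely the point: the difficulty lies in the single-stack restriction, not in the language itself. A hypothetical sketch (is_anbncn is our name):

    import re

    def is_anbncn(s):
        # The *shape* a+b+c+ is regular; only the equal-length
        # constraint pushes the language beyond Type 2.
        m = re.fullmatch(r"(a+)(b+)(c+)", s)
        return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

    print(is_anbncn("aabbcc"))   # True
    print(is_anbncn("aabbbcc"))  # False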

Analogy:
If context-free languages are like sentences where each word can be replaced independently, context-sensitive languages are like sentences where the replacement depends on neighboring words — like gender and number agreement in French.

Applications:

  • Semantic analysis of programming languages
  • Type checking
  • Certain structures of natural languages

Type 0: Recursively Enumerable Languages — “Anything is possible, but…”

What they can do: Describe any computable process.

The Turing machine: the universal model of computation

A Turing machine is a theoretical model invented by Alan Turing in 1936. It consists of:

  • An infinite tape divided into cells, each containing a symbol
  • A read/write head that can read, write, and move one cell left or right
  • A finite set of states and transition rules

Unlike the pushdown automaton (memory limited to a stack), the Turing machine has access to unlimited memory and can access it in any order. It is the most powerful model: anything that is computable can be computed by a Turing machine.

Analogy: Imagine an office worker with an infinite roll of paper, a pencil, and an eraser. They can write, erase, move forward, move backward, and follow instructions. This is essentially what a computer does.
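A Turing machine can be simulated in a few lines; here is a minimal sketch (the function name and the example machine are ours). This machine flips every bit of a binary string, then halts on the first blank:

    def run_turing_machine(tape, transitions, state="start", blank="_"):
        tape = list(tape)
        head = 0
        while state != "halt":
            if head == len(tape):          # extend the "infinite" tape on demand
                tape.append(blank)
            symbol = tape[head]
            state, write, move = transitions[(state, symbol)]
            tape[head] = write
            head += 1 if move == "R" else -1
        return "".join(tape).rstrip(blank)

    # Transition table: (state, symbol read) -> (new state, symbol written, move).
    flip = {
        ("start", "0"): ("start", "1", "R"),
        ("start", "1"): ("start", "0", "R"),
        ("start", "_"): ("halt", "_", "R"),
    }

    print(run_turing_machine("10110", flip))  # -> 01001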

The problem: non-termination

For these languages, there isn’t always an algorithm that halts to say “yes” or “no”. An analyzer can run indefinitely on certain inputs — and it’s impossible to know in advance whether it will halt or not (this is the famous “halting problem”, proven undecidable by Turing).

In practice

This level is almost never used directly in computer science. It is the domain of computability theory and undecidable problems. But understanding this limit is important: it tells us that there are problems that no program can solve in all cases.


Summary table

Type | Name | Automaton | Practical Tool | Memory | Can Describe | Cannot Describe
3 | Regular | Finite automaton | Regex | None (just current state) | Simple patterns, tokens, formats | Nested parentheses, aⁿbⁿ
2 | Context-free | Pushdown automaton | Parsers (ANTLR, Lark, Yacc) | A stack (LIFO) | JSON, HTML, language syntax | Cross-references, distant agreements
1 | Context-sensitive | Linearly bounded automaton | Semantic analyzers | Memory proportional to input | aⁿbⁿcⁿ, type checking | Undecidable problems
0 | Recursively enumerable | Turing machine | Any program | Unlimited memory | Anything computable | Halting problem

Form of production rules

Type | Rule Form | Meaning
3 | A → aB or A → a | A non-terminal yields a terminal, optionally followed by a non-terminal
2 | A → γ | A non-terminal yields any sequence
1 | αAβ → αγβ, with γ non-empty | Context-dependent rewrite, never contracting
0 | α → β | Any rewrite

Strict inclusion

Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0

Every regular language is context-free.
Every context-free language is context-sensitive.
Every context-sensitive language is recursively enumerable.
But each converse is false: every inclusion is strict.

In other words: a tool that recognizes Type 2 languages also recognizes all Type 3 languages, but the reverse is not true.

Note: This four-level presentation is a simplification. Between Type 2 (context-free) and Type 1 (context-sensitive), there actually exists a rich and structured intermediate zone — Type 2+ (mildly context-sensitive) languages — where the most useful formalisms for linguistics and music, including BP3, reside. This is the subject of L9.


And BP3 in all this?

The BP3 (Bol Processor) language is particularly interesting because it is located in the Type 2+ zone — between pure Type 2 and Type 1.

Its syntax (the structure of grammar files) is context-free (Type 2): it can be described with a standard EBNF (Extended Backus-Naur Form, see L3) grammar, and a classic parser is sufficient.

Its semantics (what the rules mean during execution) is Type 2+ (mildly context-sensitive, see L9). Why?

  1. Context-dependent rules: BP3 allows writing rules like:

    |o| |miny| --> |o1| |miny|

Here, |o| is rewritten only if |miny| is present next to it. This is exactly the definition of a context-sensitive rule.

  2. Flags: Variables like /Ideas/ modify behavior according to the global state:

    /Ideas/ S --> A B /Ideas-1/

The rule applies only if Ideas > 0, then decrements the counter (a hypothetical sketch of this mechanism follows this list).

  3. Dynamic weights: Weights like <50-12> change with each use, creating a temporal dependency.
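To make the flag mechanism concrete, here is a hypothetical Python sketch (our illustration of the behavior described in point 2 above, not BP3's actual engine):

    def apply_flagged_rule(symbols, flags):
        # Models "/Ideas/ S --> A B /Ideas-1/": rewrite S only while Ideas > 0.
        if flags.get("Ideas", 0) > 0 and "S" in symbols:
            i = symbols.index("S")
            symbols = symbols[:i] + ["A", "B"] + symbols[i + 1:]
            flags["Ideas"] -= 1   # the rule consumes one unit of the flag
        return symbols, flags

    seq, state = apply_flagged_rule(["S"], {"Ideas": 2})
    print(seq, state)  # ['A', 'B'] {'Ideas': 1}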

However, BP3 remains at the Type 2+ level (and not fully Type 1) because:

  • The context is always local (adjacent symbols)
  • There are no contracting rules (the output is at least as long as the input)
  • In practice, programs always terminate

This Type 2+ position explains why the BP2SC transpiler uses a context-free parser for syntax, but must generate SuperCollider code with routines (Prout, a Pattern that evaluates a function on each call) to capture mildly context-sensitive behaviors at runtime.


Key takeaways

  1. Type 3 (regular): No memory. Perfect for simple patterns. → Regex.
  2. Type 2 (context-free): A stack to manage nesting. Sufficient for the syntax of most languages. → Classic parsers.
  3. Type 2+ (mildly context-sensitive): Between Type 2 and Type 1 — local context, without combinatorial explosion. This is where BP3 is located (see L9).
  4. Type 1 (context-sensitive): Context influences rules. Necessary for semantics and natural languages.
  5. Type 0 (recursively enumerable): Maximum power, but not always decidable. Rarely used in practice.
  6. In practice: Most programming languages have Type 2 syntax but semantics that require additional analysis — often at the Type 2+ level.

To go further

  • Foundational book: Hopcroft, Motwani & Ullman, Introduction to Automata Theory, Languages, and Computation
  • Original article: Chomsky, N. (1956). “Three models for the description of language”
  • Interactive visualization: The Chomsky Hierarchy

Glossary

  • Finite automaton: An abstract machine with a finite number of states, without external memory. Reads an input character by character and changes state according to transition rules.
  • Pushdown automaton: A finite automaton augmented with a stack (LIFO memory). Can push and pop symbols, allowing it to manage nesting.
  • Context-free: Refers to a grammar whose rules can be applied independently of the context (what surrounds the symbol to be rewritten).
  • Regular expression (regex): A compact notation for describing text patterns. Equivalent in power to a finite automaton.
  • Grammar: A set of rules for generating or recognizing a language. Defined by terminal symbols, non-terminal symbols, and production rules.
  • Context-free grammar (CFG): A grammar where all rules have the form A → γ (a single non-terminal on the left).
  • Formal language: A set of symbol strings defined by precise rules.
  • Turing machine: A theoretical model of computation with an infinite tape, a read/write head, and transition rules. Can compute anything that is computable.
  • Non-terminal: An intermediate symbol in a grammar, intended to be replaced by other symbols.
  • Parser: A program that analyzes text according to a grammar and builds a structure (syntax tree).
  • Bottom-up parser: A parser that starts from the text and ascends to the start symbol by reducing groups of symbols.
  • Top-down parser: A parser that starts from the start symbol and tries to derive the text by applying the rules.
  • Cross-references: Links between elements of a text where one element refers to another defined elsewhere (e.g., hyperlinks, footnotes).
  • Production rule: A rule of the form A → γ indicating how to rewrite a symbol.
  • Terminal: A final symbol in a grammar, which appears in the result (as opposed to non-terminals).

Prerequisites: None
Reading time: 10 min
Tags: #chomsky #formal-languages #automata #grammars


Next article: L2 — Context-free grammars: the foundation of everything