L4) What is an AST?
Why don’t compilers work directly on your source code? Because a tree is better than a string of characters.
Where does this article fit in?
This article is part of the Formal Languages (L) series. It introduces the Abstract Syntax Tree, produced by the analysis of context-free grammars (L2) described in EBNF (L3). The AST is the central representation of the BP2SC pipeline (processing chain).
Why is this important?
You’ve written code. The compiler needs to understand it, verify it, perhaps optimize it, and then translate it. But your code is just a sequence of characters — letters, numbers, symbols. How can it be manipulated intelligently?
The AST (Abstract Syntax Tree) is the answer. It’s a structured representation of your code that captures its meaning without being cluttered by the details of its form. Parentheses, spaces, semicolons — all of that disappears. Only the logical structure remains.
For the BP2SC (BP3-to-SuperCollider, see I2 and I3) transpiler (source-to-source code translator), the AST is the core of the system. Each BP3 file (the musical grammar language of the Bol Processor, see I2) is first parsed into a tree of 12 different node types, then this tree is traversed to generate the equivalent SuperCollider (sound synthesis environment, see I3) code.
The idea in one sentence
An AST is a tree-like representation of source code that preserves its logical structure while eliminating non-essential syntactic details.
Let’s explain step by step
Parse tree vs AST: the crucial difference
When a parser analyzes code, it first produces a parse tree (also called a concrete syntax tree) that exactly reflects the grammar rules. This tree contains everything, including purely syntactic elements.
Analogy: the script vs the screenplay
Imagine a play:
- The complete script includes all stage directions, page numbers, technical indications (“lights at 50%”). This is the parse tree.
- The cleaned-up screenplay only keeps the dialogues and essential actions. This is the AST.
Both represent the same play, but one is faithful to the form, the other to the substance.
Example with 2 + 3:
Parse tree (derivation tree):
```
         Expr
        /  |  \
    Expr  "+"  Term
     |          |
    Term      Factor
     |          |
   Factor      "3"
     |
    "2"
```
This tree contains Expr, Term, Factor nodes that correspond to the grammar rules. It’s faithful, but verbose.
AST (abstract tree):
```
  Add
 /   \
2     3
```
The AST only keeps the essential: an addition operation with two operands. The intermediate levels have disappeared.
Why this simplification?
- Simpler manipulation: fewer nodes to traverse
- Grammar independence: syntax can be changed without modifying the AST
- Focus on semantics: the tree represents what the code does, not how it is written
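To make the collapse concrete, here is a minimal sketch (an illustration, not BP2SC code) of a recursive-descent parser for the toy grammar `Expr → Term ("+" Term)*`, `Term → Factor ("*" Factor)*`, `Factor → number`. It builds nested tuples such as `('+', 2, ('*', 3, 4))` directly, so the intermediate `Expr`/`Term`/`Factor` levels never appear in the result:

```python
# Minimal sketch: a recursive-descent parser that builds AST tuples directly,
# skipping the grammar's intermediate Expr/Term/Factor nodes.
def parse_expr(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)        # an Add node, no Expr wrapper
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, right)        # a Mul node, no Term wrapper
    return node, pos

def parse_factor(tokens, pos):
    return int(tokens[pos]), pos + 1     # numbers become leaves directly

tree, _ = parse_expr("2 + 3 * 4".split())
print(tree)   # ('+', 2, ('*', 3, 4))
```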
Example 1: the expression 2 + 3 * 4
Let’s build the AST of this expression step by step.
The source code: 2 + 3 * 4
The resulting AST:
```
   Add
  /   \
 2    Mul
     /   \
    3     4
```
Important observations:
- Precedence is encoded: `Mul` is a child of `Add`, so the multiplication is evaluated first
- No parentheses: they are not in the tree, but their effect is
- Numbers are leaves: they are terminal values
AST evaluation (post-order traversal):
What is a post-order traversal?
A traversal is a systematic way of visiting all nodes in a tree. Post-order means: “children first, then parent”.
Why post-order for evaluation? Because we must know the result of sub-expressions before we can calculate the parent expression. We cannot calculate `2 + (3 * 4)` without first calculating `3 * 4`.
1. Visit the `Add` node
2. Visit its left child → `2`
3. Visit its right child, the `Mul` node
4. Visit the left child of `Mul` → `3`
5. Visit the right child of `Mul` → `4`
6. Evaluate `Mul(3, 4)` → `12` (children evaluated first)
7. Evaluate `Add(2, 12)` → `14` (parent evaluated last)
Structure of an AST node
In practice, an AST node is often a class or a data structure:
```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Number:
    value: int

@dataclass
class BinOp:
    op: str          # "+", "-", "*", "/"
    left: 'Expr'
    right: 'Expr'

Expr = Union[Number, BinOp]
```
The AST of 2 + 3 * 4 in Python:
```python
ast = BinOp(
    op="+",
    left=Number(2),
    right=BinOp(
        op="*",
        left=Number(3),
        right=Number(4)
    )
)
```
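As a first, direct approach (a simple sketch; the Visitor pattern presented further down generalizes the same idea), this AST can be evaluated with one recursive function that follows the post-order principle described above:

```python
# Minimal recursive evaluator (sketch): children are evaluated before
# their parent, exactly the post-order traversal described earlier.
def evaluate(node: Expr):
    if isinstance(node, Number):
        return node.value                 # a leaf evaluates to itself
    left = evaluate(node.left)            # children first...
    right = evaluate(node.right)
    operations = {"+": lambda a, b: a + b,
                  "-": lambda a, b: a - b,
                  "*": lambda a, b: a * b,
                  "/": lambda a, b: a / b}
    return operations[node.op](left, right)  # ...then the parent

print(evaluate(ast))   # 14
```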
Example 2: The BP3 AST
The BP2SC transpiler defines 12 node types to represent all elements of a BP3 file:
| Node | Represents | BP3 Example |
|---|---|---|
| `BPFile` | Complete file | (root) |
| `GrammarBlock` | Grammar section | `ORD[1]` |
| `Rule` | Production rule | `S --> A B` |
| `Weight` | Rule weight (see B4) | `<50>`, `<50-12>` |
| `Flag` | Condition or operation (see B4) | `/Ideas/`, `/Ideas-1/` |
| `Note` | Musical note | `do4`, `C#5` |
| `Rest` | Rest | `-` |
| `NonTerminal` | Symbol to expand | `S`, `Phrase` |
| `Variable` | Variable between pipes | `\|x\|` |
| `Polymetric` | Polymetric expression (see B5) | `{2, A B C}` |
| `SpecialFn` | Special function | `_transpose(3)` |
| `HomoApply` | Homomorphism (see B6) | `(= \|x\|)` |
Concrete example — the BP3 rule:
```
<50> /Ideas/ S --> A _transpose(2) do4 {2, B C}
```
Its AST:
```
Rule
├── weight: Weight(value=50, decrement=None)
├── flags: [Flag(name="Ideas", op="condition")]
├── lhs: [NonTerminal("S")]
└── rhs:
    ├── NonTerminal("A")
    ├── SpecialFn(name="transpose", args=[2])
    ├── Note(name="do", octave=4, accidental=None)
    └── Polymetric(ratio=2, elements=[
            NonTerminal("B"),
            NonTerminal("C")
        ])
```
Each element of the source code corresponds to a typed node in the tree.
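By way of illustration only (the field names below are guesses inferred from the example above, not the actual BP2SC class definitions), a few of these nodes could be declared as Python dataclasses in the same style as `Number` and `BinOp`:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical node definitions, inferred from the example above;
# the real BP2SC classes may differ.
@dataclass
class Note:
    name: str                        # "do", "re", "mi", ...
    octave: int
    accidental: Optional[str] = None

@dataclass
class NonTerminal:
    name: str

@dataclass
class Weight:
    value: int
    decrement: Optional[int] = None

@dataclass
class Flag:
    name: str
    op: str                          # e.g. "condition"

@dataclass
class Rule:
    lhs: List["Node"]
    rhs: List["Node"]
    weight: Optional[Weight] = None
    flags: List[Flag] = field(default_factory=list)

Node = Union[Note, NonTerminal, Rule]   # partial union, for the sketch
```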
AST Traversal
To exploit an AST, we traverse it: we systematically visit each node to perform an operation. Several strategies exist, each adapted to a specific use:
The three depth-first traversal orders
Imagine visiting a multi-story house:
- Pre-order: note each room upon entering, before exploring its sub-rooms
- Post-order: note each room upon exiting, after exploring its sub-rooms
- In-order: for binary trees, explore the left wing, note the room itself, then explore the right wing
Depth-first traversal:
| Strategy | Visit Order | Use Case |
|---|---|---|
| Pre-order | Node, then children | Copy tree, serialize |
| Post-order | Children, then node | Evaluate, calculate properties |
| In-order | Left, node, right | Display infix expressions |
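For instance, a pre-order traversal (a sketch reusing the `Number` and `BinOp` classes from earlier) is a natural way to serialize the tree: the node is emitted before its children, so the structure can be rebuilt when reading the sequence back.

```python
def serialize(node):
    """Pre-order traversal (sketch): emit the node, then its children."""
    if isinstance(node, Number):
        return ["NUM", node.value]
    return ["OP", node.op] + serialize(node.left) + serialize(node.right)

print(serialize(ast))
# ['OP', '+', 'NUM', 2, 'OP', '*', 'NUM', 3, 'NUM', 4]
```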
Breadth-first traversal:
- Visit level by level, from top to bottom
- Useful for analyses that depend on “depth” in the tree
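A breadth-first traversal is easy to sketch with a queue; here it records the depth of each node of the `2 + 3 * 4` AST (again using the `Number`/`BinOp` classes from earlier):

```python
from collections import deque

def depths(root):
    """Breadth-first traversal (sketch): visit level by level, noting depth."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        yield type(node).__name__, depth
        if isinstance(node, BinOp):
            queue.append((node.left, depth + 1))
            queue.append((node.right, depth + 1))

print(list(depths(ast)))
# [('BinOp', 0), ('Number', 1), ('BinOp', 1), ('Number', 2), ('Number', 2)]
```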
The Visitor pattern:
What is a design pattern?
A design pattern is a reusable solution to a recurring problem in programming. These are proven “recipes”.
The Visitor pattern solves this problem: “How to add new operations to an AST without modifying the node classes each time?”
Analogy: Imagine a museum (the AST) with artworks (the nodes). Rather than modifying each artwork to add a new feature (audio guide, braille description…), a specialized “visitor” is created that knows how to interact with each type of artwork.
In object-oriented programming, the Visitor pattern allows adding operations without modifying the node classes:
```python
class ASTVisitor:
    def visit(self, node):
        method_name = f'visit_{type(node).__name__}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"No visitor for {type(node)}")


class Evaluator(ASTVisitor):
    def visit_Number(self, node):
        return node.value

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if node.op == '+': return left + right
        if node.op == '*': return left * right
        # ... other operations


# Usage
evaluator = Evaluator()
result = evaluator.visit(ast)  # → 14
```
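The benefit shows up as soon as a second operation is needed: a new visitor is written and the node classes stay untouched. Here is a sketch of an infix printer reusing the same `ASTVisitor` base:

```python
class InfixPrinter(ASTVisitor):
    """A second operation on the same AST, without touching Number or BinOp."""
    def visit_Number(self, node):
        return str(node.value)

    def visit_BinOp(self, node):
        return f"({self.visit(node.left)} {node.op} {self.visit(node.right)})"

print(InfixPrinter().visit(ast))   # (2 + (3 * 4))
```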
From AST to target code
The BP2SC transpiler (see B7) uses the AST to generate SuperCollider code. Each node type has a translation rule:
| BP3 Node | SuperCollider Code |
|---|---|
| `Rule(lhs=S, rhs=[A,B])` | `Pdef(\S, Pseq([Pdef(\A), Pdef(\B)]))` |
| `Note(do, 4)` | `Pbind(\midinote, 60, \dur, 0.25)` |
| `Rest` | `Event.silent(0.25)` |
| `Polymetric(2, [A,B,C])` | `Pbindf(Pseq([...]), \stretch, 2/3)` |
| `SpecialFn(transpose, 3)` | `Pbindf(..., \mtranspose, 3)` |
Translation example:
```
// BP3
S --> do4 re4 mi4

// AST
Rule(lhs=[NonTerminal("S")], rhs=[Note("do",4), Note("re",4), Note("mi",4)])

// SuperCollider
Pdef(\S, Pseq([
    Pbind(\midinote, 60, \dur, 0.25),
    Pbind(\midinote, 62, \dur, 0.25),
    Pbind(\midinote, 64, \dur, 0.25)
], 1))
```
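To give an idea of how such an emitter can be organized, here is a simplified sketch (not the actual BP2SC implementation): it reuses the `ASTVisitor` base from above and the hypothetical node dataclasses sketched earlier, and hard-codes `do4` as MIDI 60 and a 0.25 duration, as in the table.

```python
# Simplified emitter sketch (not the real BP2SC code): each node type
# is mapped to a fragment of SuperCollider source text.
MIDI = {"do": 60, "re": 62, "mi": 64, "fa": 65, "sol": 67, "la": 69, "si": 71}

class SCEmitter(ASTVisitor):
    def visit_Note(self, node):
        midinote = MIDI[node.name] + 12 * (node.octave - 4)   # do4 -> 60
        return f"Pbind(\\midinote, {midinote}, \\dur, 0.25)"

    def visit_NonTerminal(self, node):
        return f"Pdef(\\{node.name})"

    def visit_Rule(self, node):
        body = ",\n    ".join(self.visit(child) for child in node.rhs)
        return f"Pdef(\\{node.lhs[0].name}, Pseq([\n    {body}\n], 1))"

rule = Rule(lhs=[NonTerminal("S")],
            rhs=[Note("do", 4), Note("re", 4), Note("mi", 4)])
print(SCEmitter().visit(rule))   # prints the Pdef(\S, Pseq([...], 1)) above
```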
Why the AST is central
The AST is the pivot of any compiler or transpiler:
```
Source code → [Lexer] → Tokens → [Parser] → AST → [Emitter] → Target code
                                             ^
                                             |
                                       Central point
```
The steps of the compilation pipeline
| Step | Role | Input | Output |
|---|---|---|---|
| Lexer | Break text into units | `"2 + 3"` | `[NUM:2, PLUS, NUM:3]` |
| Parser | Build the structure | List of tokens | AST |
| Emitter | Generate target code | AST | Executable code |

Lexer (or tokenizer): transforms a string of characters into tokens, the significant units of the language. Example: `123` becomes a NUMBER token, `if` becomes a KEYWORD token.

Tokens: lexical units of the language (numbers, operators, keywords, identifiers). These are the “words” recognized by the lexer before syntactic analysis.
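For the arithmetic example, a lexer fits in a few lines of Python using a regular expression (an illustrative sketch, not a production tokenizer):

```python
import re

def tokenize(text):
    """Break '2 + 3 * 4' into tokens: numbers and operators (sketch)."""
    tokens = []
    for lexeme in re.findall(r"\d+|[+\-*/()]", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        else:
            tokens.append(("OP", lexeme))
    return tokens

print(tokenize("2 + 3 * 4"))
# [('NUM', 2), ('OP', '+'), ('NUM', 3), ('OP', '*'), ('NUM', 4)]
```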
Advantages of this architecture:
- Separation of concerns: the parser handles syntax, the emitter handles generation
- Intermediate analyses: the AST can be checked, optimized, transformed before emission
- Multi-target: the same AST can generate code for different platforms
- Testability: the parser and emitter can be tested independently
Key takeaways
- An AST is a tree-like representation of source code, stripped of non-essential syntactic details.
- Parse tree vs AST: the parse tree is faithful to the grammar, the AST is faithful to the meaning.
- Nodes are typed: each language construct has its node type (`Note`, `Rule`, `BinOp`…).
- Traversal enables exploitation: evaluation, code generation, static analysis.
- The AST is the central point of any compiler: it decouples syntactic analysis from code generation.
- BP2SC uses 12 node types to represent the entirety of BP3 notation.
To go further
- Python `ast` module: https://docs.python.org/3/library/ast.html (to explore Python ASTs)
- AST Explorer: https://astexplorer.net/ (interactive visualization of ASTs for many languages)
- Dragon Book, Chapter 5: Syntax-Directed Translation
- Visitor Pattern: https://refactoring.guru/design-patterns/visitor
Glossary
| Term | Definition |
|---|---|
| AST | Abstract Syntax Tree. A tree-like representation of source code that preserves the logical structure by eliminating non-essential syntactic details (parentheses, decorative commas, etc.). |
| Design pattern | A reusable solution to a recurring problem in software design. The Visitor pattern is a classic example for ASTs. |
| Emitter | A component of a compiler that generates target code (or object code) from the AST. Also called “code generator”. |
| Leaf | A terminal node in the tree, with no children. Represents atomic values: numbers, identifiers, strings. |
| Lexer | The first stage of a compiler. Breaks down source code into tokens (lexical units). Also called “tokenizer” or “lexical analyzer”. |
| Node | An element of the tree representing a language construct. Each node has a type (BinOp, Note, Rule…) and zero or more children. |
| Parse tree | A derivation tree faithful to the grammar rules. More detailed than the AST, it includes all syntactic elements. Also called “concrete syntax tree”. |
| Traversal | Systematic visit of all nodes in a tree according to a defined strategy (pre-order, post-order, in-order, or breadth-first). |
| Post-order | A traversal strategy where all children of a node are visited first, then the node itself. Ideal for expression evaluation. |
| Pre-order | A traversal strategy where the node is visited first, then its children. Ideal for copying or serializing a tree. |
| Token | A lexical unit of the language, produced by the lexer. Examples: number, operator, keyword, identifier. A token has a type and a value. |
| Visitor | A design pattern that allows adding operations to the AST without modifying the node classes. Each type of operation is a distinct “visitor”. |
Prerequisites: L2 — Context-Free Grammars, L3 — EBNF
Reading time: 9 min
Tags: #ast #compilation #parsing #trees #transpilation
Next article: L5 — The three semantics: giving meaning to formal languages