L4) What is an AST?
Why don’t compilers work directly on your source code? Because a tree is better than a string of characters.
Where does this article fit in?
This article is part of the Formal Languages (L) series. It introduces the Abstract Syntax Tree, produced by the analysis of context-free grammars (L2) described in EBNF (L3). The AST is the central representation of the BP2SC pipeline (processing chain).
Why is this important?
You’ve written code. The compiler needs to understand it, verify it, perhaps optimize it, and then translate it. But your code is just a sequence of characters — letters, numbers, symbols. How can it be manipulated intelligently?
The AST (Abstract Syntax Tree) is the answer. It’s a structured representation of your code that captures its meaning without being cluttered by the details of its form. Parentheses, spaces, semicolons — all of that disappears. Only the logical structure remains.
For the BP2SC (BP3-to-SuperCollider, see I2 and I3) transpiler (source-to-source code translator), the AST is the core of the system. Each BP3 file (the musical grammar language of the Bol Processor, see I2) is first parsed into a tree of 12 different node types, then this tree is traversed to generate the equivalent SuperCollider (sound synthesis environment, see I3) code.
The idea in one sentence
An AST is a tree-like representation of source code that preserves its logical structure while eliminating non-essential syntactic details.
Let’s explain step by step
Parse tree vs AST: the crucial difference
When a parser analyzes code, it first produces a parse tree (also called a concrete syntax tree) that exactly reflects the grammar rules. This tree contains everything, including purely syntactic elements.
Analogy: the script vs the screenplay
Imagine a play:
- The complete script includes all stage directions, page numbers, technical indications (“lights at 50%”). This is the parse tree.
- The cleaned-up screenplay only keeps the dialogues and essential actions. This is the AST.
Both represent the same play, but one is faithful to the form, the other to the substance.
Example with 2 + 3:
Parse tree (derivation tree):
```
         Expr
        /  |  \
    Expr  "+"  Term
     |          |
    Term      Factor
     |          |
   Factor      "3"
     |
    "2"
```
This tree contains Expr, Term, Factor nodes that correspond to the grammar rules. It’s faithful, but verbose.
AST (abstract tree):
```
  Add
 /   \
2     3
```
The AST only keeps the essential: an addition operation with two operands. The intermediate levels have disappeared.
Why this simplification?
- Simpler manipulation: fewer nodes to traverse
- Grammar independence: syntax can be changed without modifying the AST
- Focus on semantics: the tree represents what the code does, not how it is written
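To make the collapse concrete, here is a minimal sketch (an illustration, not BP2SC code) of a recursive-descent parser for the toy grammar `Expr → Term ("+" Term)*`, `Term → Factor ("*" Factor)*`, `Factor → number`. It builds nested tuples such as `('+', 2, ('*', 3, 4))` directly, so the intermediate `Expr`/`Term`/`Factor` levels never appear in the result:

```python
# Minimal sketch: a recursive-descent parser that builds AST tuples directly,
# skipping the grammar's intermediate Expr/Term/Factor nodes.
def parse_expr(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)        # an Add node, no Expr wrapper
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, right)        # a Mul node, no Term wrapper
    return node, pos

def parse_factor(tokens, pos):
    return int(tokens[pos]), pos + 1     # numbers become leaves directly

tree, _ = parse_expr("2 + 3 * 4".split())
print(tree)   # ('+', 2, ('*', 3, 4))
```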
Example 1: the expression 2 + 3 * 4
Let’s build the AST of this expression step by step.
The source code: 2 + 3 * 4
The resulting AST:
```
   Add
  /   \
 2    Mul
     /   \
    3     4
```
Important observations:
- Precedence is encoded: `Mul` is a child of `Add`, so the multiplication is evaluated first
- No parentheses: they are not in the tree, but their effect is
- Numbers are leaves: they are terminal values
AST evaluation (post-order traversal):
What is a post-order traversal?
A traversal is a systematic way of visiting all nodes in a tree. Post-order means: “children first, then parent”.
Why post-order for evaluation? Because we must know the result of sub-expressions before we can calculate the parent expression. We cannot calculate `2 + (3 * 4)` without first calculating `3 * 4`.
1. Visit the `Add` node
2. Visit its left child → `2`
3. Visit its right child, the `Mul` node
4. Visit the left child of `Mul` → `3`
5. Visit the right child of `Mul` → `4`
6. Evaluate `Mul(3, 4)` → `12` (children evaluated first)
7. Evaluate `Add(2, 12)` → `14` (parent evaluated last)
Structure of an AST node
In practice, an AST node is often a class or a data structure:
```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Number:
    value: int

@dataclass
class BinOp:
    op: str          # "+", "-", "*", "/"
    left: 'Expr'
    right: 'Expr'

Expr = Union[Number, BinOp]
```
The AST of 2 + 3 * 4 in Python:
```python
ast = BinOp(
    op="+",
    left=Number(2),
    right=BinOp(
        op="*",
        left=Number(3),
        right=Number(4)
    )
)
```
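As a first, direct approach (a simple sketch; the Visitor pattern presented further down generalizes the same idea), this AST can be evaluated with one recursive function that follows the post-order principle described above:

```python
# Minimal recursive evaluator (sketch): children are evaluated before
# their parent, exactly the post-order traversal described earlier.
def evaluate(node: Expr):
    if isinstance(node, Number):
        return node.value                 # a leaf evaluates to itself
    left = evaluate(node.left)            # children first...
    right = evaluate(node.right)
    operations = {"+": lambda a, b: a + b,
                  "-": lambda a, b: a - b,
                  "*": lambda a, b: a * b,
                  "/": lambda a, b: a / b}
    return operations[node.op](left, right)  # ...then the parent

print(evaluate(ast))   # 14
```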
Example 2: The BP3 AST
The BP2SC transpiler defines 12 node types to represent all elements of a BP3 file:
| Node | Represents | BP3 Example |
|---|---|---|
| `BPFile` | Complete file | (root) |
| `GrammarBlock` | Grammar section | `ORD[1]` |
| `Rule` | Production rule | `S --> A B` |
| `Weight` | Rule weight (see B4) | `<50>`, `<50-12>` |
| `Flag` | Condition or operation (see B4) | `/Ideas/`, `/Ideas-1/` |
| `Note` | Musical note | `do4`, `C#5` |
| `Rest` | Rest | `-` |
| `NonTerminal` | Symbol to expand | `S`, `Phrase` |
| `Variable` | Variable between pipes | `\|x\|` |
| `Polymetric` | Polymetric expression (see B5) | `{2, A B C}` |
| `SpecialFn` | Special function | `_transpose(3)` |
| `HomoApply` | Homomorphism (see B6) | `(= \|x\|)` |
Concrete example — the BP3 rule:
```
<50> /Ideas/ S --> A _transpose(2) do4 {2, B C}
```
Its AST:
```
Rule
├── weight: Weight(value=50, decrement=None)
├── flags: [Flag(name="Ideas", op="condition")]
├── lhs: [NonTerminal("S")]
└── rhs:
    ├── NonTerminal("A")
    ├── SpecialFn(name="transpose", args=[2])
    ├── Note(name="do", octave=4, accidental=None)
    └── Polymetric(ratio=2, elements=[
            NonTerminal("B"),
            NonTerminal("C")
        ])
```
Each element of the source code corresponds to a typed node in the tree.
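By way of illustration only (the field names below are guesses inferred from the example above, not the actual BP2SC class definitions), a few of these nodes could be declared as Python dataclasses in the same style as `Number` and `BinOp`:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical node definitions, inferred from the example above;
# the real BP2SC classes may differ.
@dataclass
class Note:
    name: str                        # "do", "re", "mi", ...
    octave: int
    accidental: Optional[str] = None

@dataclass
class NonTerminal:
    name: str

@dataclass
class Weight:
    value: int
    decrement: Optional[int] = None

@dataclass
class Flag:
    name: str
    op: str                          # e.g. "condition"

@dataclass
class Rule:
    lhs: List["Node"]
    rhs: List["Node"]
    weight: Optional[Weight] = None
    flags: List[Flag] = field(default_factory=list)

Node = Union[Note, NonTerminal, Rule]   # partial union, for the sketch
```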
AST Traversal
To exploit an AST, we traverse it: we systematically visit each node to perform an operation. Several strategies exist, each adapted to a specific use:
The three depth-first traversal orders
Imagine visiting a multi-story house:
- Pre-order: note each room upon entering, before exploring its sub-rooms
- Post-order: note each room upon exiting, after exploring its sub-rooms
- In-order: for binary trees, explore the left wing, note the room itself, then explore the right wing
Depth-first traversal:
| Strategy | Visit Order | Use Case |
|---|---|---|
| Pre-order | Node, then children | Copy tree, serialize |
| Post-order | Children, then node | Evaluate, calculate properties |
| In-order | Left, node, right | Display infix expressions |
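For instance, a pre-order traversal (a sketch reusing the `Number` and `BinOp` classes from earlier) is a natural way to serialize the tree: the node is emitted before its children, so the structure can be rebuilt when reading the sequence back.

```python
def serialize(node):
    """Pre-order traversal (sketch): emit the node, then its children."""
    if isinstance(node, Number):
        return ["NUM", node.value]
    return ["OP", node.op] + serialize(node.left) + serialize(node.right)

print(serialize(ast))
# ['OP', '+', 'NUM', 2, 'OP', '*', 'NUM', 3, 'NUM', 4]
```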
Breadth-first traversal:
- Visit level by level, from top to bottom
- Useful for analyses that depend on “depth” in the tree
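A breadth-first traversal is easy to sketch with a queue; here it records the depth of each node of the `2 + 3 * 4` AST (again using the `Number`/`BinOp` classes from earlier):

```python
from collections import deque

def depths(root):
    """Breadth-first traversal (sketch): visit level by level, noting depth."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        yield type(node).__name__, depth
        if isinstance(node, BinOp):
            queue.append((node.left, depth + 1))
            queue.append((node.right, depth + 1))

print(list(depths(ast)))
# [('BinOp', 0), ('Number', 1), ('BinOp', 1), ('Number', 2), ('Number', 2)]
```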
The Visitor pattern:
What is a design pattern?
A design pattern is a reusable solution to a recurring problem in programming. These are proven “recipes”.
The Visitor pattern solves this problem: “How to add new operations to an AST without modifying the node classes each time?”
Analogy: Imagine a museum (the AST) with artworks (the nodes). Rather than modifying each artwork to add a new feature (audio guide, braille description…), a specialized “visitor” is created that knows how to interact with each type of artwork.
In object-oriented programming, the Visitor pattern allows adding operations without modifying the node classes:
```python
class ASTVisitor:
    def visit(self, node):
        method_name = f'visit_{type(node).__name__}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"No visitor for {type(node)}")


class Evaluator(ASTVisitor):
    def visit_Number(self, node):
        return node.value

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if node.op == '+': return left + right
        if node.op == '*': return left * right
        # ... other operations


# Usage
evaluator = Evaluator()
result = evaluator.visit(ast)  # → 14
```
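The benefit shows up as soon as a second operation is needed: a new visitor is written and the node classes stay untouched. Here is a sketch of an infix printer reusing the same `ASTVisitor` base:

```python
class InfixPrinter(ASTVisitor):
    """A second operation on the same AST, without touching Number or BinOp."""
    def visit_Number(self, node):
        return str(node.value)

    def visit_BinOp(self, node):
        return f"({self.visit(node.left)} {node.op} {self.visit(node.right)})"

print(InfixPrinter().visit(ast))   # (2 + (3 * 4))
```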
From AST to target code
The BP2SC transpiler (see B7) uses the AST to generate SuperCollider code. Each node type has a translation rule:
| BP3 Node | SuperCollider Code |
|---|---|
| `Rule(lhs=S, rhs=[A,B])` | `Pdef(\S, Pseq([Pdef(\A), Pdef(\B)]))` |
| `Note(do, 4)` | `Pbind(\midinote, 60, \dur, 0.25)` |
| `Rest` | `Event.silent(0.25)` |
| `Polymetric(2, [A,B,C])` | `Pbindf(Pseq([...]), \stretch, 2/3)` |
| `SpecialFn(transpose, 3)` | `Pbindf(..., \mtranspose, 3)` |
Translation example:
```
// BP3
S --> do4 re4 mi4

// AST
Rule(lhs=[NonTerminal("S")], rhs=[Note("do",4), Note("re",4), Note("mi",4)])

// SuperCollider
Pdef(\S, Pseq([
    Pbind(\midinote, 60, \dur, 0.25),
    Pbind(\midinote, 62, \dur, 0.25),
    Pbind(\midinote, 64, \dur, 0.25)
], 1))
```
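To give an idea of how such an emitter can be organized, here is a simplified sketch (not the actual BP2SC implementation): it reuses the `ASTVisitor` base from above and the hypothetical node dataclasses sketched earlier, and hard-codes `do4` as MIDI 60 and a 0.25 duration, as in the table.

```python
# Simplified emitter sketch (not the real BP2SC code): each node type
# is mapped to a fragment of SuperCollider source text.
MIDI = {"do": 60, "re": 62, "mi": 64, "fa": 65, "sol": 67, "la": 69, "si": 71}

class SCEmitter(ASTVisitor):
    def visit_Note(self, node):
        midinote = MIDI[node.name] + 12 * (node.octave - 4)   # do4 -> 60
        return f"Pbind(\\midinote, {midinote}, \\dur, 0.25)"

    def visit_NonTerminal(self, node):
        return f"Pdef(\\{node.name})"

    def visit_Rule(self, node):
        body = ",\n    ".join(self.visit(child) for child in node.rhs)
        return f"Pdef(\\{node.lhs[0].name}, Pseq([\n    {body}\n], 1))"

rule = Rule(lhs=[NonTerminal("S")],
            rhs=[Note("do", 4), Note("re", 4), Note("mi", 4)])
print(SCEmitter().visit(rule))   # prints the Pdef(\S, Pseq([...], 1)) above
```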
Why the AST is central
The AST is the pivot of any compiler or transpiler:
```
Source code → [Lexer] → Tokens → [Parser] → AST → [Emitter] → Target code
                                             ^
                                             |
                                       Central point
```
The steps of the compilation pipeline
| Step | Role | Input | Output |
|---|---|---|---|
| Lexer | Break text into units | `"2 + 3"` | `[NUM:2, PLUS, NUM:3]` |
| Parser | Build the structure | List of tokens | AST |
| Emitter | Generate target code | AST | Executable code |

Lexer (or tokenizer): transforms a string of characters into tokens, the significant units of the language. Example: `123` becomes a NUMBER token, `if` becomes a KEYWORD token.

Tokens: lexical units of the language (numbers, operators, keywords, identifiers). These are the “words” recognized by the lexer before syntactic analysis.
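For the arithmetic example, a lexer fits in a few lines of Python using a regular expression (an illustrative sketch, not a production tokenizer):

```python
import re

def tokenize(text):
    """Break '2 + 3 * 4' into tokens: numbers and operators (sketch)."""
    tokens = []
    for lexeme in re.findall(r"\d+|[+\-*/()]", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        else:
            tokens.append(("OP", lexeme))
    return tokens

print(tokenize("2 + 3 * 4"))
# [('NUM', 2), ('OP', '+'), ('NUM', 3), ('OP', '*'), ('NUM', 4)]
```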
Advantages of this architecture:
- Separation of concerns: the parser handles syntax, the emitter handles generation
- Intermediate analyses: the AST can be checked, optimized, transformed before emission
- Multi-target: the same AST can generate code for different platforms
- Testability: the parser and emitter can be tested independently
Key takeaways
- An AST is a tree-like representation of source code, stripped of non-essential syntactic details.
- Parse tree vs AST: the parse tree is faithful to the grammar, the AST is faithful to the meaning.
- Nodes are typed: each language construct has its node type (`Note`, `Rule`, `BinOp`…).
- Traversal enables exploitation: evaluation, code generation, static analysis.
- The AST is the central point of any compiler: it decouples syntactic analysis from code generation.
- BP2SC uses 12 node types to represent the entirety of BP3 notation.
To go further
- Python `ast` module: https://docs.python.org/3/library/ast.html (to explore Python ASTs)
- AST Explorer: https://astexplorer.net/ (interactive visualization of ASTs for many languages)
- Dragon Book, Chapter 5: Syntax-Directed Translation
- Visitor Pattern: https://refactoring.guru/design-patterns/visitor
Glossary
| Term | Definition |
|---|---|
| AST | Abstract Syntax Tree. A tree-like representation of source code that preserves the logical structure by eliminating non-essential syntactic details (parentheses, decorative commas, etc.). |
| Design pattern | A reusable solution to a recurring problem in software design. The Visitor pattern is a classic example for ASTs. |
| Emitter | A component of a compiler that generates target code (or object code) from the AST. Also called “code generator”. |
| Leaf | A terminal node in the tree, with no children. Represents atomic values: numbers, identifiers, strings. |
| Lexer | The first stage of a compiler. Breaks down source code into tokens (lexical units). Also called “tokenizer” or “lexical analyzer”. |
| Node | An element of the tree representing a language construct. Each node has a type (BinOp, Note, Rule…) and zero or more children. |
| Parse tree | A derivation tree faithful to the grammar rules. More detailed than the AST, it includes all syntactic elements. Also called “concrete syntax tree”. |
| Traversal | Systematic visit of all nodes in a tree according to a defined strategy (pre-order, post-order, in-order, or breadth-first). |
| Post-order | A traversal strategy where all children of a node are visited first, then the node itself. Ideal for expression evaluation. |
| Pre-order | A traversal strategy where the node is visited first, then its children. Ideal for copying or serializing a tree. |
| Token | A lexical unit of the language, produced by the lexer. Examples: number, operator, keyword, identifier. A token has a type and a value. |
| Visitor | A design pattern that allows adding operations to the AST without modifying the node classes. Each type of operation is a distinct “visitor”. |
Prerequisites: L2 — Context-Free Grammars, L3 — EBNF
Reading time: 9 min
Tags: #ast #compilation #parsing #trees #transpilation
Next article: L5 — The three semantics: giving meaning to formal languages