L4) What is an AST?

Why don’t compilers work directly on your source code? Because a tree is better than a string of characters.

Where does this article fit in?

This article is part of the Formal Languages (L) series. It introduces the Abstract Syntax Tree, produced by the analysis of context-free grammars (L2) described in EBNF (L3). The AST is the central representation of the BP2SC pipeline (processing chain).


Why is this important?

You’ve written code. The compiler needs to understand it, verify it, perhaps optimize it, and then translate it. But your code is just a sequence of characters — letters, numbers, symbols. How can it be manipulated intelligently?

The AST (Abstract Syntax Tree) is the answer. It’s a structured representation of your code that captures its meaning without being cluttered by the details of its form. Parentheses, spaces, semicolons — all of that disappears. Only the logical structure remains.

For the BP2SC (BP3-to-SuperCollider, see I2 and I3) transpiler (source-to-source code translator), the AST is the core of the system. Each BP3 file (the musical grammar language of the Bol Processor, see I2) is first parsed into a tree of 12 different node types, then this tree is traversed to generate the equivalent SuperCollider (sound synthesis environment, see I3) code.

The idea in one sentence

An AST is a tree-like representation of source code that preserves its logical structure while eliminating non-essential syntactic details.


Let’s explain step by step

Parse tree vs AST: the crucial difference

When a parser analyzes code, it first produces a parse tree (also called a concrete syntax tree) that exactly reflects the grammar rules. This tree contains everything, including purely syntactic elements.

Analogy: the script vs the screenplay

Imagine a play:

  • The complete script includes all stage directions, page numbers, technical indications (“lights at 50%”). This is the parse tree.
  • The cleaned-up screenplay only keeps the dialogues and essential actions. This is the AST.

Both represent the same play, but one is faithful to the form, the other to the substance.

Example with 2 + 3:

Parse tree (derivation tree):

 

        Expr
       / | \
    Expr "+" Term
      |        |
   Term     Factor
      |        |
  Factor      "3"
      |
     "2"

 

This tree contains Expr, Term, Factor nodes that correspond to the grammar rules. It’s faithful, but verbose.

AST (abstract tree):

 

    Add
   /   \
  2     3

 

The AST only keeps the essential: an addition operation with two operands. The intermediate levels have disappeared.

Why this simplification?

  • Simpler manipulation: fewer nodes to traverse
  • Grammar independence: syntax can be changed without modifying the AST
  • Focus on semantics: the tree represents what the code does, not how it is written

Example 1: the expression 2 + 3 * 4

Let’s build the AST of this expression step by step.

The source code: 2 + 3 * 4

The resulting AST:

 

      Add
     /   \
    2    Mul
        /   \
       3     4

 

Important observations:

  1. Precedence is encoded: Mul is a child of Add, so multiplication will be evaluated first
  2. No parentheses: they are not in the tree, but their effect is
  3. Numbers are leaves: they are terminal values

AST evaluation (post-order traversal):

What is a post-order traversal?

A traversal is a systematic way of visiting all nodes in a tree. Post-order means: “children first, then parent”.

Why post-order for evaluation? Because we must know the result of sub-expressions before we can calculate the parent expression. We cannot calculate 2 + (3*4) without first calculating 3*4.

 

1. Visit the Add node
2.   Visit left child → 2
3.   Visit right child (Mul)
4.     Visit left child → 3
5.     Visit right child → 4
6.     Evaluate Mul(3, 4) → 12    <-- children evaluated first
7.   Evaluate Add(2, 12) → 14     <-- parent evaluated after

 

Structure of an AST node

In practice, an AST node is often a class or a data structure:

 

from dataclasses import dataclass
from typing import Union

@dataclass
class Number:
    value: int

@dataclass
class BinOp:
    op: str          # "+", "-", "*", "/"
    left: 'Expr'
    right: 'Expr'

Expr = Union[Number, BinOp]

 

The AST of 2 + 3 * 4 in Python:

 

ast = BinOp(
    op="+",
    left=Number(2),
    right=BinOp(
        op="*",
        left=Number(3),
        right=Number(4)
    )
)

 

Example 2: The BP3 AST

The BP2SC transpiler defines 12 node types to represent all elements of a BP3 file:

Node Represents BP3 Example
BPFile Complete file (root)
GrammarBlock Grammar section ORD[1]
Rule Production rule S --> A B
Weight Rule weight (see B4) <50>, <50-12>
Flag Condition or operation (see B4) /Ideas/, /Ideas-1/
Note Musical note do4, C#5
Rest Rest -
NonTerminal Symbol to expand S, Phrase
Variable Variable between pipes \|x\|
Polymetric Polymetric expression (see B5) {2, A B C}
SpecialFn Special function _transpose(3)
HomoApply Homomorphism (see B6) (= \|x\|)

Concrete example — the BP3 rule:

 

<50> /Ideas/ S --> A _transpose(2) do4 {2, B C}

 

Its AST:

 

Rule
├── weight: Weight(value=50, decrement=None)
├── flags: [Flag(name="Ideas", op="condition")]
├── lhs: [NonTerminal("S")]
└── rhs:
    ├── NonTerminal("A")
    ├── SpecialFn(name="transpose", args=[2])
    ├── Note(name="do", octave=4, accidental=None)
    └── Polymetric(ratio=2, elements=[
        NonTerminal("B"),
        NonTerminal("C")
    ])

 

Each element of the source code corresponds to a typed node in the tree.

AST Traversal

To exploit an AST, we traverse it: we systematically visit each node to perform an operation. Several strategies exist, each adapted to a specific use:

The three depth-first traversal orders

Imagine visiting a multi-story house:

  • Pre-order: note each room upon entering, before exploring its sub-rooms
  • Post-order: note each room upon exiting, after exploring its sub-rooms
  • In-order: for binary trees, alternate left-center-right

Depth-first traversal:

Strategy Visit Order Use Case
Pre-order Node, then children Copy tree, serialize
Post-order Children, then node Evaluate, calculate properties
In-order Left, node, right Display infix expressions

Breadth-first traversal:

  • Visit level by level, from top to bottom
  • Useful for analyses that depend on “depth” in the tree

The Visitor pattern:

What is a design pattern?

A design pattern is a reusable solution to a recurring problem in programming. These are proven “recipes”.

The Visitor pattern solves this problem: “How to add new operations to an AST without modifying the node classes each time?”

Analogy: Imagine a museum (the AST) with artworks (the nodes). Rather than modifying each artwork to add a new feature (audio guide, braille description…), a specialized “visitor” is created that knows how to interact with each type of artwork.

In object-oriented programming, the Visitor pattern allows adding operations without modifying the node classes:

 

class ASTVisitor:
    def visit(self, node):
        method_name = f'visit_{type(node).__name__}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"No visitor for {type(node)}")

class Evaluator(ASTVisitor):
    def visit_Number(self, node):
        return node.value

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if node.op == '+': return left + right
        if node.op == '*': return left * right
        # ... other operations

# Usage
evaluator = Evaluator()
result = evaluator.visit(ast)  # → 14

 

From AST to target code

The BP2SC transpiler (see B7) uses the AST to generate SuperCollider code. Each node type has a translation rule:

BP3 Node SuperCollider Code
Rule(lhs=S, rhs=[A,B]) Pdef(\S, Pseq([Pdef(\A), Pdef(\B)]))
Note(do, 4) Pbind(\midinote, 60, \dur, 0.25)
Rest Event.silent(0.25)
Polymetric(2, [A,B,C]) Pbindf(Pseq([...]), \stretch, 2/3)
SpecialFn(transpose, 3) Pbindf(..., \mtranspose, 3)

Translation example:

 

// BP3
S --> do4 re4 mi4

// AST
Rule(lhs=[NonTerminal("S")], rhs=[Note("do",4), Note("re",4), Note("mi",4)])

// SuperCollider
Pdef(\S, Pseq([
    Pbind(\midinote, 60, \dur, 0.25),
    Pbind(\midinote, 62, \dur, 0.25),
    Pbind(\midinote, 64, \dur, 0.25)
], 1))

 

Why the AST is central

The AST is the pivot of any compiler or transpiler:

 

Source code → [Lexer] → Tokens → [Parser] → AST → [Emitter] → Target code
                                                 ^
                                                 |
                                          Central point

 

The steps of the compilation pipeline

Step Role Input Output
Lexer Break text into units "2 + 3" [NUM:2, PLUS, NUM:3]
Parser Build the structure List of tokens AST
Emitter Generate target code AST Executable code

>
Lexer (or tokenizer): transforms a string of characters into tokens, the significant units of the language. Example: 123 becomes a NUMBER token, if becomes a KEYWORD token.

Tokens: lexical units of the language (numbers, operators, keywords, identifiers). These are the “words” recognized by the lexer before syntactic analysis.

Advantages of this architecture:

  1. Separation of concerns: the parser handles syntax, the emitter handles generation
  2. Intermediate analyses: the AST can be checked, optimized, transformed before emission
  3. Multi-target: the same AST can generate code for different platforms
  4. Testability: the parser and emitter can be tested independently

Key takeaways

  1. An AST is a tree-like representation of source code, stripped of non-essential syntactic details.
  2. Parse tree vs AST: the parse tree is faithful to the grammar, the AST is faithful to the meaning.
  3. Nodes are typed: each language construct has its node type (Note, Rule, BinOp…).
  4. Traversal enables exploitation: evaluation, code generation, static analysis.
  5. The AST is the central point of any compiler: it decouples syntactic analysis from code generation.
  6. BP2SC uses 12 node types to represent the entirety of BP3 notation.

To go further

  • Python ast module: https://docs.python.org/3/library/ast.html (to explore Python ASTs)
  • AST Explorer: https://astexplorer.net/ (interactive visualization of ASTs for many languages)
  • Dragon Book, Chapter 5: Syntax-Directed Translation
  • Visitor Pattern: https://refactoring.guru/design-patterns/visitor

Glossary

Term Definition
AST Abstract Syntax Tree. A tree-like representation of source code that preserves the logical structure by eliminating non-essential syntactic details (parentheses, decorative commas, etc.).
Design pattern A reusable solution to a recurring problem in software design. The Visitor pattern is a classic example for ASTs.
Emitter A component of a compiler that generates target code (or object code) from the AST. Also called “code generator”.
Leaf A terminal node in the tree, with no children. Represents atomic values: numbers, identifiers, strings.
Lexer The first stage of a compiler. Breaks down source code into tokens (lexical units). Also called “tokenizer” or “lexical analyzer”.
Node An element of the tree representing a language construct. Each node has a type (BinOp, Note, Rule…) and zero or more children.
Parse tree A derivation tree faithful to the grammar rules. More detailed than the AST, it includes all syntactic elements. Also called “concrete syntax tree”.
Traversal Systematic visit of all nodes in a tree according to a defined strategy (pre-order, post-order, in-order, or breadth-first).
Post-order A traversal strategy where all children of a node are visited first, then the node itself. Ideal for expression evaluation.
Pre-order A traversal strategy where the node is visited first, then its children. Ideal for copying or serializing a tree.
Token A lexical unit of the language, produced by the lexer. Examples: number, operator, keyword, identifier. A token has a type and a value.
Visitor A design pattern that allows adding operations to the AST without modifying the node classes. Each type of operation is a distinct “visitor”.

Prerequisites: L2 — Context-Free Grammars, L3 — EBNF
Reading time: 9 min
Tags: #ast #compilation #parsing #trees #transpilation


Next article: L5 — The three semantics: giving meaning to formal languages