4 The Abstract Format
This document describes the standard representation of parse trees for Erlang programs as Erlang terms. This representation is known as the abstract format. Functions dealing with such parse trees are compile:forms/[1,2] and functions in the modules epp, erl_eval, erl_lint, erl_pp, erl_parse, and io. They are also used as input and output for parse transforms (see the module compile).
We use the function Rep to denote the mapping from an Erlang source construct C to its abstract format representation R, and write R = Rep(C).
The word LINE below represents an integer, and denotes the number of the line in the source file where the construction occurred. Several instances of LINE in the same construction may denote different lines.
Since operators are not terms in their own right, when operators are mentioned below, the representation of an operator should be taken to be the atom with a printname consisting of the same characters as the operator.
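
As a rough illustration of the mapping Rep, the following sketch scans and parses a single expression with erl_scan and erl_parse and returns its abstract format. The module name rep_demo and the function rep/1 are hypothetical names chosen for this example, and the result described afterwards assumes an OTP release in which each node is annotated with a plain line number.

```erlang
-module(rep_demo).
-export([rep/1]).

%% Hypothetical helper: return the abstract format of one expression.
%% Source must be a string holding a single expression terminated by
%% a full stop, for example "A + 1.".
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

Under these assumptions, rep_demo:rep("A + 1.") should yield {op,1,'+',{var,1,'A'},{integer,1,1}}: the operator + is represented by the atom '+', and every LINE is 1 because the input is a single line.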
4.1 Module declarations and forms
A module declaration consists of a sequence of forms that are either function declarations or attributes.
- If D is a module declaration consisting of the forms F_1, ..., F_k, then Rep(D) = [Rep(F_1), ..., Rep(F_k)].
- If F is an attribute -module(Mod), then Rep(F) = {attribute,LINE,module,Mod}.
- If F is an attribute -export([Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,export,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}.
- If F is an attribute -import(Mod,[Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,import,{Mod,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}}.
- If F is an attribute -compile(Options), then Rep(F) = {attribute,LINE,compile,Options}.
- If F is an attribute -file(File,Line), then Rep(F) = {attribute,LINE,file,{File,Line}}.
- If F is a record declaration -record(Name,{V_1, ..., V_k}), then Rep(F) = {attribute,LINE,record,{Name,[Rep(V_1), ..., Rep(V_k)]}}. For Rep(V), see below.
- If F is a wild attribute -A(T), then Rep(F) = {attribute,LINE,A,T}.
- If F is a function declaration Name(Ps_1) when Gs_1 -> B_1 ; ... ; Name(Ps_k) when Gs_k -> B_k, where each Ps_i, Gs_i and B_i is a pattern sequence, a guard sequence and a body, respectively, and each Ps_i has the same length Arity, then Rep(F) = {function,LINE,Name,Arity, [{clause,LINE,Rep(Ps_1),Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,Rep(Ps_k),Rep(Gs_k),Rep(B_k)}]}.
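
Because Rep(D) is an ordinary list of terms, a module can also be written directly in the abstract format and handed to compile:forms/1. The sketch below does this for a one-function module; the names forms_demo and abs_id are hypothetical, and the line numbers are simply chosen to match the source shown in the comment.

```erlang
-module(forms_demo).
-export([run/0]).

%% Build Rep(D) by hand for the module
%%   -module(abs_id).
%%   -export([id/1]).
%%   id(X) -> X.
%% then compile it, load it and call the generated function.
run() ->
    Forms = [{attribute,1,module,abs_id},
             {attribute,2,export,[{id,1}]},
             {function,3,id,1,
              [{clause,3,[{var,3,'X'}],[],[{var,3,'X'}]}]}],
    {ok, abs_id, Binary} = compile:forms(Forms),
    {module, abs_id} = code:load_binary(abs_id, "abs_id.beam", Binary),
    abs_id:id(42).
```

The single clause has the pattern sequence [{var,3,'X'}], an empty guard sequence [] and the body [{var,3,'X'}], so forms_demo:run() is expected to return its argument, 42.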
4.1.1 Record fields
Each field in a record declaration may have an explicit default initializer expression.
- If V is A, then Rep(V) = {record_field,LINE,Rep(A)}.
- If V is A = E, then Rep(V) = {record_field,LINE,Rep(A),Rep(E)}.
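
The two field forms can be observed by parsing a record declaration. This is a minimal sketch: the module record_demo, the record name point and its fields are made up for the example, and plain line-number annotations are assumed.

```erlang
-module(record_demo).
-export([fields/0]).

%% Parse a record declaration and return [Rep(V_1), ..., Rep(V_k)].
fields() ->
    {ok, Tokens, _} = erl_scan:string("-record(point, {x, y = 0})."),
    {ok, {attribute, _Line, record, {point, Fields}}} =
        erl_parse:parse_form(Tokens),
    Fields.
```

Here the field x has no initializer and y has the initializer 0, so the result should be [{record_field,1,{atom,1,x}}, {record_field,1,{atom,1,y},{integer,1,0}}].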
4.1.2 Representation of parse errors and end of file
In addition to the representations of forms, the list that represents a module declaration (as returned by functions in erl_parse and epp) may contain tuples {error,E}, denoting syntactically incorrect forms, and {eof,LINE}, denoting an end of stream encountered before a complete form had been parsed.
4.2 Atomic literals
There are five kinds of atomic literals, which are represented in the same way in patterns, expressions and guard expressions:
- If L is an integer or character literal, then Rep(L) = {integer,LINE,L}.
- If L is a float literal, then Rep(L) = {float,LINE,L}.
- If L is a string literal consisting of the characters C_1, ..., C_k, then Rep(L) = {string,LINE,[C_1, ..., C_k]}.
- If L is an atom literal, then Rep(L) = {atom,LINE,L}.
Note that negative integer and float literals do not occur as such; they are parsed as an application of the unary negation operator.
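
The literal forms, and the treatment of a negative number as an operator application, can be checked with a small sketch like the one below. The module literal_demo and its functions are hypothetical helpers, and plain line-number annotations are assumed. Character literals are left out on purpose, since their representation may differ between releases.

```erlang
-module(literal_demo).
-export([rep/1, examples/0]).

%% Return the abstract format of one expression given as a string
%% terminated by a full stop.
rep(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Integer, float, string and atom literals, followed by -1, which is
%% parsed as unary negation applied to the integer literal 1.
examples() ->
    [rep("42."), rep("1.5."), rep("\"ab\"."), rep("foo."), rep("-1.")].
```

The last element of the list returned by literal_demo:examples() should be {op,1,'-',{integer,1,1}} rather than an integer node holding -1.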
4.3 Patterns
If Ps is a sequence of patterns P_1, ..., P_k, then Rep(Ps) = [Rep(P_1), ..., Rep(P_k)]. Such sequences occur as the list of arguments to a function or fun.
Individual patterns are represented as follows:
- If P is an atomic literal L, then Rep(P) = Rep(L).
- If P is a compound pattern P_1 = P_2, then Rep(P) = {match,LINE,Rep(P_1),Rep(P_2)}.
- If P is a variable pattern V, then Rep(P) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
- If P is a universal pattern _, then Rep(P) = {var,LINE,'_'}.
- If P is a tuple pattern {P_1, ..., P_k}, then Rep(P) = {tuple,LINE,[Rep(P_1), ..., Rep(P_k)]}.
- If P is a nil pattern [], then Rep(P) = {nil,LINE}.
- If P is a cons pattern [P_h | P_t], then Rep(P) = {cons,LINE,Rep(P_h),Rep(P_t)}.
- If P is a binary pattern <<P_1:Size_1/TSL_1, ..., P_k:Size_k/TSL_k>>, then Rep(P) = {bin,LINE,[{bin_element,LINE,Rep(P_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(P_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
- If P is P_1 Op P_2, where Op is a binary operator (this is either an occurrence of ++ applied to a literal string or character list, or an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_1),Rep(P_2)}.
- If P is Op P_0, where Op is a unary operator (this is an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_0)}.
- If P is a record pattern #Name{Field_1=P_1, ..., Field_k=P_k}, then Rep(P) = {record,LINE,Name, [{record_field,LINE,Rep(Field_1),Rep(P_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(P_k)}]}.
Note that every pattern has the same source form as some expression, and is represented the same way as the corresponding expression.
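
To see a pattern sequence as it appears in a function head, the following sketch parses a function declaration and extracts Rep(Ps) from its first clause. The module pattern_demo and the function head_patterns/1 are hypothetical, and plain line-number annotations are assumed.

```erlang
-module(pattern_demo).
-export([head_patterns/1]).

%% Return Rep(Ps) for the first clause of a function declaration
%% given as a string terminated by a full stop.
head_patterns(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, {function, _, _Name, _Arity, [Clause | _]}} =
        erl_parse:parse_form(Tokens),
    {clause, _, Patterns, _Guards, _Body} = Clause,
    Patterns.
```

For the input "f({ok, _}, [H | T] = L) -> H." the result should be a two-element list: a tuple pattern containing an atom and the universal pattern, followed by a compound pattern whose left side is a cons pattern.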
4.4 Expressions
A body B is a sequence of expressions E_1, ..., E_k, and Rep(B) = [Rep(E_1), ..., Rep(E_k)].
An expression E is one of the following alternatives:
- If E is an atomic literal L, then Rep(E) = Rep(L).
- If E is P = E_0, then Rep(E) = {match,LINE,Rep(P),Rep(E_0)}.
- If E is a variable V, then Rep(E) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
- If E is a tuple skeleton {E_1, ..., E_k}, then Rep(E) = {tuple,LINE,[Rep(E_1), ..., Rep(E_k)]}.
- If E is [], then Rep(E) = {nil,LINE}.
- If E is a cons skeleton [E_h | E_t], then Rep(E) = {cons,LINE,Rep(E_h),Rep(E_t)}.
- If E is a binary constructor <<V_1:Size_1/TSL_1, ..., V_k:Size_k/TSL_k>>, then Rep(E) = {bin,LINE,[{bin_element,LINE,Rep(V_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(V_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
- If E is E_1 Op E_2, where Op is a binary operator, then Rep(E) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
- If E is Op E_0, where Op is a unary operator, then Rep(E) = {op,LINE,Op,Rep(E_0)}.
- If E is #Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Name, [{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
- If E is E_0#Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Rep(E_0),Name, [{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
- If E is #Name.Field, then Rep(E) = {record_index,LINE,Name,Rep(Field)}.
- If E is E_0#Name.Field, then Rep(E) = {record_field,LINE,Rep(E_0),Name,Rep(Field)}.
- If E is catch E_0, then Rep(E) = {'catch',LINE,Rep(E_0)}.
- If E is E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,Rep(E_0),[Rep(E_1), ..., Rep(E_k)]}.
- If E is E_m:E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,{remote,LINE,Rep(E_m),Rep(E_0)},[Rep(E_1), ..., Rep(E_k)]}.
- If E is a list comprehension [E_0 || W_1, ..., W_k], where each W_i is a generator or a filter, then Rep(E) = {lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
- If E is begin B end, where B is a body, then Rep(E) = {block,LINE,Rep(B)}.
- If E is if Gs_1 -> B_1 ; ... ; Gs_k -> B_k end, where each Gs_i and B_i is a guard sequence and a body, respectively, then Rep(E) = {'if',LINE,[{clause,LINE,[],Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,[],Rep(Gs_k),Rep(B_k)}]}.
- If E is case E_0 of P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k end, where E_0 is an expression and each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, then Rep(E) = {'case',LINE,Rep(E_0), [{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}]}.
- If E is try B_t catch CP_1 when CGs_1 -> CB_1 ; ... ; CP_k when CGs_k -> CB_k end, where B_t is a body, and each CGs_i, CB_i and CP_i is a guard sequence, a body and a pattern, respectively, then Rep(E) = {'try',LINE,Rep(B_t),[], [{clause,LINE,[Rep(CP_1)],Rep(CGs_1),Rep(CB_1)}, ..., {clause,LINE,[Rep(CP_k)],Rep(CGs_k),Rep(CB_k)}]}.
- If E is try B_t of P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k catch CP_1 when CGs_1 -> CB_1 ; ... ; CP_k when CGs_k -> CB_k end, where B_t is a body, each Gs_i, B_i and P_i is a guard sequence, a body and a pattern, respectively, and each CGs_i, CB_i and CP_i is a guard sequence, a body and a pattern, respectively, then Rep(E) = {'try',LINE,Rep(B_t), [{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}], [{clause,LINE,[Rep(CP_1)],Rep(CGs_1),Rep(CB_1)}, ..., {clause,LINE,[Rep(CP_k)],Rep(CGs_k),Rep(CB_k)}]}.
- If E is receive P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k end, where each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, then Rep(E) = {'receive',LINE, [{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}]}.
- If E is receive P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k after E_0 -> B_t end, where each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, E_0 is an expression and B_t is a body, then Rep(E) = {'receive',LINE, [{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}], Rep(E_0),Rep(B_t)}.
- If E is fun Name/Arity, then Rep(E) = {'fun',LINE,{function,Name,Arity}}.
- If E is fun (Ps_1) when Gs_1 -> B_1 ; ... ; (Ps_k) when Gs_k -> B_k end, where each Ps_i, Gs_i and B_i is a pattern sequence, a guard sequence and a body, respectively, then Rep(E) = {'fun',LINE,{clauses, [{clause,LINE,Rep(Ps_1),Rep(Gs_1),Rep(B_1)}, ..., {clause,LINE,Rep(Ps_k),Rep(Gs_k),Rep(B_k)}]}}. As in a function declaration, Rep(Ps_i) is already a list and is not wrapped in a further list.
- If E is query [E_0 || W_1, ..., W_k] end, where each W_i is a generator or a filter, then Rep(E) = {'query',LINE,{lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}}. For Rep(W), see below.
- If E is E_0.Field, a Mnesia record access inside a query, then Rep(E) = {record_field,LINE,Rep(E_0),Rep(Field)}.
- If E is ( E_0 ), then Rep(E) = Rep(E_0), i.e., parenthesized expressions cannot be distinguished from their bodies.
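
Two of the rules above, the remote call and the transparency of parentheses, are sketched below. The module expr_demo and its functions are hypothetical names, and the expected terms assume plain line-number annotations without column information.

```erlang
-module(expr_demo).
-export([rep/1, remote_call/0, parens_are_invisible/0]).

%% Return the abstract format of one expression given as a string
%% terminated by a full stop.
rep(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% E_m:E_0(E_1) with E_m = lists, E_0 = reverse and E_1 = [1].
remote_call() ->
    rep("lists:reverse([1]).").

%% Rep(( E_0 )) = Rep(E_0): both sources give the same term.
parens_are_invisible() ->
    rep("(A + 1).") =:= rep("A + 1.").
```

expr_demo:remote_call() should return a call node whose second element after LINE is a {remote,LINE,...} tuple, and expr_demo:parens_are_invisible() should return true.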
4.4.1 Generators and filters
When W is a generator or a filter (in the body of a list comprehension), then:
- If W is a generator P <- E, where P is a pattern and E is an expression, then Rep(W) = {generate,LINE,Rep(P),Rep(E)}.
- If W is a filter E, which is an expression, then Rep(W) = Rep(E).
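
A list comprehension with one generator and one filter shows both cases side by side. This is only a sketch: lc_demo and qualifiers/0 are hypothetical names, and plain line-number annotations are assumed.

```erlang
-module(lc_demo).
-export([qualifiers/0]).

%% Parse [X || X <- L, X > 1] and return [Rep(W_1), Rep(W_2)].
qualifiers() ->
    {ok, Tokens, _} = erl_scan:string("[X || X <- L, X > 1]."),
    {ok, [{lc, _Line, _Template, Qualifiers}]} =
        erl_parse:parse_exprs(Tokens),
    Qualifiers.
```

The generator becomes a generate tuple, whereas the filter X > 1 appears as an ordinary operator expression with no wrapper.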
4.4.2 Binary element type specifiers
A type specifier list TSL for a binary element is a sequence of type specifiers TS_1 - ... - TS_k. Rep(TSL) = [Rep(TS_1), ..., Rep(TS_k)].
When TS is a type specifier for a binary element, then:
- If TS is an atom A, Rep(TS) = A.
- If TS is a pair A:Value, where A is an atom and Value is an integer, Rep(TS) = {A, Value}.
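
The sketch below parses a binary constructor whose elements cover an explicit size with a type specifier list, an omitted size, and both omitted. The module bin_demo and the function elements/0 are hypothetical, and the expected terms assume plain line-number annotations.

```erlang
-module(bin_demo).
-export([elements/0]).

%% Parse <<X:8/integer-unit:1, Y/binary, Z>> and return the list of
%% bin_element tuples.
elements() ->
    Source = "<<X:8/integer-unit:1, Y/binary, Z>>.",
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, [{bin, _Line, Elements}]} = erl_parse:parse_exprs(Tokens),
    Elements.
```

Note that type specifiers are stored as bare atoms and {A, Value} pairs, not as abstract format nodes, and that the atom default stands in for an omitted Size or TSL.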
4.5 Guards
A guard Gs is a nonempty sequence of guard tests G_1, ..., G_k, and Rep(Gs) = [Rep(G_1), ..., Rep(G_k)].
A guard sequence Gss is a sequence of guards Gs_1; ...; Gs_k, and Rep(Gss) = [Rep(Gs_1), ..., Rep(Gs_k)]. If the guard sequence is empty, Rep(Gss) = [].
A guard test G is either true, an application of a BIF to a sequence of guard expressions (syntactically this includes guard record tests), or a binary operator applied to two guard expressions.
- If G is true, then Rep(G) = {atom,LINE,true}.
- If G is an application A(E_1, ..., E_k), where A is an atom and E_1, ..., E_k are guard expressions, then Rep(G) = {call,LINE,{atom,LINE,A},[Rep(E_1), ..., Rep(E_k)]}.
- If G is an operator expression E_1 Op E_2, where Op is a binary operator, and E_1, E_2 are guard expressions, then Rep(G) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
All guard expressions are expressions and are represented in the same way as the corresponding expressions.
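
The nesting of guard sequences, guards and guard tests is easiest to see on a clause with two guards separated by a semicolon. In this sketch the module guard_demo and the function guard_sequence/1 are hypothetical, and plain line-number annotations are assumed.

```erlang
-module(guard_demo).
-export([guard_sequence/1]).

%% Return Rep(Gss) for the first clause of a function declaration
%% given as a string terminated by a full stop.
guard_sequence(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, {function, _, _Name, _Arity, [Clause | _]}} =
        erl_parse:parse_form(Tokens),
    {clause, _, _Patterns, Guards, _Body} = Clause,
    Guards.
```

For "f(X) when is_integer(X), X > 0; X =:= a -> ok." the result should be a list of two guards, the first holding two guard tests and the second holding one. For a clause without a when part the result should be [].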
4.6 The abstract format after preprocessing
When Erlang source code is compiled, the abstract code, after some preprocessing, is stored as the abstract_code chunk in the BEAM file, for debugging purposes. The version of the preprocessed format is called abstract_v1 in OTP R7 and abstract_v2 in R8. The preprocessing changes the representation so it becomes slightly incompatible with the format described above. The differences are:
- BIF calls in guards are translated to the {remote, ...} form (which is not allowed in source form).
- Explicit funs are translated to a tuple with an extra element (new in R7): {'fun',LINE,{clauses, Clauses},Extra}. The form of this extra element may change from one OTP release to the next.
- Implicit funs are translated to a tuple with an extra element (new in R8): {'fun',LINE,{function,Name,Arity},Extra}.
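
One way to inspect the stored chunk is through beam_lib:chunks/2, as in the sketch below. The module chunk_demo and the function abstract_code/1 are hypothetical names. The sketch assumes the chunk is present only when the module was compiled with the debug_info option, and it makes no assumption about the version tag, which may differ from abstract_v1 and abstract_v2 in other releases.

```erlang
-module(chunk_demo).
-export([abstract_code/1]).

%% Beam is the name of a BEAM file or a binary holding its contents.
%% Returns {Version, Forms} when abstract code is stored, otherwise
%% no_abstract_code.
abstract_code(Beam) ->
    case beam_lib:chunks(Beam, [abstract_code]) of
        {ok, {_Module, [{abstract_code, {Version, Forms}}]}} ->
            {Version, Forms};
        {ok, {_Module, [{abstract_code, no_abstract_code}]}} ->
            no_abstract_code
    end.
```

Matching on Version before interpreting Forms is a cautious design choice here, since the differences listed above depend on which version of the preprocessed format was stored.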