Concrete vs. Abstract Syntax
Every programming language has a concrete syntax. Furthermore, every
implementation of a programming language uses an abstract syntax. In other
words, the concrete syntax is part of the definition of the language. The
abstract syntax is part of the definition of a particular implementation
(evaluator or compiler) of a language. If a language gives us access to the
implementation structures from within the language itself (as Scheme does, by
letting us treat programs as Scheme data structures, i.e. Scheme lists), then
the abstract syntax becomes part of the definition of that language. In Pico
this is possible because a function f can be treated as an array of size 4, so
that we can access its body code with the expression f[3]. The result is a tree
that can be dissected further using simple array referencing. Hence, in Pico,
the shape of the abstract syntax trees is part of the definition of the
language, and not just an implementation detail. "Normally", however, accessing
the implementation of a language from within that language is not possible. In
that case, the abstract syntax is strictly a matter of implementation.
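The idea of a function that can be taken apart with array references can be sketched in Python. This is only an illustration, not Pico code: the slot order (name, parameters, body, environment), the 1-based indexing, the helper ref, and the node tags "apply" and "ref" are all assumptions made for this sketch.

```python
# Hypothetical model of a Pico-like function value as a 4-slot array.
# The slot order (name, parameters, body, environment) and the 1-based
# indexing are assumptions of this sketch, not taken from the Pico sources.

def ref(table, index):
    """1-based array reference, mimicking the f[3] notation."""
    return table[index - 1]

# Body of a hypothetical function add(x, y) that computes x + y, stored as
# nested arrays: an application of "+" to two variable references.
body = ["apply", "+", [["ref", "x"], ["ref", "y"]]]
f = ["add", ["x", "y"], body, {}]

tree = ref(f, 3)                   # the body code: an abstract syntax tree
operator = ref(tree, 2)            # dissect it with plain array referencing
first_arg = ref(ref(tree, 3), 1)   # first operand of the application
```

Because the body is ordinary data, a program can inspect it in the same way the evaluator does.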
- The concrete syntax of a programming language is defined by a context-free
grammar. It consists of a set of rules (productions) that define what
programs look like to the programmer. The concrete syntax of Pico
is defined by the productions in the file ConcrSyn.pco.
- The abstract syntax of an implementation is the set of trees used
to represent programs in the implementation. That is, the abstract syntax
defines what programs look like to the evaluator/compiler. The
abstract syntax of the metacircular Pico interpreter is defined by the composition
rules given in the file AbstrSyn.pco.
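To make the distinction concrete, here is a minimal Python sketch for a toy expression language. The grammar and the node classes Num, Add and Mul are hypothetical and are not the contents of ConcrSyn.pco or AbstrSyn.pco; they only show the two views of the same program side by side.

```python
from dataclasses import dataclass
from typing import Union

# Concrete syntax: a context-free grammar for a toy expression language
# (hypothetical; this is not the grammar found in ConcrSyn.pco).
GRAMMAR = """
expr   ::= term   { "+" term }
term   ::= factor { "*" factor }
factor ::= NUMBER | "(" expr ")"
"""

# Abstract syntax: the trees an evaluator would work with.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Add, Mul]

# The texts "1 + 2 * 3" and "1 + (2 * 3)" differ in concrete syntax, but
# under the grammar above both correspond to this single tree.
tree = Add(Num(1), Mul(Num(2), Num(3)))

def evaluate(node):
    """Evaluate a tree; only the abstract syntax is ever inspected."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, Mul):
        return evaluate(node.left) * evaluate(node.right)
    raise TypeError(f"not an expression node: {node!r}")
```

Note that the evaluator never sees parentheses or operator precedence; those concerns belong to the concrete syntax only.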
The relation between the concrete and abstract syntax is established by a program
called the reader, in what is usually called the read phase of the
classical read-eval-print loop. The reader takes a piece of text that is expected
to conform to the concrete syntax. If the text does not conform, an error is
generated. Otherwise, the reader transforms the text into a tree that conforms
to the abstract syntax. This tree (the so-called parse tree) represents the structure
of the program and no longer contains surface syntax such as parentheses. The reader
for Pico consists of two files, representing the scanner (or lexical analyzer)
and the parser (or structural analyzer). The scanner for Pico is in Scan.pco.
The parser is in Read.pco. The scanner transforms the
program text into a stream of 'tokens' from which all irrelevant information, such
as whitespace and carriage returns, has been removed. The parser consumes these
tokens and tries to find the structure among them in order to build up
the parse tree.
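The division of labour between scanner and parser can be sketched in Python for a toy expression language with numbers, "+", "*" and parentheses. This is not a transcription of Scan.pco or Read.pco: the token format, the tuple-based trees, and the function names scan, parse and read are assumptions chosen for the illustration.

```python
import re

# One alternative per token class; whitespace is matched but never emitted.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<op>[+*()])|(?P<ws>\s+)")

def scan(text):
    """Scanner: turn program text into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup == "num":
            tokens.append(("num", int(match.group())))
        elif match.lastgroup == "op":
            tokens.append((match.group(), None))
        pos = match.end()  # whitespace falls through and is dropped
    return tokens

def parse(tokens):
    """Parser: consume tokens and build a parse tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # expr ::= term { "+" term }
        node = term()
        while peek() == "+":
            advance()
            node = ("add", node, term())
        return node

    def term():  # term ::= factor { "*" factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("mul", node, factor())
        return node

    def factor():  # factor ::= NUMBER | "(" expr ")"
        kind = peek()
        if kind == "num":
            return ("num", advance()[1])
        if kind == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node  # the parentheses leave no trace in the tree
        raise SyntaxError(f"unexpected token {kind!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def read(text):
    """Reader: scanner followed by parser."""
    return parse(scan(text))
```

In this sketch, read("1 + 2 * 3") and read("1 + (2 * 3)") build the same tree, while text that violates the grammar, such as "(1 + 2", raises a SyntaxError instead of producing a tree.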
This page was made (with lots of hard work!) by Wolfgang De Meuter