Concrete vs. Abstract Syntax

Every programming language has a concrete syntax, and every implementation of a programming language uses an abstract syntax. In other words, the concrete syntax is part of the definition of the language, whereas the abstract syntax is part of the definition of a particular implementation (evaluator or compiler) of that language. If a language lets us access the implementation structures from within the language itself (as Scheme does, by allowing programs to be treated as Scheme data structures, i.e. Scheme lists), then the abstract syntax becomes part of the definition of that language. In Pico this is possible because a function f can be treated as an array of size 4, so that its body code can be accessed with the expression f[3]. The result is a tree that can be dissected further using simple array references. Hence, in Pico, what abstract syntax trees look like is part of the definition of the language and not just an implementation detail. "Normally", however, the implementation of a language cannot be accessed from within that language, and the abstract syntax is then strictly a matter of implementation.

The concrete syntax of a programming language is defined by a context-free grammar. It consists of a set of rules (productions) that define what programs look like to the programmer. The concrete syntax of Pico is defined by the productions in the file ConcrSyn.pco.
The abstract syntax of an implementation is the set of trees used to represent programs in that implementation. That is, the abstract syntax defines what programs look like to the evaluator or compiler. The abstract syntax of the metacircular Pico interpreter is defined by the composition rules given in the file AbstrSyn.pco.
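
To make the distinction tangible, here is a minimal sketch in Python (not Pico). The node classes Num, Var and Apply and the function evaluate are hypothetical; they are not the composition rules of AbstrSyn.pco. The sketch only illustrates that an abstract syntax is a set of tree shapes, and that an evaluator works on those trees rather than on program text.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical abstract syntax for a tiny expression language.
# Each class describes one kind of tree node the evaluator can meet.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Apply:
    operator: str
    operands: Tuple["Expr", ...]


Expr = Union[Num, Var, Apply]

# The concrete texts "1 + x * 2" and "1+(x*2)" differ in spacing and
# parentheses, yet both would be represented by this single tree.
tree = Apply("+", (Num(1), Apply("*", (Var("x"), Num(2)))))


def evaluate(expr, env):
    """Evaluate a tree; the evaluator never sees the program text."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Apply):
        args = [evaluate(operand, env) for operand in expr.operands]
        if expr.operator == "+":
            return sum(args)
        if expr.operator == "*":
            result = 1
            for arg in args:
                result *= arg
            return result
        raise ValueError(f"unknown operator {expr.operator!r}")
    raise TypeError(f"not an abstract syntax node: {expr!r}")
```

With x bound to 5, evaluate(tree, {"x": 5}) walks the tree and yields 11. Parentheses and whitespace play no role at this level, because they have no counterpart in the tree.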

The link between the concrete and the abstract syntax is established by a program called the reader, in what is usually called the read phase of the classical read-eval-print loop. The reader takes a piece of text that is expected to obey the rules of the concrete syntax. If the text does not conform to the concrete syntax, an error is generated. Otherwise, the reader transforms the text into a tree that conforms to the abstract syntax. This tree (the so-called parse tree) represents the structure of the program and no longer contains surface syntax such as parentheses. The reader for Pico consists of two files, one for the scanner (or lexical analyzer) and one for the parser (or structural analyzer). The scanner for Pico is in Scan.pco. The parser is in Read.pco. The scanner transforms the program text into a stream of 'tokens' from which all irrelevant information, such as whitespace and carriage returns, has been removed. The parser consumes these tokens and tries to discover the structure among them in order to build up the parse tree.
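
The two stages of a reader can be sketched as follows, in Python rather than Pico. This is an illustration under stated assumptions, not the code of Scan.pco or Read.pco: the toy grammar (numbers, names, +, * and parentheses), the function names scan, parse and read, and the tuple-based trees are all hypothetical.

```python
import re

# Hypothetical token classes; exactly one named group matches at a time.
TOKEN = re.compile(
    r"(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[+*()])|(?P<ws>\s+)"
)


def scan(text):
    """Scanner: turn program text into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        pos = match.end()
        if match.lastgroup != "ws":  # whitespace is dropped here
            tokens.append((match.lastgroup, match.group()))
    return tokens


def parse(tokens):
    """Parser: build a tree from the tokens by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr ::= term ('+' term)*
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():  # term ::= factor ('*' factor)*
        node = factor()
        while peek() == ("op", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():  # factor ::= NUMBER | NAME | '(' expr ')'
        kind, value = advance()
        if kind == "num":
            return ("num", int(value))
        if kind == "name":
            return ("var", value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node  # the parentheses leave no trace in the tree
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "end":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


def read(text):
    """Reader: scanner followed by parser."""
    return parse(scan(text))
```

Under these assumptions, read("1 + (x * 2)") and read("1+x*2") produce the same tree, while read("(1 + x") is rejected with a SyntaxError because the text does not conform to the concrete syntax.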


This page was made (with lots of hard work!) by Wolfgang De Meuter