This file contains various notes about the design of the compiler.

OUTLINE

The main job of the compiler is to translate Mercury into C, although it can also translate (subsets of) Mercury to some other languages (Goedel, the bytecode of a debugger currently under development, and in the future the Aditi Relational Language).

The top level of the compiler is in the file mercury_compile.m. The basic design is that compilation is broken into the following stages:

  1. parsing (source files -> HLDS)
  2. semantic analysis and error checking (HLDS -> annotated HLDS)
  3. high-level transformations (annotated HLDS -> annotated HLDS)
  4. code generation (annotated HLDS -> LLDS)
  5. low-level optimizations (LLDS -> LLDS)
  6. output C code (LLDS -> C)

Note that in reality the separation is not quite as simple as that. Although parsing is listed as step 1 and semantic analysis is listed as step 2, the last stage of parsing actually includes some semantic checks. And although optimization is listed as steps 3 and 5, it also occurs in steps 2, 4, and 6. For example, elimination of assignments to dead variables is done in mode analysis; middle-recursion optimization and the use of static constants for ground terms are done in code generation; and a few low-level optimizations are done in llds_out.m as we are spitting out the C code.
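
As a rough picture of the staged design in the OUTLINE, the following Python sketch chains six placeholder stages in the listed order. It is illustrative only: make_stage, STAGES and compile_module are hypothetical names, and the real stages in mercury_compile.m pass HLDS and LLDS structures around, not strings.

```python
from functools import reduce

def make_stage(name, out_repr):
    """Build a placeholder stage that records its name and output form."""
    def stage(state):
        _in_repr, log = state
        return (out_repr, log + [name])
    return stage

# Hypothetical stand-ins for the six stages of the outline.
STAGES = [
    make_stage("parsing", "HLDS"),
    make_stage("semantic analysis", "annotated HLDS"),
    make_stage("high-level transformations", "annotated HLDS"),
    make_stage("code generation", "LLDS"),
    make_stage("low-level optimizations", "LLDS"),
    make_stage("output C code", "C"),
]

def compile_module(source_repr="source files"):
    """Run every stage in order, threading the state through."""
    return reduce(lambda state, stage: stage(state), STAGES, (source_repr, []))
```

The sketch keeps the stages strictly separate, which, as noted above, the real compiler does not quite do.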


DETAILED DESIGN

(well, more detailed than the OUTLINE anyway ;-)

The action is co-ordinated from mercury_compile.m.

0. Option handling

The command-line options are defined in the module options.m. mercury_compile.m calls library/getopt.m, passing the predicates defined in options.m as arguments, to parse them. It then invokes handle_options.m to postprocess the option set. The results are stored in the io__state, using the type globals defined in globals.m.
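
The three steps above (parse, postprocess, store) can be sketched as follows. This is a Python analogy, not the Mercury code: the option names, the implication in postprocess, and the Globals class are all invented for the example, and Python's standard getopt module merely plays the role of library/getopt.m.

```python
import getopt

# Hypothetical option table, standing in for options.m.
LONG_OPTIONS = ["verbose", "opt-level="]
DEFAULTS = {"verbose": False, "opt_level": 0, "inlining": False}

def parse_options(argv):
    """Stand-in for the getopt call: turn argv into an option table."""
    opts, rest = getopt.getopt(argv, "", LONG_OPTIONS)
    table = dict(DEFAULTS)
    for name, value in opts:
        if name == "--verbose":
            table["verbose"] = True
        elif name == "--opt-level":
            table["opt_level"] = int(value)
    return table, rest

def postprocess(table):
    """Stand-in for handle_options.m: derive implied settings."""
    table = dict(table)
    # Hypothetical implication, invented for this example.
    if table["opt_level"] >= 2:
        table["inlining"] = True
    return table

class Globals:
    """Stand-in for the globals type: a read-only view of the final options."""
    def __init__(self, table):
        self._table = dict(table)

    def lookup(self, name):
        return self._table[name]

def handle_command_line(argv):
    """Parse, postprocess, and wrap the options; also return the file names."""
    table, files = parse_options(argv)
    return Globals(postprocess(table)), files
```

For example, handle_command_line(["--opt-level=2", "foo.m"]) yields a Globals value in which the hypothetical inlining option has been switched on by the postprocessing step.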

1. Parsing

The result at this stage is the High Level Data Structure, which is defined in four files:

  1. hlds_data.m defines the parts of the HLDS concerned with function symbols, types, insts, modes and determinisms;
  2. hlds_goal.m defines the part of the HLDS concerned with the structure of goals, including the annotations on goals;
  3. hlds_pred.m defines the part of the HLDS concerning predicates and procedures;
  4. hlds_module.m defines the top-level parts of the HLDS, including the type module_info.

The module hlds_out.m contains predicates to dump the HLDS to a file. The module goal_util.m contains predicates for renaming variables in an HLDS goal.

2. Semantic analysis and error checking

  implicit quantification
    quantification.m handles implicit quantification and computes the set of non-local variables for each sub-goal.
  type checking
  mode analysis
  indexing and determinism analysis
  checking of unique modes (unique_modes.m)
    unique_modes.m checks that non-backtrackable unique modes are not used in a context which might require backtracking. Note that what unique_modes.m does is quite similar to what modes.m does, and unique_modes.m calls lots of predicates defined in modes.m to do it.
  simplification (simplify.m)
    simplify.m finds and exploits opportunities for simplifying the internal form of the program, both to optimize the code and to massage the code into a form the code generator will accept. It also warns the programmer about any constructs that are so simple that they should not have been included in the program in the first place. simplify.m calls common.m, which looks for (a) construction unifications that construct a term that is the same as one that already exists, or (b) repeated calls to a predicate with the same inputs, and replaces them with assignment unifications. simplify.m also attempts to partially evaluate calls to builtin procedures if the inputs are all constants (see const_prop.m).
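
To make case (a) of the common.m description concrete, here is a minimal Python sketch of replacing a repeated construction with an assignment. It is not the algorithm in common.m: the tuple encoding of goals and the name eliminate_common are assumptions made for the example, and it only handles a straight-line conjunction in which each variable is bound once.

```python
def eliminate_common(goals):
    """Replace repeated constructions of the same term with assignments.

    Each goal is a tuple; constructions have the hypothetical shape
    ("construct", var, functor, args). Other goals pass through untouched.
    """
    seen_terms = {}  # (functor, args) -> variable already holding that term
    result = []
    for goal in goals:
        if goal[0] == "construct":
            _, var, functor, args = goal
            key = (functor, tuple(args))
            if key in seen_terms:
                # Same functor applied to the same variables: reuse the old term.
                result.append(("assign", var, seen_terms[key]))
            else:
                seen_terms[key] = var
                result.append(goal)
        else:
            result.append(goal)
    return result
```

Given X = f(A, B) followed later by Y = f(A, B), the second construction becomes the assignment Y = X. Case (b), repeated calls with the same inputs, could be handled along the same lines with a second table keyed on the predicate and its input variables.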

3. High-level transformations

The first two passes of this stage are code simplifications.

To improve efficiency, the above two passes are actually combined into one: polymorphism.m calls lambda__transform_lambda directly.

Most of the remaining HLDS-to-HLDS transformations are optimizations:

The module transform.m contains stuff that is supposed to be useful for high-level optimizations (but which is not yet used).

Eventually we plan to make Mercury the programming language of the Aditi deductive database system. When this happens, we will need to be able to apply the magic set transformation, which is defined for predicates whose definitions are in disjunctive normal form. The module dnf.m translates definitions into DNF, introducing auxiliary predicates as necessary.
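
The idea of reaching DNF by introducing auxiliary predicates can be sketched as below. This is a toy Python model, not the code in dnf.m: goals are hypothetical tuples ("atom", name), ("conj", goals) or ("disj", goals), variables are ignored entirely (a real transformation would have to pass the relevant variables to each auxiliary predicate), and the __aux_N naming scheme is invented.

```python
def to_dnf(pred_name, body):
    """Return {name: [[atom, ...], ...]}: each predicate maps to a list of
    disjuncts, and each disjunct is a flat list of atom names."""
    preds = {}
    counter = [0]

    def convert(name, goal):
        # The top level of a definition may be a disjunction; anything else
        # is treated as a single disjunct.
        disjuncts = goal[1] if goal[0] == "disj" else [goal]
        preds[name] = [flatten_conj(d) for d in disjuncts]

    def flatten_conj(goal):
        kind = goal[0]
        if kind == "atom":
            return [goal[1]]
        if kind == "conj":
            atoms = []
            for sub in goal[1]:
                atoms.extend(flatten_conj(sub))
            return atoms
        # A disjunction nested inside a conjunction: hoist it into an
        # auxiliary predicate and call that predicate instead.
        counter[0] += 1
        aux = f"{pred_name}__aux_{counter[0]}"
        convert(aux, goal)
        return [aux]

    convert(pred_name, body)
    return preds
```

For a body of the form (a, (b ; c), d), the sketch produces one clause for p that calls a, an auxiliary predicate, and d, plus a two-clause auxiliary predicate for (b ; c). Hoisting, unlike distributing the conjunction over the disjunction, avoids duplicating a and d.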

4. Code generation

  pre-passes to annotate the HLDS
    Before code generation there are a few more passes which annotate the HLDS with information used for code generation:
    choosing registers for procedure arguments (arg_info.m)
      Currently uses one of two simple algorithms, but we may add other algorithms later.
    annotation of goals with liveness information (liveness.m)
      This records the birth and death of each variable in the HLDS goal_info.
    allocation of stack slots
      This is done by live_vars.m, which works out which variables need to be saved on the stack when, and then uses graph_colour.m to determine a good allocation of variables to stack slots.
    migration of builtins following branched structures
      This transformation, which is performed by follow_code.m, improves the results of follow_vars.
    allocating the follow vars (follow_vars.m)
      Traverses backwards over the HLDS, annotating some goals with information about the locations in which variables will next be needed. This allows us to generate more efficient code by putting variables in the right spot directly. This module is not called from mercury_compile.m; it is called from store_alloc.m.
    allocating the store map (store_alloc.m)
      Annotates each branched goal with variable location information so that we can generate correct code by putting variables in the same spot at the end of each branch.
  code generation
    For code generation itself, the main module is code_gen.m. It handles conjunctions and negations, but calls sub-modules to do most of the other work:

It also calls middle_rec.m to do middle recursion optimization.
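
Two of the pre-passes listed above, liveness annotation and stack slot allocation, lend themselves to a small worked sketch. The Python below is a simplified model and makes no claim about how liveness.m, live_vars.m or graph_colour.m actually work: it assumes a straight-line sequence of goals with no branches, a hypothetical (kind, variables) encoding of goals, and a plain greedy colouring.

```python
def live_ranges(goals):
    """goals: list of (kind, variables). Returns {var: (birth, death)},
    the indices of the first and last goals that mention each variable."""
    ranges = {}
    for i, (_kind, variables) in enumerate(goals):
        for v in variables:
            birth, _ = ranges.get(v, (i, i))
            ranges[v] = (birth, i)
    return ranges

def interference(goals, ranges):
    """Variables live across the same call must not share a stack slot."""
    graph = {}
    for i, (kind, _variables) in enumerate(goals):
        if kind != "call":
            continue
        # Live across call i: born before it and still needed after it.
        across = sorted(v for v, (b, d) in ranges.items() if b < i < d)
        for v in across:
            graph.setdefault(v, set()).update(w for w in across if w != v)
    return graph

def allocate_slots(graph):
    """Greedy colouring: give each variable the lowest slot that none of
    its already-coloured neighbours is using."""
    slots = {}
    for v in sorted(graph):
        taken = {slots[w] for w in graph[v] if w in slots}
        slot = 1
        while slot in taken:
            slot += 1
        slots[v] = slot
    return slots
```

In this model, two variables that must both survive the same call get different slots, while a variable that only needs saving across a later call can reuse a slot freed earlier. That reuse is the point of treating slot allocation as graph colouring.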

The code generation modules make use of:

code_info.m:
The main data structure for the code generator
code_exprn.m:
This defines the exprn_info type, which is a sub-component of the code_info data structure which holds the information about the contents of registers and the values/locations of variables.
exprn_aux.m:
Various preds which use exprn_info
code_util.m:
Some miscellaneous preds used for code generation
code_aux.m:
Some miscellaneous preds which, unlike those in code_util, use code_info
continuation_info.m:
For accurate garbage collection, collects information about each live value after calls, and saves information about procedures.

The result of code generation is the Low Level Data Structure (llds.m). The code is generated as a tree of code fragments which is then flattened (tree.m).
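
The "tree of code fragments, then flatten" approach can be illustrated with the Python sketch below. The encoding is an assumption made for the example, not the definition in tree.m: None is an empty tree, a list is a leaf holding instructions, and a 2-tuple joins two subtrees. The instruction strings are placeholders.

```python
def flatten(tree):
    """Flatten a code tree into one instruction list, left to right."""
    out = []
    stack = [tree]
    while stack:
        t = stack.pop()
        if t is None:
            # Empty tree: contributes no instructions.
            continue
        if isinstance(t, tuple):
            # Interior node: visit the left subtree before the right one.
            left, right = t
            stack.append(right)
            stack.append(left)
        else:
            # Leaf: a list of instructions.
            out.extend(t)
    return out
```

Joining two fragments in this representation is a constant-time pairing, and the single flattening pass at the end is linear in the total size, whereas repeatedly appending lists while generating code can cost quadratic time.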

5. Low-level optimization

The various LLDS-to-LLDS optimizations are invoked from optimize.m. They are:

Depending on which optimization flags are enabled, optimize.m may invoke many of these passes multiple times.

Some of the low-level optimization passes use opt_util.m, which contains miscellaneous predicates for LLDS-to-LLDS optimization.

6. Output C code


BYTECODE

The Mercury compiler can translate Mercury programs into bytecode for interpretation by the Mercury debugger currently under development. The generation of bytecode happens after semantic checks have been completed.


MISCELLANEOUS

det_util.m:
This module contains utility predicates needed by the parts of the semantic analyzer and optimizer concerned with determinism.
special_pred.m, unify_proc.m:
These modules contain stuff for handling the special compiler-generated predicates which are generated for each type: unify/2, compare/3, index/1 (used in the implementation of compare/3), and also type_to_term/2 and term_to_type/2 (but those last two are disabled at the moment).
dependency_graph.m:
This contains predicates to compute the call graph for a module, and to print it out to a file. (The call graph file is used by the profiler.) The call graph may eventually also be used by det_analysis.m, inlining.m, and other parts of the compiler which could benefit from traversing the predicates in a module in a bottom-up or top-down fashion with respect to the call graph.
passes_aux.m:
Contains code to write progress messages, and higher-order code to traverse all the predicates defined in the current module and do something with each one.
opt_debug.m:
Utility routines for debugging the LLDS-to-LLDS optimizations.
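
The bottom-up traversal mentioned under dependency_graph.m can be sketched with Tarjan's strongly connected components algorithm, which emits each group of mutually recursive predicates only after everything it calls. This Python version is an illustration under assumptions: the dictionary encoding of the call graph and the name bottom_up_sccs are invented, and nothing here reflects the data structures actually used by dependency_graph.m.

```python
def bottom_up_sccs(call_graph):
    """call_graph: {pred: [callees]}. Returns a list of SCCs (each a sorted
    list of predicate names), with callees appearing before their callers."""
    index = {}
    low = {}
    on_stack = set()
    stack = []
    result = []
    counter = [0]

    def visit(p):
        index[p] = low[p] = counter[0]
        counter[0] += 1
        stack.append(p)
        on_stack.add(p)
        for q in call_graph.get(p, []):
            if q not in index:
                visit(q)
                low[p] = min(low[p], low[q])
            elif q in on_stack:
                low[p] = min(low[p], index[q])
        if low[p] == index[p]:
            # p is the root of an SCC: pop its members off the stack.
            scc = []
            while True:
                q = stack.pop()
                on_stack.discard(q)
                scc.append(q)
                if q == p:
                    break
            result.append(sorted(scc))

    for p in sorted(call_graph):
        if p not in index:
            visit(p)
    return result
```

Processing the result in order gives a bottom-up pass over the module; reversing it gives a top-down pass. The recursive visit is fine for a sketch but would need an explicit stack for very deep call chains in Python.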

CURRENTLY USELESS

The following modules do not serve any function at the moment. Some of them are obsolete; others are work in progress. (For some of them it's hard to say which!)

lco.m:
This finds predicates whose implementations would benefit from last call optimization modulo constructor application. It does not apply the optimization and will not until the mode system is capable of expressing definite aliasing.
mercury_to_goedel.m:
This converts from item_list to Goedel source code. It works for simple programs, but doesn't handle various Mercury constructs such as lambda expressions, higher-order predicates, and functor overloading.
mercury_to_c.m:
The very incomplete beginnings of an alternate code generator. When finished, it will convert HLDS to high-level C code (without going via LLDS).

Last update was $Date: 1997/09/23 16:48:43 $ by $Author: fjh $@cs.mu.oz.au.