
PyPy - Just-In-Time Specialization

Warning: These are just a few notes quickly thrown together, to be clarified and expanded.

Draft

Introduction

Our current "state of the art" in this area is represented by the test pypy.jit.test.test_hint_timeshift.test_arith_plus_minus(): a really tiny interpreter which gets turned into a compiler. Here is its complete source:

def ll_plus_minus(encoded_insn, nb_insn, x, y):
    # encoded_insn packs nb_insn 4-bit opcodes, least significant
    # nibble first: 0xA means "add y", 0x5 means "subtract y".
    acc = x
    pc = 0
    while pc < nb_insn:
        op = (encoded_insn >> (pc*4)) & 0xF    # fetch the next opcode
        op = hint(op, concrete=True)           # <-----
        if op == 0xA:
            acc += y
        elif op == 0x5:
            acc -= y
        pc += 1
    return acc

This interpreter goes through several transformations, which you can follow graphically in the Pygame viewer:

py.test test_hint_timeshift.py -k test_arith_plus_minus --view

What this does is turn (the graph of) the above interpreter into (the graph of) a compiler. This compiler takes a user program as input -- i.e. an encoded_insn and nb_insn -- and produces as output a new graph, which is the compiled version of the user program. The new output graph is called the residual graph; it takes x and y as input.
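
For instance, with the hypothetical input encoded_insn=0xA5A and nb_insn=3 -- opcode 0xA at positions 0 and 2, opcode 0x5 at position 1 -- the residual graph would compute something like:

def residual_graph(x, y):             # hypothetical output for (0xA5A, 3)
    acc = x
    acc += y                          # opcode 0xA at pc=0
    acc -= y                          # opcode 0x5 at pc=1
    acc += y                          # opcode 0xA at pc=2
    return acc

All the green bookkeeping -- pc, op, the while loop -- is gone; only the red operations remain.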

This generated compiler is not "just-in-time" in any sense. It is a regular compiler, for now.

Hint-annotator

Let's follow how the interpreter is turned into a compiler. First, the source of the interpreter is turned into low-level graphs in the usual way (it is actually already a low-level function). This low-level graph then goes through a pass called the "hint-annotator", whose goal is to give a color -- red, green or blue -- to each variable in the graph. The color represents the "time" at which the value of a variable is expected to be known, when the interpreter works as a compiler. In the above example, variables like pc and encoded_insn need to be known to the compiler -- otherwise, it wouldn't even know which program it must compile. These variables need to be green. Variables like x and acc, on the other hand, are expected to appear in the residual graph; they need to be red.

The color of each variable is derived from the hint (see the <----- line in the source): the hint() call forces op to be a green variable, along with any previous variable that is essential to compute the value of op. The hint-annotator computes dependencies and back-propagates "greenness" from hint() calls.
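
A toy illustration of this back-propagation (made-up data structures, not the actual hint-annotator code): represent each operation as a mapping from its result variable to its argument variables, seed the green set from the hint() calls, and iterate to a fixpoint:

def backpropagate_greenness(operations, hinted):
    # operations: result variable -> variables used to compute it
    # hinted: variables forced green by hint(..., concrete=True)
    green = set(hinted)
    changed = True
    while changed:
        changed = False
        for result, args in operations.items():
            if result in green:
                for arg in args:
                    if arg not in green:
                        green.add(arg)
                        changed = True
    return green

# In ll_plus_minus, forcing 'op' green drags in 'encoded_insn' and 'pc':
ops = {'op': ['encoded_insn', 'pc'], 'acc': ['acc', 'y']}
assert backpropagate_greenness(ops, {'op'}) == {'op', 'encoded_insn', 'pc'}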

The hint-annotator is implemented on top of the normal annotator; it's in hintannotator.py, hintmodel.py, hintbookkeeper.py, hintcontainer.py and hintvlist.py. The latter two files are concerned with the blue variables, which are variables that contain pointers to structures of a mixed kind: structures which are themselves -- as containers -- known to the compiler, i.e. green, but whose fields may not all be known to the compiler. There is no blue variable in the code above, but the stack of a stack-based interpreter is an example: the "shape" of the stack is known to the compiler when compiling any bytecode position, but the actual run-time values in the stack are not. The hint-annotator can now handle many cases of blue structures and arrays. For low-level structures and arrays that actually correspond to RPython lists, hintvlist.py recognizes the RPython-level operations and handles them directly -- this avoids problems with low-level details like over-allocation, which causes several identical RPython lists to look different when represented as low-level structs and arrays.
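
To make the stack example concrete, here is a hypothetical fragment of a stack-based interpreter (PUSH_X, PUSH_Y, ADD and the bytecode format are made up): the length of stack at each bytecode position is known to the compiler, but the values it holds are not:

PUSH_X, PUSH_Y, ADD = 0, 1, 2         # made-up opcodes

def ll_stack_interp(bytecode, x, y):
    stack = []                        # blue container: its shape is green
    pc = 0
    while pc < len(bytecode):
        op = hint(bytecode[pc], concrete=True)
        if op == PUSH_X:
            stack.append(x)           # a red value in a blue container
        elif op == PUSH_Y:
            stack.append(y)
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)       # red operation
        pc += 1
    return stack.pop()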

Timeshifter

Once the graph has been colored, in comes the "timeshifter". This tool -- loosely based on the normal RTyper -- transforms the colored low-level graphs into the graphs of the compiler. Broadly speaking, this is done by transforming each operation annotated with red variables into an operation that will generate the original operation. Indeed, red variables are the variables whose run-time content is unknown to the compiler. So for example, if the arguments of an int_add have been annotated as red, it means that the real value of these variables will not be known to the compiler; when the compiler actually runs, all it can do is generate a new int_add operation into the residual graph.
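
A rough sketch of such a generating operation -- the names here are hypothetical, not the timeshifter's actual support code: instead of adding two numbers, it records an int_add in the residual graph and returns the variable naming its result:

import itertools
_counter = itertools.count()

def fresh_var():
    return 'v%d' % next(_counter)

def emit(residual_ops, opname, v_a, v_b):
    # The operands' run-time values are unknown to the compiler, so
    # compute nothing: just generate a residual operation.
    v_result = fresh_var()
    residual_ops.append((opname, v_a, v_b, v_result))
    return v_result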

In the example above, only acc += y and acc -= y are annotated with red arguments. After hint-rtyping, the ll_plus_minus() graph -- which is now the graph of a compiler -- is mostly unchanged except for these two operations, which are replaced by a few operations that call helpers; when the graph of the now-compiler runs, these helpers produce new int_add and int_sub operations.
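
Building on the emit() sketch above, the timeshifted ll_plus_minus() behaves roughly like this hand-written compiler (again hypothetical): the green computations of pc and op run normally, while the red += and -= emit residual operations:

def compile_plus_minus(encoded_insn, nb_insn):
    residual_ops = []
    v_acc = 'v_x'                     # red: bound to the input x
    pc = 0                            # green: known while compiling
    while pc < nb_insn:
        op = (encoded_insn >> (pc*4)) & 0xF
        if op == 0xA:
            v_acc = emit(residual_ops, 'int_add', v_acc, 'v_y')
        elif op == 0x5:
            v_acc = emit(residual_ops, 'int_sub', v_acc, 'v_y')
        pc += 1
    return residual_ops, v_acc

# compile_plus_minus(0xA5A, 3) yields three chained residual
# operations -- int_add, int_sub, int_add -- over v_x and v_y.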

Merging and bookkeeping

XXX the following is not the way it works currently, but rather a proposal for how it might work -- although it's all open to changes. It is a mix of how Psyco and the Flow Object Space work.

Unlike ll_plus_minus() above, any realistic interpreter needs to handle two complications:

  1. Jumping or looping opcodes. When looping back to a previous bytecode position, the now-compiler must not continue to compile again and again; it must generate a loop in the residual graph as well, and stop compiling.
  2. Conditional branches. Unless the condition is known at compile time, the compiler must generate a branch in the residual graph and compile both paths.

Keep in mind that this is all about how the interpreter is transformed to become a compiler. Unless explicitly stated otherwise, I am not talking about the interpreted user program here.

As a reminder, this is handled in the Flow Space as follows:

  1. When a residual operation is about to be generated, we check whether the bytecode position has looped back to an already-seen position. To do so, for each bytecode position we save a "state". The state remembers the bytecode interpreter's frame state, as a pattern of Variables and Constants; in addition, the state points to the residual block that corresponded to that position. (A sketch of such a state follows this list.)
  2. A branch forces the current block to fork into two "EggBlocks". This makes a tree with a "SpamBlock" at the root and two EggBlock children, which themselves might again have two EggBlock children, and so on. The Flow Space resumes its analysis from the start of the root SpamBlock and explores each branch of this tree in turn. Unexplored branches are remembered for later.
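
A minimal sketch of what such a saved state could look like -- hypothetical classes, not the actual Flow Space code:

class Variable(object):
    pass                              # a run-time value, unknown here

class Constant(object):
    def __init__(self, value):
        self.value = value

class State(object):
    def __init__(self, pattern, block):
        self.pattern = pattern        # Variables/Constants of the frame
        self.block = block            # residual block for this position

    def matches(self, other_pattern):
        # Simplified rule: Variables align with Variables, Constants
        # with equal Constants.  (The real analysis can also merge two
        # different Constants by generalizing them to a Variable.)
        if len(self.pattern) != len(other_pattern):
            return False
        for a, b in zip(self.pattern, other_pattern):
            if isinstance(a, Constant) != isinstance(b, Constant):
                return False
            if isinstance(a, Constant) and a.value != b.value:
                return False
        return True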

In Psyco:

  1. A state is saved at the beginning of each bytecode, as in the Flow Space. (There are actually memory-saving tricks to avoid saving a state for each and every bytecode.) We close a loop in the residual graph as soon as we reach an already-seen bytecode position, but only if the states are "compatible enough". Indeed, as in the PyPy JIT, Psyco states store more than just Variables and Constants: they store "blue" containers detailing individual fields' contents. Blue containers of too-different shapes are not "compatible enough". In addition, some Constants can be marked as "fixed" to prevent them from being merged with different Constants and becoming Variables.
  2. Branching in Psyco works as in the Flow Space, with the exception that each condition has a "likely" and a "less likely" outcome. The compilation follows the "likely" path only. The "unlikely" paths are only explored if at run-time the execution actually reaches them. (When it does, the same trick as in the Flow Space is used: compilation restarts from the root SpamBlock and follows the complete branch of the EggBlock tree.)
  3. A different kind of branching that doesn't occur in the Flow Space: promoting a value from Variable to Constant. This is used e.g. when an indirect function call is about to be performed. A typical example is calling a PyTypeObject's slot based on the type of a PyObject instance. In this case, Psyco considers the PyObject's ob_type field as a Variable, which it turns into a Constant. Conceptually, the current residual block is ended with a "switch", and every time a different run-time value reaches this point, a new case is compiled and added to the switch. (As in 2., the compilation is restarted from the root SpamBlock until it reaches that point again.) A sketch of this promotion follows the list.
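
Point 3. could be sketched as a growable switch -- a hypothetical structure, loosely following the description above; each new run-time value reaching the promotion point triggers the compilation of one more case:

class PromotionPoint(object):
    def __init__(self, compile_case):
        self.compile_case = compile_case  # compiles one specialized case
        self.cases = {}                   # run-time value -> residual code

    def reach(self, runtime_value):
        # Called from the residual code's fallback branch at run-time.
        if runtime_value not in self.cases:
            self.cases[runtime_value] = self.compile_case(runtime_value)
        return self.cases[runtime_value]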

The "tree of EggBlocks" approach doesn't work too well in general. For example, it unrolls loops infinitely if they are not loops in the bytecode but loops in the implementation of a single opcode (we had this problem working on the annotator in Vilnius).

The current preliminary work on the timeshifter turns the interpreter into a compiler that saves its state at _all_ join-points permanently. This makes sure that loops are not unexpectedly unrolled, and that the code that follows an if/else is not duplicated (as it would be in the tree-of-EggBlocks approach). But it is also inefficient and a perfect recipe for exploding memory usage.

I think we could try to target the following model -- which also has the advantage that simple calls in the interpreter are still simple calls in the compiler, as in Psyco:

  1. We only save the state at one point. We probably need a hint to specify where this point is -- at the beginning of the bytecode interpreter loop. It is easy to know in advance which information needs to be stored in each state: the tuple of green variables is used as a key in a global dict; the content of the red variables is stored as a state under this key. Several states can be stored under the same key if they are not "compatible enough". A sketch of this follows the list.
  2. Branching: a possible approach would be to have the compiler produce a residual graph with the same shape as the original graph in the interpreter, at least where conditions are not known at compile-time. (This would remove the branches whose conditions are known at compile time, and unroll all-green loops like the bytecode interpreter loop itself.)
  3. Promoting a Variable to a Constant, or, in colored terms, a red variable to a green one (using a hint which we have not implemented so far): this is the case where we have no choice but to suspend compilation and wait for execution to provide real values. We will implement this case later, but I mention it now because it seems that there are solutions compatible with this model, including Psyco's (see 3. above).
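
Point 1. might look like the following sketch (hypothetical names; compatible() stands in for the "compatible enough" test):

compiled_states = {}    # tuple of green values -> list of red states

def compatible(state1, state2):
    # Placeholder: the real test would compare blue container shapes,
    # fixed constants, and so on.
    return state1 == state2

def enter_block(greens, red_state, compile_from, close_loop):
    states = compiled_states.setdefault(greens, [])
    for seen in states:
        if compatible(seen, red_state):
            return close_loop(seen)   # generate a loop in the residual graph
    states.append(red_state)
    return compile_from(red_state)    # go on compiling a new residual block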

The motivation for doing point 2. differently than in Psyco is that it is both more powerful (no extra unrolling/duplication of code) and closer to what we already have now: the bookkeeping code inserted by the timeshifter in the compiler's graphs. In Psyco it would have been a mess to write that bookkeeping code everywhere by hand, not to mention changing it to experiment with other ideas.

Random notes

An idea to consider: red variables in the compiler could come with a concrete value attached too, which represents a real execution-time value. The compiler would perform concrete operations on it in addition to generating residual operations. In other words, the compiler would also perform directly some interpretation as it goes along. In this way, we can avoid some of the recompilation by using this attached value e.g. as the first switch case in the red-to-green promotions, or as a hint about which outcome of a run-time condition is more likely.
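
A possible shape for this idea -- hypothetical, just to make it concrete -- is a "red box" carrying both the residual variable and one concrete sample value, updated in lockstep as operations are generated:

class RedBox(object):
    def __init__(self, residual_var, sample_value):
        self.residual_var = residual_var
        self.sample_value = sample_value  # value from one real execution

def red_int_add(residual_ops, a, b, fresh_var):
    v = fresh_var()
    residual_ops.append(('int_add', a.residual_var, b.residual_var, v))
    # Track the concrete result too: it could seed the first case of a
    # red-to-green promotion switch, or suggest a condition's likely
    # outcome.
    return RedBox(v, a.sample_value + b.sample_value)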