Some parts of the annotator, and especially specialization, are quite obscure and hackish. One cause for this is the need to manipulate Python objects like functions directly. This makes it hard to attach additional information directly to the objects. It makes specialization messy because it has to create new dummy function objects just to represent the various specialized versions of the function.
Let's introduce nice wrapper objects. This refactoring is oriented towards the following goal: replacing the content of SomePBC() with a plain set of "description" wrapper objects. We shall probably also remove the possibility for None to explicitly be in the set and add a can_be_None flag (this is closer to what the other SomeXxx classes do).
To be declared in module pypy.annotator.desc, with a mapping annotator.bookkeeper.descs = {<python object>: <XxxDesc instance>} accessed with bookkeepeer.getdesc(<python object>).
Maybe later the module should be moved out of pypy.annotation but for now I suppose that it's the best place.
The goal is to have a single Desc wrapper even for functions and classes that are specialized.
FunctionDesc
Describes (usually) a Python function object. Contains flow graphs: one in the common case, zero for external functions, more than one if there are several specialized versions. Also describes the signature of the function in a nice format (i.e. not by relying on func_code inspection).
ClassDesc
Describes a Python class object. Generally just maps to a ClassDef, but could map to more than one in the presence of specialization. So we get SomePBC({<ClassDesc>}) annotations for the class, and when it's instantiated it becomes SomeInstance(classdef=...) for the particular selected classdef.
MethodDesc
Describes a bound method. Just references a FunctionDesc and a ClassDef (not a ClassDesc, because it's read out of a SomeInstance).
FrozenDesc
Describes a frozen pre-built instance. That's also a good place to store some information currently in dictionaries of the bookkeeper.
MethodOfFrozenDesc
Describes a method of a FrozenDesc. Just references a FunctionDesc and a FrozenDesc.
NB: unbound method objects are the same as function for our purposes, so they become the same FunctionDesc as their im_func.
These XxxDesc classes should share some common interface, as we'll see during the refactoring. A common base class might be a good idea (at least I don't see why it would be a bad idea :-)
Done, branch merged.
The FuncDesc.specialize() method takes an args_s and return a corresponding graph. The caller of specialize() parses the actual arguments provided by the simple_call or call_args operation, so that args_s is a flat parsed list. The returned graph must have the same number and order of input variables.
For each call family, we compute a table like this (after annotation finished):
call_shape FuncDesc1 FuncDesc2 FuncDesc3 ... ---------------------------------------------------------- call0 shape1 graph1 call1 shape1 graph1 graph2 call2 shape1 graph3 graph4 call3 shape2 graph5 graph6
We then need to merge some of the lines if they look similar enough, e.g. call0 and call1. Precisely, we can merge two lines if they only differ in having more or less holes. In theory, the same graph could appear in two lines that are still not mergeable because of other graphs. For sanity of implementation, we should check that at the end each graph only appears once in the table (unless there is only one column, in which case all problems can be dealt with at call sites).
(Note that before this refactoring, the code was essentially requiring that the table ended up with either one single row or one single column.)
The table is computed when the annotation is complete, in compute_at_fixpoint(), which calls the FuncDesc's consider_call_site() for each call site. The latter merges lines as soon as possible. The table is attached to the call family, grouped by call shape.
During RTyping, compute_at_fixpoint() is called after each new ll helper is annotated. Normally, this should not modify existing tables too much, but in some situations it will. So the rule is that consider_call_site() should not add new (unmerged) rows to the table after the table is considered "finished" (again, unless there is only one column, in which case we should not discover new columns).
XXX this is now out of date, in the details at least.
The above picture attaches "calltable" information to the call families containing the function. When it comes to rtyping a call of another kind of pbc (class, instance-method, frozenpbc-method) we have two basic choices:
- associate the calltable information with the funcdesc that ultimately ends up getting called, or
- attach the calltable to the callfamily that contains the desc that's actually being called.
Neither is totally straightforward: the former is closer to what happens on the trunk but new families of funcdescs need to be created at the end of annotation or by normalisation. The latter is more of a change. The former is also perhaps a bit unnatural for ootyped backends.