The CLI's VM is a stack-based machine, which does not fit well with the SSI form in which the flowgraphs are generated. At the moment gencli does a literal translation of the SSI statements, allocating a new local variable for each variable of the flowgraph.
For example, consider the following RPython code and the corresponding flowgraph:
    def bar(x, y):
        foo(x+y, x-y)

    inputargs: x_0 y_0
    v0 = int_add(x_0, y_0)
    v1 = int_sub(x_0, y_0)
    v2 = directcall((sm foo), v0, v1)
This is the IL code generated by the CLI backend:
    .locals init (int32 v0, int32 v1, int32 v2)

    block0:
        ldarg 'x_0'
        ldarg 'y_0'
        add
        stloc 'v0'
        ldarg 'x_0'
        ldarg 'y_0'
        sub
        stloc 'v1'
        ldloc 'v0'
        ldloc 'v1'
        call int32 foo(int32, int32)
        stloc 'v2'
As you can see, the results of 'add' and 'sub' are stored in v0 and v1, respectively, and then v0 and v1 are reloaded onto the stack. These stores and loads are redundant, since the code would work just as well without them:
    .locals init (int32 v2)

    block0:
        ldarg 'x_0'
        ldarg 'y_0'
        add
        ldarg 'x_0'
        ldarg 'y_0'
        sub
        call int32 foo(int32, int32)
        stloc 'v2'
I've checked the native code generated by the Mono JIT on x86, and it does not optimize these stores and loads away. I haven't checked the native code generated by the Microsoft CLR yet.
Thus, we might consider optimizing it ourselves; it should not be too difficult, but it is not trivial because we have to make sure that each dropped local is used only once, and that the values left on the stack are consumed in the same order in which they were pushed.
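The check described above can be sketched as a small stack simulation over a single block. This is only an illustration under simplifying assumptions, not gencli's code: operations are modelled as hypothetical `(result, opname, args)` tuples, variables are plain strings, and `stack_allocate` and `render` are made-up names. A variable is a candidate if it is produced in the block and used exactly once; it stays on the stack only if, at its use, it sits on top of the stack in argument order. Otherwise it is "spilled", which is always safe because its `stloc` can go right after its producer, where it was on top.

```python
from collections import Counter


def stack_allocate(operations, exitargs=()):
    """Return the set of variables that can stay on the CLI evaluation stack.

    `operations` is a list of (result, opname, args) tuples in block order;
    `exitargs` are the variables passed along the block's exit links.
    A variable is a candidate only if it is produced in this block and used
    exactly once; it is kept only if that use finds it on top of the stack.
    """
    uses = Counter(arg for _, _, args in operations for arg in args)
    uses.update(exitargs)
    kept = {result for result, _, _ in operations if uses[result] == 1}
    pending = []  # kept variables currently on the stack, bottom to top
    for result, _, args in operations:
        # Longest prefix of args that is already on top of the stack, in order.
        k = min(len(args), len(pending))
        while k > 0 and pending[-k:] != list(args[:k]):
            k -= 1
        del pending[len(pending) - k:]
        # A kept argument that is buried or out of order must be spilled:
        # it gets a stloc right after its producer, where it was on top.
        for arg in args[k:]:
            if arg in pending:
                pending.remove(arg)
                kept.discard(arg)
        if result in kept:
            pending.append(result)
    # Whatever is still pending was never consumed here (e.g. exit links).
    kept.difference_update(pending)
    return kept


def render(inputargs, operations, kept):
    """Emit IL-like lines, omitting the stloc/ldloc pairs of kept variables."""
    code = []
    for result, opname, args in operations:
        for arg in args:
            if arg not in kept:
                code.append(f"ldarg '{arg}'" if arg in inputargs else f"ldloc '{arg}'")
        code.append(opname)
        if result not in kept:
            code.append(f"stloc '{result}'")
    return code
```

On the `bar` example, encoded as `[("v0", "add", ("x_0", "y_0")), ("v1", "sub", ("x_0", "y_0")), ("v2", "call int32 foo(int32, int32)", ("v0", "v1"))]`, `stack_allocate` keeps v0 and v1, and `render` produces the same instruction sequence as the hand-optimized IL. If the call took `(v1, v0)` instead, only v1 would stay on the stack and v0 would still go through a local.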
Both RPython and the CLI have their own sets of exception classes, some of which are quite similar: e.g., OverflowError, ZeroDivisionError and IndexError on the RPython side correspond to OverflowException, DivideByZeroException and IndexOutOfRangeException on the CLI side.
The first attempt was to map RPython classes to their corresponding CLI ones: this worked for simple cases, but it would have triggered subtle bugs in more complex ones, because the two exception hierarchies don't completely overlap.
For now I've chosen to build an RPython exception hierarchy completely independent from the CLI one, but this means that we can't rely on the exceptions raised by the standard CLI operations. The currently implemented solution is to translate exceptions on the fly; for example, 'int_add_ovf' is translated into the following IL code:
    .try {
        ldarg 'x_0'
        ldarg 'y_0'
        add.ovf
        stloc 'v1'
        leave __check_block_2
    }
    catch [mscorlib]System.OverflowException {
        newobj instance void class exceptions.OverflowError::.ctor()
        dup
        ldsfld class Object_meta pypy.runtime.Constants::exceptions_OverflowError_meta
        stfld class Object_meta Object::meta
        throw
    }
I.e., it catches the builtin OverflowException and raises an RPython OverflowError.
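The behaviour of that IL can be modelled in a few lines of Python. This is a hypothetical model, not generated code: `CliOverflowException` stands in for `[mscorlib]System.OverflowException`, `RPyOverflowError` for the RPython-level `exceptions.OverflowError`, and `cli_add_ovf` mimics the `add.ovf` instruction on int32 operands. The two exception classes deliberately share no base other than `Exception`, mirroring the two independent hierarchies.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class CliOverflowException(Exception):
    """Stand-in for [mscorlib]System.OverflowException (hypothetical model)."""


class RPyOverflowError(Exception):
    """Stand-in for the RPython-level exceptions.OverflowError."""


def cli_add_ovf(x, y):
    """Model of the IL add.ovf instruction: signed int32 addition with a check."""
    result = x + y
    if not INT32_MIN <= result <= INT32_MAX:
        raise CliOverflowException()
    return result


def int_add_ovf(x, y):
    """Model of the emitted code: run add.ovf, translate the builtin exception."""
    try:
        return cli_add_ovf(x, y)
    except CliOverflowException:
        # Equivalent of the catch handler: build and throw an RPython exception.
        raise RPyOverflowError() from None
```

The model also makes the cost visible: the `try`/`except` wrapper is set up on every call, even though the handler only runs when the addition overflows.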
I haven't measured timings yet, but I suspect that this machinery incurs some performance penalty even in the non-overflow case. A possible optimization is to do the on-the-fly translation only when it is strictly necessary, i.e. to skip it when the except clause catches an exception class whose subclass hierarchy is compatible with the builtin one. As an example, consider the following RPython code:
    try:
        return mylist[0]
    except IndexError:
        return -1
Given that IndexError has no subclasses, we can map it to System.IndexOutOfRangeException and catch the latter directly:

    .try {
        ldloc 'mylist'
        ldc.i4 0
        call int32 getitem(MyListType, int32)
        ...
    }
    catch [mscorlib]System.IndexOutOfRangeException {
        // return -1
        ...
    }
By contrast, we can't do so if the except clause catches classes that don't directly map to any builtin class, such as LookupError:

    try:
        return mylist[0]
    except LookupError:
        return -1

This has to be translated the old way:
    .try {
        ldloc 'mylist'
        ldc.i4 0
        .try {
            call int32 getitem(MyListType, int32)
        }
        catch [mscorlib]System.IndexOutOfRangeException {
            // translate IndexOutOfRangeException into IndexError
            newobj instance void class exceptions.IndexError::.ctor()
            dup
            ldsfld class Object_meta pypy.runtime.Constants::exceptions_IndexError_meta
            stfld class Object_meta Object::meta
            throw
        }
        ...
    }
    catch exceptions.LookupError {
        // return -1
        ...
    }
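The decision between the two translations can be sketched as a lookup. This is an illustration, not the backend's logic: `DIRECT_MAP` is a hypothetical table built from the class pairs listed earlier, `SUBCLASSES` is a simplified fragment of the RPython exception hierarchy (which follows Python's builtin one), and `catch_strategy` is a made-up name. It implements only the simple sufficient condition used in the IndexError example, namely "directly mapped and without subclasses", rather than a general hierarchy-compatibility test.

```python
# Hypothetical table: RPython exception class -> CLI builtin exception.
DIRECT_MAP = {
    "OverflowError": "System.OverflowException",
    "ZeroDivisionError": "System.DivideByZeroException",
    "IndexError": "System.IndexOutOfRangeException",
}

# Simplified fragment of the RPython exception hierarchy: class -> subclasses.
SUBCLASSES = {
    "Exception": ["LookupError", "ArithmeticError"],
    "LookupError": ["IndexError", "KeyError"],
    "ArithmeticError": ["OverflowError", "ZeroDivisionError"],
    "IndexError": [],
    "KeyError": [],
    "OverflowError": [],
    "ZeroDivisionError": [],
}


def catch_strategy(exc_class):
    """Decide how an 'except exc_class' clause could be compiled.

    Returns ("direct", builtin) when the clause can catch the CLI builtin
    exception itself, or ("translate", exc_class) when the raising operations
    must keep translating builtin exceptions into RPython ones.
    """
    if exc_class in DIRECT_MAP and not SUBCLASSES.get(exc_class):
        return ("direct", DIRECT_MAP[exc_class])
    return ("translate", exc_class)
```

With these tables, `catch_strategy("IndexError")` selects `System.IndexOutOfRangeException`, while `catch_strategy("LookupError")` falls back to translation, matching the two IL listings.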
Most methods of RPython lists are implemented by ll_* helpers placed in rpython/rlist.py. For some of them there is a direct counterpart already implemented in the .NET List<T>; we could use the oopspec attribute to replace these low-level helpers on the fly with their builtin counterparts. For example, the 'append' method is already mapped to pypylib.List.append. Thanks to Armin Rigo for the idea of using oopspec.
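The replacement step can be sketched as follows. The oopspec format assumed here, a string such as `'list.append(l, newitem)'` attached as a function attribute, is an assumption about how the helpers are annotated; `BUILTIN_REPLACEMENTS`, `parse_oopspec` and `choose_callee` are hypothetical names, and the only mapping taken from the text is `list.append` -> `pypylib.List.append`.

```python
import re

# Hypothetical table: oopspec operation name -> builtin method to call instead.
BUILTIN_REPLACEMENTS = {
    "list.append": "pypylib.List.append",
}

OOPSPEC_RE = re.compile(r"^\s*([\w.]+)\s*\((.*)\)\s*$")


def parse_oopspec(spec):
    """Split an oopspec string like 'list.append(l, newitem)' into name and args."""
    match = OOPSPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"malformed oopspec: {spec!r}")
    name, args = match.groups()
    return name, [a.strip() for a in args.split(",") if a.strip()]


def choose_callee(helper):
    """Return the builtin to call if the helper has a known oopspec, else the helper."""
    spec = getattr(helper, "oopspec", None)
    if spec is not None:
        name, _ = parse_oopspec(spec)
        if name in BUILTIN_REPLACEMENTS:
            return BUILTIN_REPLACEMENTS[name]
    return helper.__name__


# Two toy low-level helpers; only the first one carries an oopspec.
def ll_append(l, newitem):
    l.append(newitem)


ll_append.oopspec = "list.append(l, newitem)"


def ll_reverse(l):
    l.reverse()
```

Here `choose_callee(ll_append)` yields `"pypylib.List.append"`, while `ll_reverse`, which has no oopspec, keeps its own helper. Keying the table on the oopspec name rather than on the helper's Python name means renaming a helper in rlist.py would not break the mapping.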
The current implementations of ll_dict_getitem and ll_dict_get in ootypesystem.rdict do two consecutive lookups (calling ll_contains and ll_get) on the same key. We might cache the result of pypylib.Dict.ll_contains so that the subsequent ll_get doesn't need a lookup. However, we need some profiling before choosing the best approach. Alternatively, we could directly refactor ootypesystem.rdict to do a single lookup.
XXX I tried it on revision 32917 and performance is worse! pypy.net pystone.py is 17% slower, and pypy.net richards.py is 71% slower (!!!). I don't know why; this needs to be investigated further.
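For reference, the caching idea can be sketched with a toy model. Given the slowdown reported in the XXX note, this only illustrates the mechanism and is not a recommendation. `CachingDict` is a hypothetical stand-in for pypylib.Dict backed by a plain Python dict, `lookups` counts hash-table probes, and `ll_dict_getitem` reproduces only the contains-then-get shape of the current helper.

```python
_MISSING = object()  # sentinel: "no cached key" / "key not found"


class CachingDict:
    """Toy model of a dict whose ll_contains() caches the value it found."""

    def __init__(self):
        self._data = {}
        self._last_key = _MISSING
        self._last_value = None
        self.lookups = 0  # number of real hash-table probes

    def _lookup(self, key):
        self.lookups += 1
        return self._data.get(key, _MISSING)

    def ll_set(self, key, value):
        self._data[key] = value
        self._last_key = _MISSING  # any mutation invalidates the cache

    def ll_contains(self, key):
        value = self._lookup(key)
        if value is _MISSING:
            self._last_key = _MISSING
            return False
        self._last_key, self._last_value = key, value
        return True

    def ll_get(self, key):
        # Reuse the result of the preceding ll_contains() on the same key.
        if self._last_key is not _MISSING and self._last_key == key:
            return self._last_value
        value = self._lookup(key)
        if value is _MISSING:
            raise KeyError(key)
        return value


def ll_dict_getitem(d, key):
    # Same shape as the current helper: ll_contains() followed by ll_get().
    if d.ll_contains(key):
        return d.ll_get(key)
    raise KeyError(key)
```

With the cache, a successful `ll_dict_getitem` costs one probe instead of two. The extra bookkeeping on every `ll_contains` and every mutation is one plausible source of overhead, but that is a guess, not an explanation of the measured slowdown.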
    2006-10-02, 13:41
    <pedronis> antocuni: do you try to not wrap static methods that are just called and not passed around
    <antocuni> no
    <antocuni> I think I don't know how to detect them
    <pedronis> antocuni: you should try to render them just as static methods not as instances when possible
    <pedronis> you need to track what appears only in direct_calls vs other places
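The detection pedronis suggests can be sketched as a single scan over the operations. This is a simplified model under stated assumptions: operations are hypothetical `(opname, args)` pairs, function constants are represented by a made-up `FuncConst` class, and for `direct_call` the first argument is taken to be the callee. A function seen only in that callee position could be rendered as a plain static method; any other occurrence (stored in a field, passed as an argument) means it escapes and still needs a wrapper instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncConst:
    """Hypothetical stand-in for a constant referring to a function graph."""
    name: str


def classify_functions(operations):
    """Split function constants into (only_called_directly, escaping) sets.

    `operations` is an iterable of (opname, args) pairs. A function counts as
    only called directly if it never appears anywhere except as the callee
    (first argument) of a 'direct_call'.
    """
    called = set()
    escaping = set()
    for opname, args in operations:
        for i, arg in enumerate(args):
            if not isinstance(arg, FuncConst):
                continue
            if opname == "direct_call" and i == 0:
                called.add(arg)
            else:
                escaping.add(arg)
    return called - escaping, escaping
```

In a real translation the scan would have to cover every graph, including constants in exit links, since a single escaping use anywhere forces the wrapper.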
We should try to use native .NET Unicode facilities instead of our own. This should save both time (especially startup time) and memory.
On 2006-10-02 I got these benchmarks:
    pypy.net               Startup time   Memory used
    with unicodedata       ~12 sec        112508 KB
    without unicodedata    ~6 sec         79004 KB
The version without unicodedata is buggy, of course.
Unfortunately, it seems that .NET doesn't expose everything we need, so we will still need some of the data. For example, there is no way to get the Unicode name of a character.