PyPy is both:
- a reimplementation of Python in Python, and
- a framework for implementing interpreters and virtual machines for programming languages, especially dynamic languages.
PyPy tries to find new answers about ease of creation, flexibility, maintainability and speed trade-offs for language implementations. For further details see our goal and architecture document.
Not completely, at least not yet.
The most likely stumbling block for any given project is support for extension modules. PyPy supports a small but continually growing number of extension modules, so far mostly those found in the standard library. The threading support is also not perfectly complete.
The language features (including builtin types and functions) are very complete and well tested, so if your project does not use many extension modules there is a good chance that it will work with PyPy.
PyPy is regularly and extensively tested on Linux machines and on Mac OS X, and mostly works under Windows too (but is tested there less extensively). PyPy needs a CPython running on the target platform to bootstrap, as cross compilation is not really meant to work yet. At the moment you need CPython 2.4 for the translation process; 2.5 is not fully supported.
Currently (due to time restrictions) we are not trying hard to support PyPy in a 64 bit environment. While things seem to mostly work, a few modules won't work on 64 bit machines, such as bz2.
PyPy currently aims to be fully compatible with Python 2.4. That means that it contains the standard library of Python 2.4 and that it supports 2.4 features (such as decorators) but not the 2.5 features (with statement, the ternary operator). The 2.5 features will probably be supported eventually; the main reason nobody is working on them is that we did not promise this to the EU and currently have enough other tasks.
Operating system-level threads work in a limited way. If you enable the thread module then PyPy will get support for GIL based threading. One limitation is that only a few IO operations actually release the GIL, which reduces the usefulness of threads. On the other hand, PyPy fully supports stackless-like microthreads (although the two cannot be mixed yet).
As for other modules: The general rule of thumb is that pure-Python modules work, C extension modules don't. Some of the C extension modules of the standard library have been re-implemented in pure Python or as a mixed module (for some there were also older pure-Python versions available). A (probably incomplete) list:
- pure Python implementations: binascii, cmath, collections, cPickle, cStringIO, datetime, functional, imp, itertools, md5, operator, sha, struct
- mixed module implementations: exceptions, sys, __builtin__, posix, _codecs, gc, _weakref, array, marshal, errno, math, _sre, parser, symbol, _random, socket, unicodedata, mmap, fcntl, time, select, bz2, crypt, signal, readline (incomplete)
No, and there are no short-term plans to support this. CPython extension modules rely heavily on CPython's C API, which contains a lot of implementation details like reference counting, exact C-level object implementation and layout, etc.
However, if your module uses ctypes rather than C-level code, there is hope: you can try to write a mixed module (see next question).
The long-term answer might be different. In principle, it should be possible for PyPy to support the CPython C API (or at least a large subset of its official part). It means that "in the fullness of time" you might be able to simply recompile existing CPython extension modules and use them with PyPy.
PyPy extension modules are written in the form of mixed modules, so called because they can contain a mixture of compiled and interpreted Python code. At the moment they all need to be translated together with the rest of PyPy.
We have a proof of concept of what we call the extension compiler, and our support for a static variant of the ctypes interface (rctypes), to help with their development. At the moment both have quite a few rough edges. This kind of module can even be cross-compiled to regular CPython extension modules, although this doesn't deliver completely satisfying results so far. This area is going to improve over time.
As of August 2005, PyPy was successfully translated to C. Compared to CPython, the version of PyPy that still runs on top of CPython is slower by a factor of 2000. The first translated version was roughly 300 times slower than CPython, a number which we decreased release after release to the current point, where PyPy is only between 1.7 and 4 times slower than CPython. Note that the speed heavily depends on the options enabled at compile time.
The integration of the work on the Just-In-Time compiler has just started; it can be manually enabled and gives good results on functions doing integer arithmetic (60 times faster than CPython, i.e. within 20% of recoding the function in C and compiling with gcc without optimizations).
Since a Python interpreter is a rather large and intricate thing, our tool suite has become quite advanced to support it. This gave people the idea of using it to implement interpreters for dynamic languages other than Python and get a lot of things for free (translation to various languages, stackless features, garbage collection, implementation of various things like arbitrarily long integers). Therefore people started to implement a JavaScript interpreter (Leonardo Santagada as his Summer of PyPy project) and a Prolog interpreter (Carl Friedrich Bolz as his Master's thesis). The JavaScript interpreter is undocumented at the moment, but you can look at its sources. Both projects are unfinished at the moment (the Prolog interpreter being less unfinished).
Sure you can come to sprints! We always welcome newcomers and try to help them get started in the project as much as possible (e.g. by providing tutorials and pairing them with experienced PyPy developers). Newcomers should have some Python experience and read some of the PyPy documentation before coming to a sprint.
Coming to a sprint is usually also the best way to get into PyPy development. If you want to start on your own, take a look at the list of project suggestions. If you get stuck or need advice, contact us. Usually IRC is the most immediate way to get feedback (at least during some parts of the day; many PyPy developers are in Europe) and the mailing list is better for long discussions.
It seems that a lot of strange, unexplainable problems can be magically solved by removing all the *.pyc files from the PyPy source tree (the script py.cleanup from py/bin will do that for you). Another thing you can do is remove the directory pypy/_cache completely. If the error persists and still annoys you after this treatment, please send us a bug report (or even better, a fix :-)
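The cleanup described above can also be sketched in a few lines of Python. This is only an illustration, not the py.cleanup script itself: the function names `remove_pyc_files` and `remove_cache_dir` are hypothetical, and the sketch assumes that `root` is the top of your checkout with `pypy/_cache` directly below it.

```python
import os
import shutil


def remove_pyc_files(root):
    """Delete every *.pyc file below root and return the removed paths."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed


def remove_cache_dir(root):
    """Remove root/pypy/_cache if it exists (assumed layout); report success."""
    cache = os.path.join(root, "pypy", "_cache")
    if os.path.isdir(cache):
        shutil.rmtree(cache)
        return True
    return False
```

Both functions only touch generated files, so running them on a clean tree is harmless.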
No, PyPy is not a Python compiler.
In Python, it is mostly impossible to prove anything about the types that a program will manipulate by doing a static analysis. This should be clear if you are familiar with Python, but if in doubt see [BRETT].
What could be attempted is static "soft typing", where you would use a whole bunch of heuristics to guess what types are probably going to show up where. In this way, you could compile the program into two copies of itself: a "fast" version and a "slow" version. The former would contain many guards that allow it to fall back to the latter if needed. That would be a wholly different project than PyPy, though.
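The "two copies with guards" idea can be illustrated by hand in plain Python. This is a minimal sketch of the concept only; nothing like it exists in PyPy, and the names `sum_fast`, `sum_slow`, `sum_items` and `GuardFailed` are hypothetical. It assumes `items` is a list, so that it can be traversed a second time after a failed guard.

```python
class GuardFailed(Exception):
    """Raised when a type guess made by the fast version turns out wrong."""


def sum_fast(items):
    """'Fast' copy: guesses that every element is an int."""
    total = 0
    for item in items:
        if type(item) is not int:  # guard protecting the type guess
            raise GuardFailed
        total += item
    return total


def sum_slow(items):
    """'Slow' copy: makes no assumption about the element types."""
    it = iter(items)
    try:
        total = next(it)
    except StopIteration:
        return 0
    for item in it:
        total = total + item
    return total


def sum_items(items):
    """Try the fast copy first and fall back to the slow one."""
    try:
        return sum_fast(items)
    except GuardFailed:
        return sum_slow(items)
```

A soft-typing compiler would generate both copies and the guards automatically; here they are written out manually to show the fallback structure.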
What PyPy contains is, on the one hand, a non-soft static type inferencer for RPython, which is a sublanguage that we defined just so that it's possible and not too hard to do that; and on the other hand, for the full Python language, we have an interpreter, and a JIT generator which can produce a Just-In-Time Compiler from the interpreter. The resulting JIT works for the full Python language in a way that doesn't need type inference at all.
[BRETT] Brett Cannon, Localized Type Inference of Atomic Types in Python, http://www.ocf.berkeley.edu/~bac/thesis.pdf
RPython is a restricted subset of the Python language. The restrictions are to ensure that type inference (and so, ultimately, translation to other languages) of the program is possible. These restrictions only apply after the full import happens, so at import time arbitrary Python code can be executed. Another important point is that the property of "being RPython" always applies to a full program, not to single functions or modules (the translation tool chain does a full program analysis).
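The split between import time and the analysed program can be illustrated with a small module. This is a sketch under assumptions: the names `_build_squares`, `SQUARES` and `square` are made up for the example, and whether a real program is accepted always depends on the whole program being translated.

```python
# Import time: arbitrary Python is allowed here, because the restrictions
# only apply to what exists once the import has finished.
def _build_squares(n):
    # eval() is a dynamic feature used here purely to make the point that
    # import-time code is unrestricted; it is not meant to be analysed.
    return [eval("%d * %d" % (i, i)) for i in range(n)]


SQUARES = _build_squares(10)  # computed once, before any analysis


def square(i):
    # Written in a restricted style: SQUARES only ever holds ints and
    # `i` is only ever used as an int index.
    return SQUARES[i]
```

Only `square` and the already-built `SQUARES` list would need to satisfy the restrictions; `_build_squares` has finished its work by then.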
The restrictions that apply to programs to be RPython mostly limit the ability of mixing types in arbitrary ways. RPython does not allow the usage of two different types in the same variable. In this respect (and in some others) it feels a bit like Java. Other features not allowed in RPython are the usage of special methods (__xxx__) except __init__ and __del__, and the usage of reflection capabilities (e.g. __dict__).
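As a small illustration of the "one type per variable" rule, compare the two functions below. Both are valid Python; only the second follows the restriction as stated above. The function names are hypothetical, and this sketch says nothing about the other RPython rules.

```python
def describe_mixed(n):
    # Ordinary Python, but not in the RPython style: `result` is an int
    # on one path and a str on the other.
    if n >= 0:
        result = n
    else:
        result = "negative"
    return result


def describe_consistent(n):
    # `result` is a str on every path, so one type can be inferred for it.
    if n >= 0:
        result = str(n)
    else:
        result = "negative"
    return result
```

On top of CPython both functions run fine; the difference only matters to the type inference.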
Most existing standard library modules are not RPython, except for some functions in os, math and time that are natively supported. In general it is quite unlikely that an existing Python program is by chance RPython; it is most likely that it would have to be heavily rewritten. To learn more about the RPython limitations, read the RPython description.
"Full program" in the context of "being RPython" is all the code reachable from an "entry point" function. The translation toolchain follows all calls recursively and discovers what belongs to the program and what not.
If you put "NOT_RPYTHON" into the docstring of a function and that function is found while trying to translate an RPython program, the translation process stops and reports this as an error. You can therefore mark functions as "NOT_RPYTHON" to make sure that they are never analyzed.
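A marked function might look like the following. The helper `is_marked_not_rpython` is hypothetical and only mimics the kind of docstring check described above so that the example can be run; it is not the toolchain's own code.

```python
def debug_dump(obj):
    """NOT_RPYTHON: uses reflection, for debugging on top of CPython only."""
    return sorted(vars(obj).keys())


def is_marked_not_rpython(func):
    # Hypothetical stand-in for the translator's check: look for the
    # marker in the function's docstring.
    doc = func.__doc__
    return doc is not None and "NOT_RPYTHON" in doc


def plain_function(x):
    return x + 1
```

If `debug_dump` were reachable from the entry point, translation would stop with an error instead of trying to analyse its use of reflection.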
It's not necessarily nonsense, but it's not really The PyPy Way. It's pretty hard, without some kind of type inference, to translate, say, this Python:
a + b
into anything significantly more efficient than this Common Lisp:
(py:add a b)
And making type inference possible is what RPython is all about.
You could make #'py:add a generic function and see if a given CLOS implementation is fast enough to give a useful speed (but I think the coercion rules would probably drive you insane first). -- mwh
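To see why a generic `py:add` cannot be much faster without type information, here is a simplified model in Python of the work `a + b` implies at run time. The function `py_add` is hypothetical and deliberately incomplete: it leaves out, for example, the rule that gives the reflected method of a subclass priority.

```python
def py_add(a, b):
    """Simplified model of the run-time dispatch behind `a + b`."""
    # First ask the left operand's type.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # Then give the right operand's type a chance, if it is a different type.
    if type(b) is not type(a):
        radd = getattr(type(b), "__radd__", None)
        if radd is not None:
            result = radd(b, a)
            if result is not NotImplemented:
                return result
    raise TypeError(
        "unsupported operand type(s) for +: %r and %r"
        % (type(a).__name__, type(b).__name__)
    )
```

Every call goes through these lookups and checks unless something has established the operand types in advance, which is exactly the information type inference provides.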
No. PyPy always runs your code in its own interpreter, which is a full and compliant Python 2.4 interpreter. RPython is only the language in which parts of PyPy itself, and extension modules for it, are written. The answer to whether something needs to be written as an extension module, apart from the "gluing to external libraries" reason, will change over time as speed for normal Python code improves.
Backends that can actually translate all of PyPy:
Somewhat mature backends:
Partially implemented backends (both high-level):
To learn more about backends take a look at the translation document.
See the getting-started guide.
Start from the example of pypy/translator/goal/targetnopstandalone.py, which you compile by typing:
python translate.py targetnopstandalone
You can have a look at the intermediate C source code, which is (at the moment) put in /tmp/usession-*/testing_1/testing_1.c. Of course, all the functions and other code used directly or indirectly by your entry_point() function have to be RPython. Another example you may want to look at is pypy/translator/goal/targetprologstandalone.py, the target for the in-progress Prolog implementation; this target, for example, enables a stackless build programmatically.
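A target file in the spirit of that example might look like the sketch below. Treat the details as assumptions rather than a copy of targetnopstandalone.py: in particular, the `target()` function returning the entry point together with `None` is the convention assumed here, and the body of `entry_point()` is invented for illustration.

```python
def entry_point(argv):
    # argv is a list of strings, like sys.argv; the int that is returned
    # becomes the exit code of the compiled program.
    count = 0
    for arg in argv[1:]:
        count += len(arg)
    if count > 0:
        return 0
    return 1


def target(*args):
    # Assumed convention: tell the translation driver which function is
    # the entry point of the program.
    return entry_point, None
```

Because nothing here depends on the translation toolchain, the file can also be imported and tested on top of CPython before you try to translate it.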