This document describes the PyPy extension compiler, which compiles a set of Python source files into an extension module for PyPy or for CPython.
WARNING: this is beta software, APIs and details may change.
The documentation corresponds to release 0.99. The extension compiler has not been extensively tested and polished so far, so bugs and rough edges are likely.
The final version of the Technical Report D03.1 also describes the extension compiler's goals in more detail and presents an overview of its implementation (http://codespeak.net/pypy/dist/pypy/doc/index-report.html).
In regular Python, the ability for users to write external modules is of great importance. These external modules must be written in C, which is both an advantage - it allows access to external C libraries, low-level features, or raw performance - and an inconvenience. In the context of PyPy the greatest drawback of hard-coded C external modules is that low-level details like multithreading (e.g. locks) and memory management must be explicitly written in such a language, which would prevent the same module from being used in several differently-compiled versions of pypy-c.
PyPy provides instead a generic way to write modules for all Python implementations: the so-called mixed module approach. A single mixed module is implemented as a set of Python source files, making a subpackage of the pypy/module/ package. While running PyPy, each of these subpackages appears to be a single module, whose interface is specified in the __init__.py of the subpackage, and whose implementation is lazily loaded from the various other .py files of the subpackage.
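The structure just described can be sketched in plain Python. This is a toy model: ``MixedModule`` below is only a stub standing in for pypy.interpreter.mixedmodule.MixedModule (which additionally performs the lazy loading), and the ``_demo`` entry names are hypothetical.

```python
# Toy model of a mixed module's __init__.py.  The real MixedModule
# resolves each entry lazily, the first time it is accessed; this stub
# only records the declarations.
class MixedModule(object):
    interpleveldefs = {}   # name -> "file.function", written in RPython
    appleveldefs = {}      # name -> "file.function", kept as Python code

    @classmethod
    def exposed_names(cls):
        # The union of both dictionaries is what the user sees as the
        # module's namespace.
        names = {}
        names.update(cls.interpleveldefs)
        names.update(cls.appleveldefs)
        return sorted(names)


# A hypothetical pypy/module/_demo/__init__.py would then look like:
class Module(MixedModule):
    """A demo mixed module."""
    interpleveldefs = {
        'measuretime': 'demo.measuretime',   # from _demo/demo.py
    }
    appleveldefs = {
        'DemoError': 'app_demo.DemoError',   # from _demo/app_demo.py
    }

print(Module.exposed_names())
```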
The subpackage is written in a way that allows it to be reused for non-PyPy implementations of Python. The goal of the Extension Compiler is to compile a mixed module into an extension module for these other Python implementations (so far, this means CPython only).
This means that you can do the following with a mixed module:
run it on top of CPython. This uses the CPy Object Space to directly run all space operations. Example:
    $ python
    >>> from pypy.interpreter.mixedmodule import testmodule
    >>> demo = testmodule("_demo")
    [ignore output here]
    >>> demo.measuretime(1000000, long)
    5
compile it as a CPython extension module. Example:
    $ python pypy/bin/compilemodule.py _demo
    [lots of output]
    Created '/tmp/usession-5/_demo/_demo.so'.
    $ cd /tmp/usession-5/_demo
    $ python
    >>> import _demo
    >>> _demo.measuretime(10000000, long)
    2
run it with PyPy on top of CPython. Example:
    $ python pypy/bin/py.py --withmod-_demo
    PyPy 0.99.0 in StdObjSpace on top of Python 2.4.3
    >>>> import _demo
    >>>> _demo.measuretime(10000, long)
    [ignore output here]
    4
compile it together with PyPy. It becomes a built-in module of pypy-c. Example:
    $ cd pypy/translator/goal
    $ python translate.py targetpypystandalone --withmod-_demo
    [wait for 30 minutes]
    [translation:info] created: ./pypy-c
    $ ./pypy-c
    Python 2.4.1 (pypy 0.9.0 build xxxxx) on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>>> import _demo
    >>>> _demo.measuretime(10000000, long)
    2
You have to put your module into its own directory in pypy/module/ of your local pypy checkout. See the following directories as guidelines:
pypy/module/_demo
    A demo module showing a few of the more interesting features.

pypy/module/readline
    A tiny, in-progress example giving bindings to the GNU readline library.

pypy/module/_sre
    An algorithmic example: the regular expression engine of CPython, rewritten in RPython.
Modules can be based on ctypes. This is the case in the pypy/module/readline and pypy/module/_demo examples: they use ctypes to access functions in external C libraries. When translated to C, the calls in these examples become static, regular C function calls -- which means that most of the efficiency overhead of using ctypes disappears during the translation. However, some rules must be followed in order to make ctypes translatable; for more information, see the documentation of RCtypes.
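As a minimal sketch of this style, the following calls a libc function through ctypes with an explicitly declared signature; declaring argtypes and restype up front is also the kind of discipline RCtypes relies on. It assumes a POSIX system where libc can be found.

```python
import ctypes
import ctypes.util

# Load the C library (assumption: a POSIX libc is available; on an
# unusual platform find_library() may fail).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C function's signature explicitly, as the readline and
# _demo examples do for their external functions.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # explicit argument types
strlen.restype = ctypes.c_size_t      # explicit return type

print(strlen(b"mixed module"))        # a direct call into C
```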
All these modules are called "mixed" because they mix interpreter-level and application-level submodules to present a single, coherent module to the user. For a CPython extension module, you can think about interpreter-level as what will be compiled into C, and application-level as what will stay as it is, as Python code, included (as a frozen bytecode) within the C module. The capability to group several interpreter-level files into the final compiled module is similar to having a CPython extension module that was compiled from a set of C sources; but the capability to also include nicely integrated Python sources in the C extension module has no direct equivalent in hand-written C extension modules.
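The "frozen bytecode" idea can be illustrated with plain CPython tools: compile the app-level source once, serialize the resulting code object, and later execute it without the source text. This uses CPython's marshal module only as an analogy, not the extension compiler's actual mechanism.

```python
import marshal

# App-level source that would normally live in a .py file of the
# subpackage (the function name is illustrative).
app_source = """
def greet(name):
    return 'Hello, %s!' % name
"""

# "Freeze" it: compile to a code object and serialize the bytecode.
frozen = marshal.dumps(compile(app_source, '<app-level>', 'exec'))

# Later -- conceptually, from inside the compiled C module -- revive and
# execute the bytecode without needing the original source text.
namespace = {}
exec(marshal.loads(frozen), namespace)
print(namespace['greet']('PyPy'))
```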
The directory structure is described in the Coding Guide, section mixed modules. The interpreter-level parts of a mixed module must follow the RPython programming style; this is necessary to allow them to be annotated and translated to C. Whenever they manipulate application-level (i.e. user-visible) objects, they must do so via object space operations (see wrapping rules).
The interpreter-level parts of a mixed module export functions and types to app-level, making them visible to the module's user. From the user's point of view, they are built-in functions and built-in types.
Exporting functions is done via the interpleveldefs dictionary in __init__.py. A function exported this way must either have a simple signature of the form (space, w_arg1, w_arg2, w_arg3...), in which case it is exposed to app-level as a built-in function taking the same number of arguments, and calling it from app-level passes all the provided arguments as wrapped objects to the interp-level function; or it can carry an unwrap_spec attribute that declares what types of arguments it expects. The unwrap_spec is a list of specifiers, one per argument; each specifier is one of the following strings or objects (to be imported from pypy/interpreter/gateway.py):
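Both calling conventions can be sketched with a toy object space. This is only a model: the real ObjSpace is far richer, and the specifier values below are illustrative strings rather than the actual gateway objects.

```python
# Toy stand-ins for wrapped objects and the object space.
class Wrapped(object):
    def __init__(self, value):
        self.value = value

class ToySpace(object):
    def wrap(self, x):
        return Wrapped(x)
    def unwrap(self, w_x):
        return w_x.value

# Style 1: simple signature (space, w_arg1, w_arg2, ...) -- every
# argument arrives as a wrapped object.
def add(space, w_a, w_b):
    return space.wrap(space.unwrap(w_a) + space.unwrap(w_b))

# Style 2: an unwrap_spec declares the expected argument types, so the
# gateway can unwrap them before the call (specifier names illustrative).
def repeat(space, s, n):
    return space.wrap(s * n)
repeat.unwrap_spec = ['ObjSpace', str, int]

space = ToySpace()
print(space.unwrap(add(space, space.wrap(2), space.wrap(3))))
```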
There are two ways to export types. The first is to write the type at app-level, as a regular class. You can then call interp-level functions from your own module to implement some of the methods, while keeping the class generally at app-level.
This does not work, however, if you need special data attached to the instances of your class. For this case, you need the second solution: write the class entirely at interp-level. Such a class must inherit from pypy.interpreter.baseobjspace.Wrappable. You can manipulate instances of such a class freely at interp-level. Instances of subclasses of Wrappable are not wrapped; they are merely wrappable. To expose them to app-level, call w_obj = space.wrap(obj) -- it wraps your instance obj into a box, which can be passed to app-level. To unwrap such a thing again, call obj = space.interp_w(YourSubclass, w_obj), which performs a type check and returns the unwrapped instance of YourSubclass. To avoid confusion, try to follow the usual naming convention: non-wrapped objects like instances of Wrappable go in variables whose names do not start with w_; wrapped objects go in variables whose names start with w_.
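A toy model of this protocol, with stand-ins for Wrappable, space.wrap() and space.interp_w() (the real classes live in pypy.interpreter.baseobjspace and do considerably more):

```python
# Instances of Wrappable subclasses are *wrappable*, not wrapped:
# space.wrap() puts them in a box, and space.interp_w() type-checks on
# the way back out.
class Wrappable(object):
    pass

class Box(object):               # stand-in for a wrapped object
    def __init__(self, obj):
        self.obj = obj

class ToyObjSpace(object):
    def wrap(self, obj):
        return Box(obj)
    def interp_w(self, cls, w_obj):
        if not isinstance(w_obj.obj, cls):
            raise TypeError('expected %s' % cls.__name__)
        return w_obj.obj

class W_Counter(Wrappable):      # interp-level class with its own data
    def __init__(self):
        self.count = 0

space = ToyObjSpace()
obj = W_Counter()                # non-wrapped: no w_ prefix
w_obj = space.wrap(obj)          # wrapped: w_ prefix, by convention
assert space.interp_w(W_Counter, w_obj) is obj
```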
Before you can wrap instances of subclasses of Wrappable, though, you need to attach a TypeDef to the subclass in question. The TypeDef describes the public app-level interface of your type. An example can be found in pypy/module/_sre/interp_sre.py. Some minimal documentation is provided here, but note that the extcompiler itself does not support special methods at the moment, i.e. methods with __xyz__() kind of names. You can only export regular methods (with interp2app()), properties (with GetSetProperty()), and simple class-level attributes like strings and integers. (This limitation will be lifted soon.)
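As a CPython analogue of what a TypeDef declares, the following builds a type dynamically out of regular methods, a property, and a class-level attribute; all names here are illustrative, and the real TypeDef machinery is not used.

```python
# CPython analogue of a TypeDef for a Wrappable subclass: regular
# methods (interp2app), properties (GetSetProperty), and simple
# class-level attributes like strings and integers.

def _init(self):
    self._count = 0

def next_value(self):       # would be exposed via interp2app(next_value)
    self._count += 1
    return self._count

def get_count(self):        # would be exposed via GetSetProperty(get_count)
    return self._count

Counter = type('Counter', (object,), {
    '__init__': _init,
    'next_value': next_value,
    'count': property(get_count),   # read-only, like a GetSetProperty
    'version': '0.99',              # simple class-level attribute
})

c = Counter()
c.next_value()
print(c.count)
```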
The readline mixed module has a minimal example for each of four different kinds of tests:
As this is a ctypes bindings module, we should test the ctypes bindings directly to see if they work as expected:
python pypy/test_all.py pypy/module/readline/test/test_c_readline.py
We should then test that the mixed module wrapping is correct, using the helper function pypy.interpreter.mixedmodule.testmodule(). Called from a normal CPython program, this function exposes a mixed module as a plain CPython module for testing and inspection (it works in the interactive interpreter too, of course: it is a good way to see what mixed module wrappings look like in the end):
python pypy/test_all.py pypy/module/readline/test/test_mixedmodule.py
We can also run the mixed module within PyPy, on top of CPython. To do so from a test, we use a special AppTestXyz class:
python pypy/test_all.py pypy/module/readline/test/test_with_pypy.py
Finally, we can compile the mixed module to a CPython extension module, re-import it into the running CPython interpreter, and test it. Only this test will pick up the translation failures caused by breaking the RPython rules. (To debug translation failures, though, you should use compilemodule.py as described below: you will then get a Pdb prompt and a flow graph viewer to look around.)
python pypy/test_all.py pypy/module/readline/test/test_compiler.py
As seen in the introduction, you translate a module into a CPython extension module with the following command line:
python pypy/bin/compilemodule.py _demo
The extension compiler imports the specified package from pypy/module/ and produces a shared library importable from your local Python installation. The produced shared library is put into a temporary directory printed at the end (which on Linux is also accessible as /tmp/usession-<username>/<modulename>/<modulename>.so).
Note that we recommend that you write and run tests for your module first. This is not only a matter of style: bogus modules are likely to make the translation tool-chain fail in mysterious ways.
See the introduction for other things you can do with a mixed module.
Note that you obviously need to have a full pypy checkout first. If you have trouble compiling the demo modules, check out our ctypes-specific installation notes.