PyPy - Getting Started

1   What is PyPy?

PyPy is an implementation of the Python programming language written in Python itself, flexible and easy to experiment with. Our long-term goals are to target a large variety of platforms, small and large, by providing a compiler toolsuite that can produce custom Python versions. Platform, memory and threading models are to become aspects of the translation process, as opposed to encoding low-level details into the language implementation itself. Eventually, dynamic optimization techniques, implemented as another translation aspect, should become robust against language changes.

2   Just the facts

2.1   Downloading & running the PyPy 1.0 release

Download one of the following release files:

pypy-1.0

After unpacking the source downloads you can change to the pypy-1.0.0 directory and execute the following command line:

python pypy/bin/py.py

This will give you a PyPy prompt, i.e. a very compliant Python interpreter implemented in Python. PyPy passes around 98% of CPython's core language regression tests. Because this invocation of PyPy still runs on top of CPython, it runs around 2000 times slower than the original CPython.

However, since the 0.7.0 release it is possible to use PyPy to translate itself to lower-level languages, after which it runs standalone, is no longer dependent on CPython, and becomes faster.

If you are using the precompiled Windows executables, please look at the included README.txt on how to start already translated interpreters. Note that the examples in the HTML documentation generally assume that you have a py.py; with precompiled binaries, you need to pick one with the matching features compiled in.

2.2   Svn-check out & run the latest PyPy as a two-liner

If you want to play with the stable development PyPy version you can check it out from the repository using subversion. Download and install subversion if you don't already have it. Then you can issue on the command line (DOS box or terminal):

svn co http://codespeak.net/svn/pypy/dist pypy-dist

This will create a directory named pypy-dist, and will get you the PyPy source in pypy-dist/pypy and documentation files in pypy-dist/pypy/doc.

After checkout you can get a PyPy interpreter via:

python pypy-dist/pypy/bin/py.py

have fun :-)

We have some help on installing subversion for PyPy. Have a look at interesting starting points for some guidance on how to continue.

2.3   Understanding PyPy's architecture

For in-depth information about architecture and coding documentation, head over to the documentation section, where you'll find lots of interesting information. Additionally, in true hacker spirit, you may just start reading sources.

2.4   Running all of PyPy's tests

If you want to see if PyPy works on your machine/platform you can simply run PyPy's large test suite with:

cd pypy
python test_all.py directory-or-files

test_all.py is just another name for py.test which is the testing tool that we are using and enhancing for PyPy. Note that running all the tests takes a very long time, and enormous amounts of memory if you are trying to run them all in the same process; test_all.py is only suitable to run a subset of them at a time. To run them all we have an autotest driver that executes the tests directory by directory and produces pages like the following one:

http://wyvern.cs.uni-duesseldorf.de/pypytest/summary.html
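The directory-by-directory idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual autotest driver: the helper names find_test_directories and commands_for are hypothetical, and the sketch merely builds one "python test_all.py DIRECTORY" command line per directory that contains test files.

```python
import os
import shutil
import tempfile

def find_test_directories(root):
    """Return the sorted directories under root that hold test_*.py files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if any(n.startswith("test_") and n.endswith(".py") for n in filenames):
            found.append(dirpath)
    return sorted(found)

def commands_for(root):
    """Build one 'python test_all.py DIRECTORY' command per test directory."""
    return [["python", "test_all.py", d] for d in find_test_directories(root)]

# Demonstrate on a throwaway tree instead of a real checkout.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "interpreter", "test"))
os.makedirs(os.path.join(demo_root, "doc"))
open(os.path.join(demo_root, "interpreter", "test", "test_pyframe.py"), "w").close()

dirs = find_test_directories(demo_root)
cmds = commands_for(demo_root)
print(cmds)

# Remove the throwaway tree again.
shutil.rmtree(demo_root)
```

A driver built on this could hand each command to subprocess.call, so that every directory is tested in a fresh process and memory is released between runs.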

2.5   Filing bugs or feature requests

You may file bug reports on our issue tracker, which is also accessible through the 'issues' top menu of the PyPy website. The page on using the development tracker has more detailed information on specific features of the tracker.

3   Interesting Starting Points in PyPy

The following assumes that you have successfully downloaded and extracted the PyPy release or have checked out PyPy using svn. It assumes that you are in the top level directory of the PyPy source tree, e.g. pypy-x.x (if you got a release) or pypy-dist (if you checked out the most recent version using subversion).

3.1   Main entry point

3.1.1   The py.py interpreter

To start interpreting Python with PyPy, use Python 2.3 or greater:

cd pypy
python bin/py.py

After a few seconds (remember: this is running on top of CPython), you should be at the PyPy prompt, which is the same as the Python prompt, but with an extra ">".

Now you are ready to start running Python code. Most Python modules should work if they don't involve CPython extension modules. Here is an example of determining PyPy's performance in pystones:

>>>> from test import pystone
>>>> pystone.main(10)

The parameter is the number of loops to run through the test. The default is 50000, which is far too many to run in a non-translated PyPy version (i.e. when PyPy's interpreter itself is being interpreted by CPython).
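To make the role of the loop count concrete, here is a minimal sketch of how a "loops per second" figure can be computed. It is not pystone itself: the loops_per_second helper and the workload function are hypothetical stand-ins written in plain Python.

```python
import time

def loops_per_second(func, loops):
    """Call func() `loops` times and return the achieved calls per second."""
    start = time.time()
    for _ in range(loops):
        func()
    elapsed = time.time() - start
    # Guard against a zero reading on very coarse clocks.
    return loops / max(elapsed, 1e-9)

def workload():
    # A tiny, arbitrary piece of work standing in for one benchmark pass.
    return sum(i * i for i in range(100))

rate = loops_per_second(workload, 10)
print("%.1f loops/second" % rate)
```

A small loop count keeps the run short on a slow interpreter, at the price of a noisier measurement.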

3.1.2   py.py options

To list the PyPy interpreter command line options, type:

cd pypy
python bin/py.py --help

py.py supports most of the options that CPython supports too (in addition to a large number of options that can be used to customize py.py). As an example of using PyPy from the command line, you could type the following from the pypy/bin directory:

python py.py -c "from test import pystone; pystone.main(10)"

Alternatively, as with regular Python, you can simply give a script name on the command line:

python py.py ../../lib-python/2.4.1/test/pystone.py 10

See our configuration sections for details about what all the command-line options do.

3.2   Special PyPy features

3.2.1   Interpreter-level console

The PyPy console has quite a few extra features. If you press <Ctrl-C> on the console, you enter the interpreter-level console, an ordinary CPython console. You can then access internal objects of PyPy (e.g. the object space) and any variables you have created on the PyPy prompt with the prefix w_:

>>>> a = 123
>>>> <Ctrl-C>
*** Entering interpreter-level console ***
>>> w_a
W_IntObject(123)

Note that the prompt of the interpreter-level console is only '>>>' since it runs at the CPython level. If you want to return to PyPy, press <Ctrl-D> (under Linux) or <Ctrl-Z>, <Enter> (under Windows).

You may be interested in reading more about the distinction between interpreter-level and app-level.
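The idea of wrapped objects can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not PyPy's real classes: W_IntObject here is a stand-in with the same name as in the transcript above, and ToySpace with its wrap, int_w and add methods is a hypothetical miniature of an object space.

```python
class W_IntObject(object):
    """Toy stand-in for a wrapped application-level integer."""
    def __init__(self, intval):
        self.intval = intval

    def __repr__(self):
        return "W_IntObject(%d)" % self.intval


class ToySpace(object):
    """Toy object space: application-level operations go through it."""
    def wrap(self, value):
        # Turn an interpreter-level int into a wrapped object.
        return W_IntObject(value)

    def int_w(self, w_obj):
        # Unwrap: recover the interpreter-level int.
        return w_obj.intval

    def add(self, w_a, w_b):
        # Operate on wrapped values and return a wrapped result.
        return self.wrap(self.int_w(w_a) + self.int_w(w_b))


space = ToySpace()
w_a = space.wrap(123)
print(w_a)                                       # W_IntObject(123)
print(space.int_w(space.add(w_a, space.wrap(1))))  # 124
```

The w_ prefix marks names that hold wrapped objects, which is why the variable a shows up as w_a at the interpreter level.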

3.2.2   Tracing bytecode and operations on objects

You can use the trace object space to monitor the interpretation of bytecodes in connection with object space operations. To enable it, set __pytrace__=1 on the interactive PyPy console:

>>>> __pytrace__ = 1
Tracing enabled
>>>> a = 1 + 2
|- <<<< enter <inline>a = 1 + 2 @ 1 >>>>
|- 0    LOAD_CONST    0 (W_IntObject(1))
|- 3    LOAD_CONST    1 (W_IntObject(2))
|- 6    BINARY_ADD
  |-    add(W_IntObject(1), W_IntObject(2))   -> W_IntObject(3)
|- 7    STORE_NAME    0 (a)
  |-    hash(W_StringObject('a'))   -> W_IntObject(-468864544)
  |-    int_w(W_IntObject(-468864544))   -> -468864544
|-10    LOAD_CONST    2 (<W_NoneObject()>)
|-13    RETURN_VALUE
|- <<<< leave <inline>a = 1 + 2 @ 1 >>>>
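The indented lines in the trace are object space operations. As a rough sketch of how such logging can be layered on top of another space, the following toy proxy records every operation it forwards. All names here (W_Int, ToySpace, TracingSpace) are hypothetical and much simpler than the real trace object space.

```python
class W_Int(object):
    """Toy wrapped integer."""
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "W_Int(%d)" % self.value


class ToySpace(object):
    """Toy object space with a single operation."""
    def add(self, w_a, w_b):
        return W_Int(w_a.value + w_b.value)


class TracingSpace(object):
    """Proxy that forwards each operation and logs its arguments and result."""
    def __init__(self, space):
        self._space = space
        self.log = []

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # i.e. for the operations of the wrapped space.
        operation = getattr(self._space, name)

        def traced(*args):
            result = operation(*args)
            arguments = ", ".join(repr(arg) for arg in args)
            self.log.append("%s(%s) -> %r" % (name, arguments, result))
            return result
        return traced


space = TracingSpace(ToySpace())
space.add(W_Int(1), W_Int(2))
print("\n".join(space.log))  # add(W_Int(1), W_Int(2)) -> W_Int(3)
```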

3.2.3   Lazily computed objects

One of the original features provided by PyPy is the "thunk" object space, providing lazily-computed objects in a fully transparent manner:

cd pypy
python bin/py.py -o thunk

>>>> from __pypy__ import thunk
>>>> def longcomputation(lst):
....     print "computing..."
....     return sum(lst)
....
>>>> x = thunk(longcomputation, range(5))
>>>> y = thunk(longcomputation, range(10))

From the application perspective, x and y represent exactly the objects being returned by the longcomputation() invocations. You can put these objects into a dictionary without triggering the computation:

>>>> d = {5: x, 10: y}
>>>> result = d[5]
>>>> result
computing...
10
>>>> type(d[10])
computing...
<type 'int'>
>>>> d[10]
45

It is interesting to note that this lazy-computing Python extension is solely implemented in a small objspace/thunk.py file consisting of around 200 lines of code. Since the 0.8.0 release you can translate PyPy with the thunk object space.
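For comparison, here is what lazy computation looks like in plain Python without the thunk object space. This is a hedged sketch with a hypothetical Thunk class: unlike PyPy's thunk, it is not transparent, so the value has to be requested explicitly with force().

```python
class Thunk(object):
    """Delay a call until its value is first requested, then cache it."""
    def __init__(self, func, *args):
        self._func = func
        self._args = args
        self._computed = False
        self._value = None

    def force(self):
        if not self._computed:
            self._value = self._func(*self._args)
            self._computed = True
            # Drop the references that are no longer needed.
            self._func = self._args = None
        return self._value


calls = []

def longcomputation(lst):
    calls.append(len(lst))   # record that the computation really ran
    return sum(lst)

x = Thunk(longcomputation, range(5))
d = {5: x}                   # storing the thunk does not trigger the call
print(calls)                 # []
print(d[5].force())          # 10
print(d[5].force())          # 10 again, served from the cache
print(calls)                 # [5]
```

The thunk object space removes the need for the explicit force() call by performing it inside the object space whenever an operation needs the real value.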

3.2.4   Logic programming

People familiar with logic programming languages will be interested to know that PyPy optionally supports logic variables and constraint-based programming. Among the many interesting features of logic programming -- like unification -- this subsumes the thunk object space by providing a more extensive way to deal with laziness.

Try it out:

cd pypy
python bin/py.py -o logic

>>>> X = newvar()         # a logic variable
>>>> bind(X, 42)          # give it a value
>>>> assert X / 2 == 21   # then use it
>>>> assert type(X) is int

>>>> X, Y, Z = newvar(), newvar(), newvar()  # three logic vars
>>>> unify({'hello': Y, 'world': Z}, X)      # a complex unification
>>>> bind(Y, 5)                              # then give values to Y
>>>> bind(Z, 7)                              # ... and Z
>>>> X
{'hello': 5, 'world': 7}

>>>> bind(Z, 8)
RuntimeError: Cannot bind twice

Read more about Logic Object space features.
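The single-assignment behaviour of logic variables can be mimicked in plain Python. The sketch below is only an approximation under stated assumptions: LogicVar and resolve are hypothetical names, values must be looked up explicitly, and there is no real unification algorithm.

```python
class LogicVar(object):
    """Toy single-assignment variable."""
    _UNBOUND = object()

    def __init__(self):
        self.value = LogicVar._UNBOUND

    def is_bound(self):
        return self.value is not LogicVar._UNBOUND

    def bind(self, value):
        if self.is_bound():
            raise RuntimeError("Cannot bind twice")
        self.value = value


def resolve(term):
    """Replace bound variables, also inside dicts, by their values."""
    if isinstance(term, LogicVar):
        return resolve(term.value) if term.is_bound() else term
    if isinstance(term, dict):
        return dict((key, resolve(val)) for key, val in term.items())
    return term


X, Y, Z = LogicVar(), LogicVar(), LogicVar()
X.bind({'hello': Y, 'world': Z})   # X refers to still-unbound variables
Y.bind(5)
Z.bind(7)
resolved = resolve(X)
print(resolved == {'hello': 5, 'world': 7})  # True

try:
    Z.bind(8)                       # a second binding is rejected
except RuntimeError as exc:
    error = str(exc)
print(error)                        # Cannot bind twice
```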

3.2.5   Aspect Oriented Programming

PyPy provides an experimental Aspect Oriented Programming facility in the 'aop' module. Be aware that this module works on the abstract syntax tree level, and will leave traces in the byte-compiled ('.pyc') files. Be sure to remove these files after playing with the module if you do not intend to really use the aspect oriented programming in your code. Even better, use the '--no-objspace-usepycfiles' command line option, so that the byte-compiled modules are not written to disk.

Here is an example:

cd pypy
python bin/py.py --no-objspace-usepycfiles

>>>> from aop import Aspect, before, PointCut
>>>> class DemoAspect:
....     __metaclass__ = Aspect
....     @before(PointCut(func="^open$").call())
....     def before_open(self, joinpoint):
....         print "opening", joinpoint.arguments()
....
>>>> aspect = DemoAspect()
>>>> f = open("/tmp/toto.txt", 'w')

To read more about this, try the aop module documentation.
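The effect of "before" advice can be approximated in ordinary Python with a decorator. This is a different mechanism from the aop module, which works on the abstract syntax tree and needs no change at the call site; the names before, record_call and open_file below are hypothetical and chosen for illustration only.

```python
import functools

def before(advice):
    """Return a decorator that runs advice(name, args, kwargs) before the call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            advice(func.__name__, args, kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorate


seen = []

def record_call(name, args, kwargs):
    # The advice: remember which function was called and with what.
    seen.append((name, args))


@before(record_call)
def open_file(path, mode):
    # Stand-in for open(); it does not touch the file system.
    return "%s opened with mode %s" % (path, mode)


result = open_file("/tmp/toto.txt", "w")
print(seen)    # [('open_file', ('/tmp/toto.txt', 'w'))]
print(result)  # /tmp/toto.txt opened with mode w
```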

3.2.6   Running the tests

The PyPy project uses test-driven development. Right now, there are a couple of different categories of tests which you can run. To run all the unit tests:

cd pypy
python test_all.py

(this is not recommended, since it takes hours and uses huge amounts of RAM). Alternatively, you may run subtests by going to the correct subdirectory and running them individually:

python test_all.py interpreter/test/test_pyframe.py

test_all.py is actually just a synonym for py.test, which is our external testing tool. If you have installed that, you can just as well issue py.test DIRECTORY_OR_FILE to perform test runs, or simply start it without arguments to run all tests below the current directory.
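For readers new to py.test, a test is an ordinary function whose name starts with test_ and which uses plain assert statements, placed in a file whose name starts with test_. The following minimal sketch uses a hypothetical is_even function; it is not taken from PyPy's test suite.

```python
def is_even(n):
    """Function under test (hypothetical example)."""
    return n % 2 == 0


def test_is_even():
    # py.test collects functions named test_* and reports failing asserts.
    assert is_even(4)
    assert not is_even(7)
```

Saved as, say, test_even.py, such a file can be passed as the DIRECTORY_OR_FILE argument.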

Finally, there are the CPython regression tests which you can run like this (this will take hours and hours and hours):

cd lib-python/2.4.1/test
python ../../../pypy/test_all.py

or if you have installed py.test then you simply say:

py.test

from the lib-python/2.4.1/test directory. You need to have a checkout of the testresult directory. Running one of the above commands tells you how to proceed.

3.2.7   Demos

The demo/ directory contains examples of various aspects of PyPy, ranging from running regular Python programs (that we used as compliance goals) over experimental distribution mechanisms to examples translating sufficiently static programs into low level code.

3.3   Trying out the translator

The translator is a tool based on the PyPy interpreter which can translate sufficiently static Python programs into low-level code. To be able to use it you need to:

  • Download and install Pygame and ctypes if you do not already have them.
  • Have an internet connection. The flowgraph viewer connects to codespeak.net and has it convert the flowgraph using a patched version of Dot (Graphviz) that does not crash. This is only needed if you want to look at the flowgraphs.
  • Use Python 2.4 for the translation tools, because Python 2.5 is (as of revision 39130) not fully supported.

To start the interactive translator shell do:

cd pypy
python bin/translatorshell.py

Test snippets of translatable code are provided in the file pypy/translator/test/snippet.py, which is imported under the name snippet. For example:

>>> t = Translation(snippet.is_perfect_number)
>>> t.view()

After that, the graph viewer pops up, which lets you interactively inspect the flow graph. To move around, click on something that you want to inspect. To get help about how to use it, press 'H'. To close it again, press 'Q'.

3.3.1   Trying out the type annotator

We have a type annotator that can completely infer types for functions like is_perfect_number (as well as for much larger examples):

>>> t.annotate([int])
>>> t.view()

Move the mouse over variable names (in red) to see their inferred types.

3.3.2   Translating the flow graph to C code

The graph can be turned into C code:

>>> t.rtype()
>>> f = t.compile_c()

The first command replaces the operations with low-level versions that only use low-level types available in C (e.g. int). To try out the compiled version:

>>> f(5)
False
>>> f(6)
True
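For reference, a function with this behaviour can be written in a few lines of plain Python. This is a sketch of what such a function might look like; the actual is_perfect_number in snippet.py may be written differently.

```python
def is_perfect_number(n):
    """Return True if n equals the sum of its proper divisors."""
    if n < 1:
        return False
    total = 0
    for divisor in range(1, n):
        if n % divisor == 0:
            total += divisor
    return total == n


print(is_perfect_number(5))   # False
print(is_perfect_number(6))   # True (1 + 2 + 3)
```

The code uses only integers, a simple loop and no dynamic features, which is the kind of "sufficiently static" style the translator expects.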

3.3.3   Translating the flow graph to LLVM code

The LLVM or low level virtual machine project has, among other things, defined a statically typed portable assembly language and a set of tools that optimize and compile this assembly for a variety of platforms. As such, this assembly is a natural target for PyPy's translator.

To translate to LLVM assembly you must first have LLVM version 1.9 installed - the how to install LLVM page provides some helpful hints.

The LLVM backend is not as flexible as the C backend, and for example only supports one garbage collection strategy. Calling compiled LLVM code from CPython is more restrictive than the C backend - the return type and the arguments of the entry function must be ints, floats or bools - as the emphasis of the LLVM backend is to compile standalone executables.

Here is a simple example to try:

>>> t = Translation(snippet.my_gcd)
>>> a = t.annotate([int, int])
>>> t.rtype()
>>> f = t.compile_llvm()
>>> f(15, 10)
5
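A function matching this example could look as follows. It is a plain-Python sketch of Euclid's algorithm and only assumed to be similar to snippet.my_gcd; note that its arguments and return value are all ints, as the entry-point restriction described above requires.

```python
def my_gcd(a, b):
    """Greatest common divisor of two non-negative ints (Euclid's algorithm)."""
    while b != 0:
        a, b = b, a % b
    return a


print(my_gcd(15, 10))   # 5
```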

3.3.4   Translating the flow graph to JavaScript code

The JavaScript backend is still experimental but was heavily improved during last year's Google Summer of Code. It contains some rudimentary support for the document object model and a good integration with PyPy's unit-testing framework. Code can be tested with the Spidermonkey command-line JavaScript interpreter in addition to a multitude of JavaScript-capable browsers. The emphasis of the JavaScript backend is to compile RPython code into JavaScript snippets that can be used in a range of browsers. The goal is to make it more and more capable of producing full-featured web applications. Please see the pypy/translator/js/test directory for example unit tests.

Here is a simple example to try:

>>> t = Translation(snippet.my_gcd)
>>> a = t.annotate([int, int])
>>> source = t.source_js()

If you want to know more about the JavaScript backend please refer to the JavaScript docs.

3.3.5   Translating the flow graph to CLI code

Use the CLI backend to translate the flow graphs into .NET executables: gencli is quite mature now and can also compile the whole interpreter. You can try out the CLI backend from the interactive translator shell:

>>> def myfunc(a, b): return a+b
...
>>> t = Translation(myfunc)
>>> t.annotate([int, int])
>>> f = t.compile_cli()
>>> f(4, 5)
9

The object returned by compile_cli is a wrapper around the real executable: the parameters are passed as command line arguments, and the returned value is read from the standard output.
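The wrapper idea (arguments in via the command line, result out via standard output) can be sketched with the standard subprocess module. This is not the code used by compile_cli: make_wrapper is a hypothetical helper, and a Python one-liner stands in for the compiled executable.

```python
import subprocess
import sys

def make_wrapper(command, convert=int):
    """Return a function that runs `command` with its arguments appended
    and converts the program's standard output to a Python value."""
    def call(*args):
        argv = list(command) + [str(arg) for arg in args]
        output = subprocess.check_output(argv)
        return convert(output.decode("ascii").strip())
    return call


# Stand-in for the compiled executable: a one-liner that adds two ints.
adder_source = "import sys; print(int(sys.argv[1]) + int(sys.argv[2]))"
f = make_wrapper([sys.executable, "-c", adder_source])

result = f(4, 5)
print(result)   # 9
```

Each call starts a new process, so a wrapper of this kind is convenient for testing but slow for frequent calls.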

Once you have compiled the snippet, you can also try to launch the executable directly from the shell; you can find the executable in one of the /tmp/usession-* directories:

$ mono /tmp/usession-<username>/main.exe 4 5
9

To translate and run for CLI you must have the SDK installed: Windows users need the .NET Framework SDK 2.0, while Linux and Mac users can use Mono.

3.3.6   A slightly larger example

There is a small-to-medium demo showing the translator and the annotator:

cd demo
python bpnn.py

This causes bpnn.py to display itself as a call graph and class hierarchy. Clicking on functions shows the flow graph of the particular function. Clicking on a class shows the attributes of its instances. All this information (call graph, local variables' types, attributes of instances) is computed by the annotator.

As soon as you close the PyGame window, the function is turned into C code, compiled and executed.

3.4   Translating the PyPy Python interpreter

(Note: for some hints on how to translate the Python interpreter under Windows, see the windows document)

Not for the faint of heart nor the owner of a very old machine: you can translate the whole of PyPy's Python interpreter to low-level C code. This is the largest and ultimate example of source that our translation toolchain can process:

cd pypy/translator/goal
python translate.py --run targetpypystandalone.py

By default the translation process will try to use the Boehm-Demers-Weiser garbage collector for the translated interpreter (use --gc=framework to use our own exact mark-n-sweep implementation, which at the moment is slower but doesn't have external dependencies). Otherwise, be sure to install Boehm before starting the translation (e.g. by running apt-get install libgc-dev on Debian).

This whole process will take some time and quite a lot of memory (although it might just work on a machine with 512MB of RAM). Once the translation has finished running and you have closed the graph, you will be greeted (because of the --run option) by the friendly prompt of a PyPy executable that is not running on top of CPython any more:

[translation:info] created: ./pypy-c
[translation:info] Running compiled c source...
debug: entry point starting
debug:  argv -> ./pypy-c
debug: importing code
debug: calling code.interact()
Python 2.4.1 (pypy 0.7.1 build 18929) on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>> 1 + 1
2
>>>>

With the default options, you can find the produced executable under the name pypy-c. Type pypy-c --help to see the options it supports -- mainly the same basic options as CPython. In addition, pypy-c --info prints the translation options that were used to produce this particular executable. This executable contains a lot of things that are hard-coded for your particular system (including paths), so it's not really meant to be installed or redistributed at the moment.

If you exit the interpreter you get a pygame window with all the flow graphs plus a pdb prompt. Moving around in the resulting flow graph is difficult because of the sheer size of the result. For this reason, the debugger prompt you get at the end has been enhanced with commands to facilitate locating functions and classes. Type help graphs for a list of the new commands. Help is also available on each of these new commands.

The translate.py script itself takes a number of options controlling what to translate and how. See translate.py -h. Some of the more interesting options are:

  • --text: don't show the flow graph after the translation is done. This is useful if you don't have pygame installed.
  • --stackless: this produces a pypy-c that includes features inspired by Stackless Python.
  • --gc=boehm|ref|framework|stacklessgc: choose between using the Boehm-Demers-Weiser garbage collector, our reference counting implementation or our own implementation of a mark and sweep collector, with two different approaches for finding roots (as we have seen Boehm's collector is the default).

Find a more detailed description of the various options in our configuration sections.
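To give a feeling for what a mark and sweep collector does, here is a toy version operating on an explicit list of objects. It is a teaching sketch with hypothetical names (Node, mark, sweep) and has nothing to do with the actual collectors selected by --gc.

```python
class Node(object):
    """Toy heap object with outgoing references."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark every object reachable from the roots."""
    stack = list(roots)
    while stack:
        node = stack.pop()
        if not node.marked:
            node.marked = True
            stack.extend(node.refs)


def sweep(heap):
    """Keep the marked objects and reset their marks for the next cycle."""
    survivors = [node for node in heap if node.marked]
    for node in survivors:
        node.marked = False
    return survivors


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.refs.append(b)     # a -> b is reachable from the root a
c.refs.append(d)     # c and d reference each other but
d.refs.append(c)     # are not reachable from any root

heap = [a, b, c, d]
mark([a])
heap = sweep(heap)
names = [node.name for node in heap]
print(names)         # ['a', 'b']
```

The unreachable cycle between c and d is reclaimed here, whereas pure reference counting would keep both objects alive because each still has one incoming reference.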

You can also use the translate.py script to try out several smaller programs, e.g. a slightly changed version of Pystone:

cd pypy/translator/goal
python translate.py targetrpystonedalone

This will produce the executable "targetrpystonedalone-c".

3.4.1   Translating with the thunk object space

It is also possible to experimentally translate a PyPy version using the "thunk" object space:

cd pypy/translator/goal
python translate.py targetpypystandalone.py --objspace=thunk

The examples in lazily computed objects should work in the translated result.

3.4.2   Translating using the LLVM backend

To create a standalone executable using the experimental LLVM compiler infrastructure:

./translate.py --text --batch --backend=llvm targetpypystandalone.py

3.4.3   Translating using the CLI backend

To create a standalone .NET executable using the CLI backend:

./translate.py --text --batch --backend=cli targetpypystandalone.py

The executable and all its dependencies will be stored in the ./pypy-cli-data directory. To run pypy.NET, you can run ./pypy-cli-data/main.exe. If you are using Linux or Mac, you can use the convenience ./pypy-cli script:

$ ./pypy-cli
debug: entry point starting
debug:  argv ->
debug: importing code
debug: calling code.interact()
Python 2.4.1 (pypy 0.9.0 build 38134) on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>> 1 + 1
2
>>>>

Unfortunately the interactive interpreter will not work with Mono versions older than 1.1.17.2 due to a Mono bug. Everything else will work fine, but not the interactive shell.

Moreover, at the moment it's not possible to do the full translation using only the tools provided by the Microsoft .NET SDK, since ilasm crashes when trying to assemble the pypy-cli code due to its size. Microsoft .NET SDK 2.0.50727.42 is affected by this bug; other versions could be affected as well. If you find a version of the SDK that works, please tell us.

Windows users that want to compile their own pypy-cli can install Mono: if a Mono installation is detected the translation toolchain will automatically use its ilasm2 tool to assemble the executables.

3.4.3.1   Trying the experimental .NET integration

You can also try the still very experimental clr module that enables integration with the surrounding .NET environment.

You can dynamically load .NET classes using the clr.load_cli_class method. After a class has been loaded, you can instantiate and use it as if it were a normal Python class. Special methods such as indexers and properties are supported using the usual Python syntax:

>>>> import clr
>>>> ArrayList = clr.load_cli_class('System.Collections', 'ArrayList')
>>>> obj = ArrayList()
>>>> obj.Add(1)
0
>>>> obj.Add(2)
1
>>>> obj.Add("foo")
2
>>>> print obj[0], obj[1], obj[2]
1 2 foo
>>>> print obj.Count
3

At the moment the only way to load a .NET class is to explicitly use clr.load_cli_class; in the future, classes will be loaded automatically when accessing .NET namespaces as if they were Python modules, as IronPython does.
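The mapping of indexers and properties onto Python syntax can be pictured with a pure-Python stand-in. The ToyArrayList class below is hypothetical and does not touch .NET at all; it only mirrors the behaviour seen in the session above.

```python
class ToyArrayList(object):
    """Pure-Python stand-in mimicking the ArrayList session above."""
    def __init__(self):
        self._items = []

    def Add(self, item):
        # Like the session above, return the index of the added item.
        self._items.append(item)
        return len(self._items) - 1

    def __getitem__(self, index):
        # An indexer corresponds to obj[index] in Python.
        return self._items[index]

    @property
    def Count(self):
        # A property is read as a plain attribute: obj.Count
        return len(self._items)


obj = ToyArrayList()
indexes = [obj.Add(1), obj.Add(2), obj.Add("foo")]
print(indexes)       # [0, 1, 2]
print(obj[2])        # foo
print(obj.Count)     # 3
```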

3.4.4   Translate using the Build Tool

If you don't have access to the required resources to build PyPy yourself, or want to build it on a platform you don't have access to, you can use the 'build tool', a set of components that allow translating and compiling PyPy on remote servers. The build tool provides a meta server that manages a set of connected build servers; clients can request a build by specifying target options and Subversion revision and path, and the meta server will do its best to get the request fulfilled. Currently you need to have a codespeak account to be able to use the build tool.

The meta server is made available on codespeak.net and there should always be a number of build servers running. If you want to request a compilation, you need to check out the build tool sources from:

https://codespeak.net/svn/pypy/build/buildtool

Then you can issue:

$ ./bin/startcompile.py [options] <email>

in the root of the 'buildtool' package replacing '[options]' with the options you desire and '<email>' with your email address. Type:

$ ./bin/startcompile.py --help

to see a list of available options.

To donate build server time (we'd be eternally thankful! ;) just run:

$ ./bin/buildserver.py

without arguments. Keep in mind that you need to be a registered user on 'codespeak.net', and that a decent machine with enough RAM is required (about 1-2 GB is recommended).

Note: currently Windows is not supported.

3.5   Where to start reading the sources

PyPy is made from parts that are relatively independent from each other. You should start looking at the part that attracts you most (all paths are relative to the PyPy top level directory). You may look at our directory reference or start off at one of the following points:

  • pypy/interpreter contains the bytecode interpreter: bytecode dispatcher in pyopcode.py, frame and code objects in eval.py and pyframe.py, function objects and argument passing in function.py and argument.py, the object space interface definition in baseobjspace.py, modules in module.py and mixedmodule.py. Core types supporting the bytecode interpreter are defined in typedef.py.
  • pypy/interpreter/pyparser contains a recursive descent parser, and input data files that allow it to parse both Python 2.3 and 2.4 syntax. Once the input data has been processed, the parser can be translated by the above machinery into efficient code.
  • pypy/interpreter/astcompiler contains the compiler. This contains a modified version of the compiler package from CPython that fixes some bugs and is translatable. That the compiler and parser are translatable is new in 0.8.0 and it makes using the resulting binary interactively much more pleasant.
  • pypy/objspace/std contains the Standard object space. The main file is objspace.py. For each type, the files xxxtype.py and xxxobject.py contain respectively the definition of the type and its (default) implementation.
  • pypy/objspace contains a few other object spaces: the thunk, trace and flow object spaces. The latter is a relatively short piece of code that builds the control flow graphs when the bytecode interpreter runs in it.
  • pypy/translator contains the code analysis and generation stuff. Start reading from translator.py, from which it should be easy to follow the pieces of code involved in the various translation phases.
  • pypy/annotation contains the data model for the type annotation that can be inferred about a graph. The graph "walker" that uses this is in pypy/annotation/annrpython.py.
  • pypy/rpython contains the code of the RPython typer. The typer transforms annotated flow graphs in a way that makes them very similar to C code so that they can be easily translated. The graph transformations are controlled by the stuff in pypy/rpython/rtyper.py. The object model that is used can be found in pypy/rpython/lltypesystem/lltype.py. For each RPython type there is a file rxxxx.py that contains the low-level functions needed for this type.

3.6   Additional Tools for running (and hacking) PyPy

We use some optional tools for developing PyPy. They are not required to run the basic tests or to get an interactive PyPy prompt but they help to understand and debug PyPy especially for the ongoing translation work.

3.6.1   graphviz & pygame for flow graph viewing (highly recommended)

graphviz and pygame are both necessary if you want to look at generated flow graphs:

graphviz: http://www.graphviz.org/Download.php

pygame: http://www.pygame.org/download.shtml

3.6.2   CTypes (highly recommended)

ctypes (version 0.9.9.6 or later) is required if you want to translate PyPy to C. See the download page of ctypes.

3.6.3   CLISP

The CLISP backend is optional and not quite up-to-date with the rest of PyPy. Still there are a few examples you can try our backend out on. Here is a link to a LISP implementation that should basically work:

http://clisp.cons.org/

3.6.4   py.test and the py lib

The py library is used for supporting PyPy development and for running our tests against code and documentation, as well as compliance tests. You don't need to install the py library, because it ships with PyPy and pypy/test_all.py is an alias for py.test. If you want to have the py.test tool generally in your path, however, you might like to visit:

http://codespeak.net/py/dist/download.html

4   Getting involved

PyPy employs an open development process. You are invited to join our pypy-dev mailing list or look at the other contact possibilities. We also hold coding sprints, which are announced separately and often happen around Python conferences such as EuroPython or PyCon. Take a look at the list of upcoming events to plan where to meet with us.