PyPy

Independent project ideas relating to PyPy

PyPy allows experimentation in many directions -- indeed, facilitating experimentation in language implementation was one of the main motivations for the project. This page collects ideas for experiments that the core developers have not yet had time to perform and that do not require too much in-depth knowledge to get started.

Feel free to suggest new ideas and discuss them in #pypy on the freenode IRC network or the pypy-dev mailing list (see the contact page).


Experiment with optimizations

Although PyPy's Python interpreter is highly compatible with CPython, it is not yet as fast. There are several approaches to making it faster, including the ongoing Just-In-Time compilation efforts and improvements to the compilation toolchain, but the approach best suited to being divided into reasonably sized chunks is probably to experiment with alternative implementations of key data structures or algorithms used by the interpreter. PyPy's structure is designed to make this straightforward, so it is easy to provide a different implementation of, say, dictionaries or lists without disturbing any other code.

As examples, we've got working implementations of things like:

  • lazy string slices (slicing a string gives an object that references a part of the original string).
  • lazily concatenated strings (repeated additions and joins are done incrementally).
  • dictionaries specialized for string-only keys.
  • dictionaries which use a different strategy when very small.
  • caching the lookups of builtin names (by special forms of dictionaries that can invalidate the caches when they are written to)
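As an illustration of the first item in the list above, here is a minimal sketch of a lazy string slice written in ordinary Python. The class name `LazySlice` and its methods are hypothetical and are not taken from PyPy's implementation; the sketch only shows the idea of keeping a reference to the original string plus two offsets, and copying characters only when a real string is requested.

```python
class LazySlice:
    """A view onto part of a string; characters are copied only on demand."""

    def __init__(self, base, start, stop):
        # Clamp the bounds the same way ordinary slicing would.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base
        self._start = start
        self._stop = max(start, stop)

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Fall back to a real string for strided slices.
                return str(self)[index]
            # Slicing a view yields another view of the same base string.
            return LazySlice(self._base, self._start + start, self._start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("LazySlice index out of range")
        return self._base[self._start + index]

    def __str__(self):
        # The only place where the characters are actually copied.
        return self._base[self._start:self._stop]
```

Whether such a view pays off depends on the workload: it saves copying, but it keeps the whole original string alive for as long as any view exists, which is exactly the kind of trade-off the benchmarks should measure.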

Things we've thought about but not yet implemented include:

  • creating multiple representations of Unicode strings that store the character data in narrower arrays when they can.
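The idea of narrower arrays can be sketched with the standard `array` module: pick the smallest element width that can hold the largest code point in the string. The helper names `pack_text` and `unpack_text` are hypothetical, and the sketch assumes a platform where the `"I"` typecode is at least 4 bytes wide (the common case, but not guaranteed by the C standard).

```python
from array import array


def pack_text(text):
    """Store code points in the narrowest unsigned array that can hold them."""
    largest = max(map(ord, text), default=0)
    if largest < 0x100:
        typecode = "B"   # 1 byte per character
    elif largest < 0x10000:
        typecode = "H"   # 2 bytes per character
    else:
        typecode = "I"   # assumed to be 4 bytes per character
    return array(typecode, map(ord, text))


def unpack_text(packed):
    """Rebuild the ordinary string from a packed array of code points."""
    return "".join(map(chr, packed))
```

An interpreter-level implementation would hide the choice of representation behind the normal string interface; the sketch only shows how the width is selected.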

Experiments of this kind are really experiments in the sense that we do not know whether they will work well or not and the only way to find out is to try. A project of this nature should provide benchmark results (both timing and memory usage) as much as code.

Some ideas on concrete steps for benchmarking:

  • find a set of real-world applications that can be used as benchmarks for PyPy (ideas: docutils, http://hachoir.org/, moinmoin, ...?)
  • do benchmark runs to see how much speedup the currently written optimizations give
  • profile pypy-c and its variants with these benchmarks, identify slow areas
  • try to come up with optimized implementations for these slow areas
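A small harness along the following lines could serve as a starting point for collecting both timing and memory numbers. It is only a sketch: the function `measure` and the two toy workloads are made up for illustration, and `tracemalloc` is a CPython standard library module that may not be available on every interpreter, in which case another memory measurement has to be substituted.

```python
import timeit
import tracemalloc


def measure(func, repeat=5, number=1000):
    """Return (best seconds per call, peak bytes allocated during one call)."""
    best = min(timeit.repeat(func, repeat=repeat, number=number)) / number
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return best, peak


def concat_naive():
    # Toy workload: repeated string addition.
    s = ""
    for i in range(200):
        s += str(i)
    return s


def concat_join():
    # Toy workload: the same result built with a single join.
    return "".join(str(i) for i in range(200))


if __name__ == "__main__":
    for candidate in (concat_naive, concat_join):
        seconds, peak = measure(candidate)
        print(f"{candidate.__name__}: {seconds * 1e6:.1f} us/call, peak {peak} bytes")
```

Micro-benchmarks like these are useful for checking a single optimization, but they do not replace runs of the real-world applications mentioned above.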

Start or improve a back-end

PyPy has complete, or nearly complete, backends for C, LLVM, CLI/.NET and the JVM, and partial backends for JavaScript, Common Lisp and Squeak. It would be an interesting project to improve any of these partial backends, or to start one for another platform (Objective-C comes to mind as one that should not be too terribly hard).

Improve JavaScript backend

The JavaScript backend is somewhat different from PyPy's other backends because it does not try to support all of PyPy (where would it run, then?), but rather to compile RPython programs into code that runs in a browser. Some documentation is available in the "what is PyPy.js" file and in "using the JavaScript backend". Some project ideas might be:

  • Write some examples to show different possibilities of using the backend.
  • Improve the facilities for testing RPython intended to be translated to JavaScript on top of CPython, mostly by improving the existing DOM interface.
  • Write MochiKit bindings (or bindings for any other interesting JS effects library), including tests.
  • Write a better object layer over the DOM in RPython to make writing applications easier.

Improve one of the JIT back-ends

PyPy's Just-In-Time compiler relies on two assembler backends for actual code generation, one for PowerPC and the other for i386. One idea would be to start a new backend, e.g. for a mobile device.

Another idea in a similar vein would be to use LLVM to re-compile functions that are executed particularly frequently (LLVM cannot be used for all code generation, since it can only work on one function at a time).

Write a new front end

Write an interpreter for another dynamic language in the PyPy framework. For example, a Scheme interpreter would be suitable (and it would even be interesting from a semi-academic point of view to see if call/cc can be implemented on top of the primitives the stackless transform provides). Ruby would work too (though it is probably more than two months of work), as would Lua, or ...

We already have a somewhat usable Prolog interpreter and the beginnings of a JavaScript interpreter.
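To give a feel for the size of the very first step, here is a toy evaluator for a tiny Scheme subset (integers, symbols, `if`, `define`, `lambda` and a few arithmetic primitives). It is written in ordinary Python, not RPython, and all names in it (`tokenize`, `parse`, `evaluate`, `run`, `GLOBAL_ENV`) are hypothetical; a real front end in the PyPy framework would have to follow RPython's restrictions and would look different.

```python
import operator


def tokenize(source):
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and build a nested list."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return form
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol


GLOBAL_ENV = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}


def evaluate(form, env):
    if isinstance(form, str):        # symbol lookup
        return env[form]
    if not isinstance(form, list):   # integer literal
        return form
    head = form[0]
    if head == "if":
        _, test, then, otherwise = form
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "define":
        _, name, expr = form
        env[name] = evaluate(expr, env)
        return None
    if head == "lambda":
        _, params, body = form
        # The closure keeps a reference to the defining environment and
        # extends a copy of it with the arguments on each call.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)
    return func(*[evaluate(arg, env) for arg in form[1:]])


def run(source, env=None):
    return evaluate(parse(tokenize(source)), GLOBAL_ENV if env is None else env)
```

For example, after `run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))", env)` with `env = dict(GLOBAL_ENV)`, the call `run("(fact 5)", env)` evaluates the recursive definition. Features such as call/cc are deliberately left out; they are the interesting part of the project.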

Investigate restricted execution models

Revive rexec: implement security checks, sandboxing, or some similar model within PyPy (which, if I may venture an opinion, makes more sense and is more robust than trying to do it in CPython).

There are multiple approaches that can be discussed and tried. One of them is about safely executing limited snippets of untrusted RPython code (see http://codespeak.net/pipermail/pypy-dev/2006q2/003131.html). More general approaches, to execute general but untrusted Python code on top of PyPy, require more design. The object space model of PyPy will easily allow objects to be tagged and tracked. The translation of PyPy would also be a good place to insert e.g. systematic checks around all system calls.

Experiment with distribution and persistence

One of the advantages of PyPy's implementation is that the Python-level type of an object and its implementation are completely independent. This should allow a much more intuitive interface to, for example, objects that are backed by a persistent store.

The transparent proxy objects are a key step in this direction; now all that remains is to implement the interesting bits :-)

An example project might be to implement functionality akin to the ZODB's Persistent class, without the need for the _p_changed hacks, and in pure Python code (should be relatively easy on top of transparent proxy).
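The following sketch shows the general shape of such a project in plain Python. It does not use PyPy's transparent proxies; `ChangeTrackingProxy` and `flush` are hypothetical names, and an ordinary class with `__getattr__` and `__setattr__` stands in for the proxy mechanism. The wrapped object stays a normal class with no persistence-related base class or flags.

```python
class ChangeTrackingProxy:
    """Forward attribute access to a target and remember which fields changed."""

    def __init__(self, target):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_dirty", set())

    def __getattr__(self, name):
        # Only called for names that are not found on the proxy itself.
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        setattr(self._target, name, value)
        self._dirty.add(name)  # remember that this attribute was written


def flush(proxy, store, key):
    """Write the changed attributes to `store` (any dict-like object)."""
    dirty = object.__getattribute__(proxy, "_dirty")
    target = object.__getattribute__(proxy, "_target")
    record = store.setdefault(key, {})
    for name in sorted(dirty):
        record[name] = getattr(target, name)
    dirty.clear()
```

Note the limitation: in-place mutation of a nested value (for example appending to a list attribute) is not detected by this sketch. Catching those changes would require the nested objects to be proxied as well, which is where a proxy mechanism at the interpreter level becomes attractive.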

Numeric/NumPy/numarray support

At the EuroPython sprint, some work was done on making RPython's annotator recognise Numeric arrays, with the goal of allowing programs using them to be efficiently translated. It would be a reasonably sized project to finish this work, i.e. allow RPython programs to use some Numeric facilities. Additionally, these facilities could be exposed to applications interpreted by the translated PyPy interpreter.

Extension modules

Rewrite one or several CPython extension modules to be based on ctypes (integrated in Python 2.5): this is generally useful for Python developers, and it is now the best path to write extension modules that are compatible with both CPython and PyPy. This is done with the extension compiler component of PyPy, which will likely require some attention as well.

Modules where some work is already done:

You are free to pick any other CPython module, either standard or third-party (if relatively well-known, like gtk bindings). Note that some modules exist in a ctypes version already, which would be a good start for porting them to PyPy's extension compiler.
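For readers who have not used ctypes before, this is roughly what the smallest possible ctypes-based binding looks like: it wraps the C library's `strlen`. The sketch assumes a POSIX system and says nothing about PyPy's extension compiler itself; the wrapper name `c_strlen` is made up for the example.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then falls back to the
# symbols already loaded into the running process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Length of a bytes object up to its first NUL byte, computed by C strlen."""
    return libc.strlen(data)
```

A real port of an extension module follows the same pattern at a larger scale: load the underlying C library, declare argument and result types for each function, and build the Python-level API on top in pure Python.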

Extend py.execnet to a peer-to-peer model

  • Work on a P2P model of distributed execution (extending py.execnet) that allows py.test and other upcoming utilities to make use of a network of computers executing Python tasks (e.g. tests or PyPy build tasks).
  • Make a client tool and corresponding libraries to instantiate a (dynamic) network of computers executing centrally managed tasks (e.g. build or test tasks). (This may or may not make use of a P2P model; both are likely feasible.)

Or else...

...or whatever else interests you!

Feel free to mention your interest and discuss these ideas on the pypy-dev mailing list. You can also have a look around our documentation.