PyPy allows experimentation in many directions -- indeed, facilitating experimentation in language implementation was one of the main motivations for the project. This page collects ideas for experiments that the core developers have not yet had time to perform and that do not require too much in-depth knowledge to get started with.
Feel free to suggest new ideas and discuss them in #pypy on the freenode IRC network or the pypy-dev mailing list (see the contact page).
Although PyPy's Python interpreter is very compatible with CPython, it is not yet as fast. There are several approaches to making it faster, including the ongoing Just-In-Time compilation efforts and improvements to the compilation tool chain. The approach probably best suited to being divided into reasonably sized chunks is to experiment with alternate implementations of key data structures or algorithms used by the interpreter. PyPy's structure is designed to make this straightforward, so it is easy to provide a different implementation of, say, dictionaries or lists without disturbing any other code.
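To give a flavour of what an alternate implementation might look like, here is a minimal sketch in plain Python. It is not code from PyPy: the class name `StrKeyDict` and the `specialised` flag are hypothetical. It only illustrates the general idea of a dictionary that stays in a cheaper, string-keys-only mode until a non-string key forces it to fall back to the general case.

```python
class StrKeyDict(object):
    """Hypothetical dict variant: stays in a string-only mode until a
    non-string key forces a switch to the general behaviour."""

    def __init__(self):
        self._data = {}          # backing store (a plain dict in this sketch)
        self.specialised = True  # True while every key stored so far is a str

    def __setitem__(self, key, value):
        if self.specialised and not isinstance(key, str):
            self.specialised = False  # fall back to the general mode
        self._data[key] = value

    def __getitem__(self, key):
        if self.specialised and not isinstance(key, str):
            # Assumption of this sketch: no non-str object compares equal
            # to a str key, so the lookup can be skipped entirely.
            raise KeyError(key)
        return self._data[key]

    def __len__(self):
        return len(self._data)
```

A real implementation would exploit the specialised mode with a faster hash and comparison path. In this sketch the only saving is the skipped lookup, which is why measurements are needed to judge whether such a variant pays off.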
As examples, we've got working implementations of things like:
Things we've thought about but not yet implemented include:
Experiments of this kind are really experiments in the sense that we do not know whether they will work well or not and the only way to find out is to try. A project of this nature should provide benchmark results (both timing and memory usage) as much as code.
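As a rough illustration of reporting both timing and memory usage, the following sketch uses the standard `timeit` and `tracemalloc` modules. It assumes a Python 3 interpreter that provides `tracemalloc`; where that module is unavailable, a different memory measure would be needed. The workload `build_dict` and the helper `measure` are hypothetical names chosen for this example.

```python
import timeit
import tracemalloc


def build_dict(n):
    """Workload under test: fill a dict with n string keys."""
    return {str(i): i for i in range(n)}


def measure(func, *args, repeat=5):
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    # Timing: run the call `repeat` times and keep the fastest run.
    best = min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))
    # Memory: trace allocations during one additional run.
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


if __name__ == "__main__":
    seconds, peak = measure(build_dict, 100000)
    print("best time: %.4fs, peak memory: %d bytes" % (seconds, peak))
```

Taking the minimum over several runs reduces noise from other processes. The memory figure covers only allocations made through Python's allocator while tracing is active.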
Some ideas on concrete steps for benchmarking:
PyPy has complete, or nearly complete, backends for C, LLVM, CLI/.NET and JVM, and partial backends for JavaScript, Common Lisp and Squeak. It would be an interesting project to improve any of these partial backends, or to start one for another platform (Objective C comes to mind as one that should not be too terribly hard).
The JavaScript backend differs somewhat from PyPy's other backends because it does not try to support all of PyPy (where would it be run, then?), but rather compiles RPython programs into code that runs in a browser. Some documentation is available in the "what is PyPy.js" file and in "using the JavaScript backend". Some project ideas might be:
PyPy's Just-In-Time compiler relies on two assembler backends for actual code generation, one for PowerPC and the other for i386. One idea would be to start a new backend, e.g. for a mobile device.
Another idea in a similar vein would be to use LLVM to re-compile functions that are executed particularly frequently (LLVM cannot be used for all code generation, since it can only work on one function at a time).
Write an interpreter for another dynamic language in the PyPy framework. For example, a Scheme interpreter would be suitable (and it would even be interesting from a semi-academic point of view to see if call/cc can be implemented on top of the primitives the stackless transform provides). Ruby would be another candidate (though it is probably more than two months of work), or Lua, or ...
We already have a somewhat usable Prolog interpreter and the beginnings of a JavaScript interpreter.
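To give an idea of the scale of a first step, here is a minimal sketch of an evaluator for a tiny Scheme subset (integers, `define`, `lambda`, `if` and a few arithmetic primitives). It is written in plain Python and has not been checked against RPython's restrictions, so it would likely need reworking before it could be translated. It makes no attempt at call/cc, and all function names are hypothetical.

```python
import operator


def tokenize(source):
    """Split Scheme source into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested list from the token list (consumes tokens in place)."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token  # anything else is a symbol


GLOBAL_ENV = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}


def evaluate(expr, env):
    if isinstance(expr, str):   # symbol: look it up
        return env[expr]
    if isinstance(expr, int):   # integer literal
        return expr
    head = expr[0]
    if head == "if":
        _, test, then, otherwise = expr
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr

        def closure(*args):
            # Simplification: copy the defining environment at call time.
            local = dict(env)
            local.update(zip(params, args))
            return evaluate(body, local)
        return closure
    func = evaluate(head, env)  # ordinary function application
    return func(*[evaluate(arg, env) for arg in expr[1:]])


def run(source, env=None):
    env = GLOBAL_ENV.copy() if env is None else env
    return evaluate(parse(tokenize(source)), env)
```

For example, after `env = GLOBAL_ENV.copy()` and `run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))", env)`, the call `run("(fact 5)", env)` evaluates a recursive factorial.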
Revive rexec: implement security checks, sandboxing, or some similar model within PyPy (which, if I may venture an opinion, makes more sense and is more robust than trying to do it in CPython).
There are multiple approaches that can be discussed and tried. One of them is about safely executing limited snippets of untrusted RPython code (see http://codespeak.net/pipermail/pypy-dev/2006q2/003131.html). More general approaches, to execute general but untrusted Python code on top of PyPy, require more design. The object space model of PyPy will easily allow objects to be tagged and tracked. The translation of PyPy would also be a good place to insert e.g. systematic checks around all system calls.
One of the advantages of PyPy's implementation is that the Python-level type of an object and its implementation are completely independent. This should allow a much more intuitive interface to, for example, objects that are backed by a persistent store.
The transparent proxy objects are a key step in this direction; now all that remains is to implement the interesting bits :-)
An example project might be to implement functionality akin to the ZODB's Persistent class, without the need for the _p_changed hacks, and in pure Python code (should be relatively easy on top of transparent proxy).
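The kind of change tracking such a project would need can be sketched in plain Python. The example below does not use PyPy's transparent proxy interface. It is an ordinary wrapper class that stands in for one, and `ChangeTrackingProxy`, `flush` and the dict used as a "store" are all hypothetical names for illustration.

```python
class ChangeTrackingProxy(object):
    """Hypothetical stand-in for a transparent proxy: forwards attribute
    access to a target object and records which attributes were written."""

    def __init__(self, target):
        # Bypass our own __setattr__ for the proxy's bookkeeping fields.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_dirty", set())

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for target attributes.
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        setattr(self._target, name, value)
        self._dirty.add(name)  # remember the write; no manual flag needed


def flush(proxy, store):
    """Copy modified attributes into `store` (a dict standing in for a
    persistent backend) and clear the record of pending changes."""
    target = object.__getattribute__(proxy, "_target")
    dirty = object.__getattribute__(proxy, "_dirty")
    for name in sorted(dirty):
        store[name] = getattr(target, name)
    dirty.clear()
```

This wrapper only notices attribute assignment. In-place mutation of a mutable attribute (for instance appending to a list) goes undetected unless those values are wrapped as well, which is the part a real implementation would have to solve.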
At the EuroPython sprint, some work was done on making RPython's annotator recognise Numeric arrays, with the goal of allowing programs using them to be efficiently translated. It would be a reasonably sized project to finish this work, i.e. allow RPython programs to use some Numeric facilities. Additionally, these facilities could be exposed to applications interpreted by the translated PyPy interpreter.
Rewrite one or several CPython extension modules to be based on ctypes (integrated in Python 2.5): this is generally useful for Python developers, and it is now the best path to write extension modules that are compatible with both CPython and PyPy. This is done with the extension compiler component of PyPy, which will likely require some attention as well.
Modules where some work is already done:
You are free to pick any other CPython module, either standard or third-party (if relatively well-known, like gtk bindings). Note that some modules exist in a ctypes version already, which would be a good start for porting them to PyPy's extension compiler.
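For readers who have not used ctypes before, the following minimal sketch shows the general style of a ctypes-based binding: it wraps the C library's `strlen`. It illustrates plain ctypes usage only, not PyPy's extension compiler, and it assumes a POSIX-like system where the C library can be located. The wrapper name `c_strlen` is hypothetical.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None on some
# platforms; on POSIX systems CDLL(None) then refers to the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` explicitly lets ctypes check and convert arguments instead of relying on its defaults.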
...or whatever else interests you!
Feel free to mention your interest and discuss these ideas on the pypy-dev mailing list. You can also have a look around our documentation.