PyPy
PyPy[thoughts_string_interning]

String Interning in PyPy

A few thoughts about string interning. CPython gets a remarkable speed-up by interning strings. Interned are all builtin string objects and all strings used as names. The effect is that when a string lookup is done during instance attribute access, the dict lookup method will find the string always by identity, saving the need to do a string comparison.

Interned Srings in CPython

CPython keeps an internal dictionary named interned for all of these strings. It contains the string both as key and as value, which means there are two extra references in principle. Upto Version 2.2, interned strings were considered immortal. Once they entered the interned dict, nothing could revert this memory usage.

Starting with Python 2.3, interned strings became mortal by default. The reason was less memory usage for strings that have no external reference any longer. This seems to be a worthwhile enhancement. Interned strings that are really needed always have a real reference. Strings which are interned for temporary reasons get a big speed up and can be freed after they are no longer in use.

This was implemented by making the interned dictionary a weak dict, by lowering the refcount of interned strings by 2. The string deallocator got extra handling to look into the interned dict when a string is deallocated. This is supported by the state variable on string objects which tells whether the string is not interned, immortal or mortal.

Implementation problems for PyPy

  • The CPython implementation makes explicit use of the refcount to handle the weak-dict behavior of interned. PyPy does not expose the implementation of object aliveness. Special handling would be needed to simulate mortal behavior. A possible but expensive solution would be to use a real weak dictionary. Another way is to add a special interface to the backend that allows either the two extra references to be reset, or for the boehm collector to exclude the interned dict from reference tracking.
  • PyPy implements quite complete internal strings, as opposed to CPython which always uses its "applevel" strings. It also supports low-level dictionaries. This adds some complication to the issue of interning. Additionally, the interpreter currently handles attribute access by calling wrap(str) on the low-level attribute string when executing frames. This implies that we have to primarily intern low-level strings and cache the created string objects on top of them. A possible implementation would use a dict with ll string keys and the string objects as values. In order to save the extra dict lookup, we also could consider to cache the string object directly on a field of the rstr, which of course adds some extra cost. Alternatively, a fast id-indexed extra dictionary can provide the mapping from rstr to interned string object. But for efficiency reasons, it is anyway necessary to put an extra flag about interning on the strings. Flagging this by putting the string object itself as the flag might be acceptable. A dummyobject can be used if the interned rstr is not exposed as an interned string object.

Update: a reasonably simple implementation

Instead of the complications using the stringobject as a property of an rstr instance, I propose to special case this kind of dictionary (mapping rstr to stringobject) and to put an integer interned field into the rstr. The default is -1 for not interned. Non-negative values are the direct index of this string into the interning dict. That is, we grow an extra function that indexes the dict by slot number of the dict table and gives direct access to its value. The dictionary gets special handling on dict_resize, to recompute the slot numbers of the interned strings. ATM I'd say we leave the strings immortal and support mortality later when we have a cheap way to express this (less refcount, exclusion from Boehm, whatever).

A prototype brute-force patch

In order to get some idea how efficient string interning is at the moment, I implemented a quite crude version of interning. I patched space.wrap to call this intern_string instead of W_StringObject:

def intern_string(space, str):
    if we_are_translated():
        _intern_ids = W_StringObject._intern_ids
        str_id = id(str)
        w_ret = _intern_ids.get(str_id, None)
        if w_ret is not None:
            return w_ret
        _intern = W_StringObject._intern
        if str not in _intern:
            _intern[str] = W_StringObject(space, str)
        W_StringObject._intern_keep[str_id] = str
        _intern_ids[str_id] = w_ret = _intern[str]
        return w_ret
    else:
        return W_StringObject(space, str)

This is no general solution at all, since it a) does not provide interning of rstr and b) interns every app-level string. The implementation is also by far not as efficient as it could be, because it utilizes an extra dict _intern_ids which maps the id of the rstr to the string object, and a dict _intern_keep to keep these ids alive.

With just a single _intern dict from rstr to string object, the overall performance degraded slightly instead of an advantage. The triple dict patch accelerates richards by about 12 percent. Since it still has the overhead of handling the extra dicts, I guess we can expect twice the acceleration if we add proper interning support.

The resulting estimated 24 % acceleration is still not enough to justify an implementation right now.

Here the results of the richards benchmark:

D:\pypy\dist\pypy\translator\goal>pypy-c-17516.exe -c "from richards import *;Richards.iterations=1;main()"
debug: entry point starting
debug:  argv -> pypy-c-17516.exe
debug:  argv -> -c
debug:  argv -> from richards import *;Richards.iterations=1;main()
Richards benchmark (Python) starting... [<function entry_point at 0xeae060>]
finished.
Total time for 1 iterations: 38 secs
Average time for iterations: 38885 ms

D:\pypy\dist\pypy\translator\goal>pypy-c.exe -c "from richards import *;Richards.iterations=1;main()"
debug: entry point starting
debug:  argv -> pypy-c.exe
debug:  argv -> -c
debug:  argv -> from richards import *;Richards.iterations=1;main()
Richards benchmark (Python) starting... [<function entry_point at 0xead810>]
finished.
Total time for 1 iterations: 34 secs
Average time for iterations: 34388 ms

D:\pypy\dist\pypy\translator\goal>

This was just an exercize to get an idea. For sure this is not to be checked in. Instead, I'm attaching the simple patch here for reference.

Index: objspace/std/objspace.py
===================================================================
--- objspace/std/objspace.py  (revision 17526)
+++ objspace/std/objspace.py  (working copy)
@@ -243,6 +243,9 @@
                 return self.newbool(x)
             return W_IntObject(self, x)
         if isinstance(x, str):
+            # XXX quick speed testing hack
+            from pypy.objspace.std.stringobject import intern_string
+            return intern_string(self, x)
             return W_StringObject(self, x)
         if isinstance(x, unicode):
             return W_UnicodeObject(self, [unichr(ord(u)) for u in x]) # xxx
Index: objspace/std/stringobject.py
===================================================================
--- objspace/std/stringobject.py      (revision 17526)
+++ objspace/std/stringobject.py      (working copy)
@@ -18,6 +18,10 @@
 class W_StringObject(W_Object):
     from pypy.objspace.std.stringtype import str_typedef as typedef

+    _intern_ids = {}
+    _intern_keep = {}
+    _intern = {}
+
     def __init__(w_self, space, str):
         W_Object.__init__(w_self, space)
         w_self._value = str
@@ -32,6 +36,21 @@

 registerimplementation(W_StringObject)

+def intern_string(space, str):
+    if we_are_translated():
+        _intern_ids = W_StringObject._intern_ids
+        str_id = id(str)
+        w_ret = _intern_ids.get(str_id, None)
+        if w_ret is not None:
+            return w_ret
+        _intern = W_StringObject._intern
+        if str not in _intern:
+            _intern[str] = W_StringObject(space, str)
+        W_StringObject._intern_keep[str_id] = str
+        _intern_ids[str_id] = w_ret = _intern[str]
+        return w_ret
+    else:
+        return W_StringObject(space, str)

 def _isspace(ch):
     return ord(ch) in (9, 10, 11, 12, 13, 32)
Index: objspace/std/stringtype.py
===================================================================
--- objspace/std/stringtype.py        (revision 17526)
+++ objspace/std/stringtype.py        (working copy)
@@ -47,6 +47,10 @@
     if space.is_true(space.is_(w_stringtype, space.w_str)):
         return w_obj  # XXX might be reworked when space.str() typechecks
     value = space.str_w(w_obj)
+    # XXX quick hack to check interning effect
+    w_obj = W_StringObject._intern.get(value, None)
+    if w_obj is not None:
+        return w_obj
     w_obj = space.allocate_instance(W_StringObject, w_stringtype)
     W_StringObject.__init__(w_obj, space, value)
     return w_obj

ciao - chris