PyPy
PyPy[howto-logicobjspace]

How to use the Logic Object space features of PyPy

Outline

This document gives some information about the content and usage of an extension of PyPy known as the Logic Objectspace (LO), and also about the constraint programming library that ships with PyPy. The LO, when finished, will provide additional builtins that will allow to write:

  • concurrent programs based on coroutines scheduled by dataflow logic variables,
  • concurrent logic programs,
  • concurrent constraint programs,
  • new search "engines" to help solve logic/constraint programs.

Currently, the integrated concurrent logic and constraint programming part is, unfortunately, quite unfinished. It will take some effort, time and knowledge of PyPy internals to finish it. We provide however a full-blown constraint-solving infrastructure that can be used (and extended) out of the box.

To fire up a working standard PyPy with the the constraint library, please type:

root-of-pypy-dist/pypy/bin/py.py --withmod-_cslib

To fire up a working PyPy with the LO (including the constraint solving library), please type:

root-of-pypy-dist/pypy/bin/py.py -o logic

More information is available in the EU Interim Report, especially with respect to the (unfinished) integrated framework for constraint and logic programming.

Logic Variables and Dataflow Synchronisation of Coroutines

This section peruses the LO, so you should try the examples with a logic build or the -o logic argument to py.py.

Logic Variables

Description and examples

Logic variables are (should be, in an ideal LO) similar to Python variables in the following sense: they map names to values in a defined scope. But unlike normal Python variables, they have two states: free and bound. A bound logic variable is indistinguishable from a normal Python value, which it wraps. A free variable can only be bound once (it is also said to be a single-assignment variable). It is good practice to denote these variables with a beginning capital letter, so as to avoid confusion with normal variables.

The following snippet creates a new logic variable and asserts its state:

X = newvar()
assert is_free(X)
assert not is_bound(X)

Logic variables can be bound like that:

bind(X, 42)
assert X / 2 == 21

The single-assignment property is easily checked:

bind(X, 'hello') # would raise a RebindingError
bind(X, 42)      # is admitted (it is a no-op)

It is quite obvious from this that logic variables are really objects acting as boxes for python values.

The bind operator is low-level. The more general operation that binds a logic variable is known as "unification". Unify is an operator that takes two arbitrary data structures and tries to assert their equality, much in the sense of the == operator, but with one important twist: unify mutates the state of the involved logic variables.

Unify is thus defined as follows (it is symmetric):

Unify value unbound var
value equal? bind
unbound var bind alias

Unifying structures devoid of logic variables, like:

unify([1, 2], [1, 2])
unify(42, 43) # raises UnificationError

A basic example involving logic variables embedded into dictionaries:

Z, W = newvar(), newvar()
unify({'a': 42, 'b': Z},
      {'a':  Z, 'b': W})
assert Z == W == 42

An example involving custom data types:

class Foo(object):
    def __init__(self, a):
        self.a = a
        self.b = newvar()

f1 = Foo(newvar())
f2 = Foo(42)
unify(f1, f2)
assert f1.a == f2.a == 42    # assert (a)
assert alias_of(f1.b, f2.b)  # assert (b)
unify(f2.b, 'foo')
assert f1.b == f2.b == 'foo' # (b) is entailed indeed

The operators table

Logic variables support the following operators (with their arity):

Predicates:

is_free/1
  any -> bool

is_bound/1
  any -> bool

alias_of/2
  logic vars. -> bool

Variable Creation:

newvar/0
  nothing -> logic variable

Mutators:

bind/2
  logic var., any -> None

unify/2
  any, any -> None

Coroutines and dataflow synchronisation

Description and examples

When a piece of code tries to access a free logic variable, the coroutine in which it runs is blocked (suspended) until the variable becomes bound. This behaviour is known as "dataflow synchronization" and mimics exactly the dataflow variables from the Oz programming language. With respect to behaviour under concurrency conditions, logic variables come with two operators :

  • wait: this suspends the current coroutine until the variable is bound, it returns the value otherwise (impl. note: in the logic object space, all operators make an implicit wait on their arguments)
  • wait_needed: this suspends the current coroutine until the variable has received a wait message. It has to be used explicitly, typically by a producer coroutine that wants to produce data only when needed.

In this context, binding a variable to a value will make runnable all coroutines blocked on this variable.

Wait and wait_needed allow to write efficient lazy evaluating code.

Using the "stacklet" builtin (which spawns a coroutine and applies the 2..n args to its first arg), here is how to implement a producer/consumer scheme:

from cclp import stacklet

def generate(n, limit, R):
    if n < limit:
        Tail = newvar()
        unify(R, (n, Tail))
        return generate(n + 1, limit, Tail)
    bind(R, None)
    return

def sum(L, a, R):
    Head, Tail = newvar(), newvar()
    unify(L, (Head, Tail))
    if Tail != None:
        return sum(Tail, Head + a, R)
    bind(R, a + Head)
    return

X = newvar()
S = newvar()
stacklet(sum, X, 0, S)
stacklet(generate, 0, 10, X)
assert S == 45

Note that this eagerly generates all elements before the first of them is consumed. Wait_needed helps us write a lazy version of the generator. But the consumer will be responsible of the termination, and thus must be adapted too:

def lgenerate(n, L):
    """lazy version of generate"""
    wait_needed(L)
    Tail = newvar()
    bind(L, (n, Tail))
    lgenerate(n+1, Tail)

def lsum(L, a, limit, R):
    """this summer controls the generator"""
    if limit > 0:
        Head, Tail = newvar(), newvar()
        wait(L) # awakes those waiting by need on L
        unify(L, (Head, Tail))
        return lsum(Tail, a+Head, limit-1, R)
    else:
        bind(R, a)

Y = newvar()
T = newvar()

stacklet(lgenerate, 0, Y)
stacklet(lsum, Y, 0, 10, T)
assert T == 45

The operators table

Blocking ops

wait/1 # blocks if first argument is a free logic var., till it becomes bound
value -> value
wait_needed/1 # blocks until its arg. receives a wait
logic var. -> logic var.

Coroutine spawning

stacklet/n
callable, (n-1) optional args -> None

Constraint Programming

PyPy comes with a flexible, extensible constraint solver engine based on the CPython Logilab constraint package (and we paid attention to API compatibility). We therein describe how to use the solver to specify and get the solutions of a constraint satisfaction problem.

Specification of a problem

A constraint satisfaction problem is defined by a triple (X, D, C) where X is a set of finite domain variables, D the set of domains associated with the variables in X, and C the set of constraints, or relations, that bind together the variables of X.

So we basically need a way to declare variables, their domains and relations; and something to hold these together.

Let's have a look at a reasonably simple example of a constraint program:

from cslib import *

variables = ('c01','c02','c03','c04','c05',
             'c06','c07','c08','c09','c10')
values = [(room,slot)
          for room in ('room A','room B','room C')
          for slot in ('day 1 AM','day 1 PM',
                       'day 2 AM','day 2 PM')]
domains = {}

# let us associate the variables to their domains
for v in variables:
    domains[v]=fd.FiniteDomain(values)

# let us define relations/constraints on the variables
constraints = []

# Internet access is in room C only
for conf in ('c03','c04','c05','c06'):
    constraints.append(fd.make_expression((conf,),
                                          "%s[0] == 'room C'"%conf))

# Speakers only available on day 1
for conf in ('c01','c05','c10'):
    constraints.append(fd.make_expression((conf,),
                                          "%s[1].startswith('day 1')"%conf))
# Speakers only available on day 2
for conf in ('c02','c03','c04','c09'):
    constraints.append(fd.make_expression((conf,),
                                          "%s[1].startswith('day 2')"%conf))

# try to satisfy people willing to attend several conferences
groups = (('c01','c02','c03','c10'),
          ('c02','c06','c08','c09'),
          ('c03','c05','c06','c07'),
          ('c01','c03','c07','c08'))
for g in groups:
    for conf1 in g:
        for conf2 in g:
            if conf2 > conf1:
                constraints.append(fd.make_expression((conf1,conf2),
                                                      '%s[1] != %s[1]'%\
                                                      (conf1,conf2)))

constraints.append(fd.AllDistinct(variables))

# now, give the triple (X, D, C) to a repository object
r = Repository(variables,domains,constraints)

# that we can give to one solver of our choice
# (there, it is the default depth-first search,
#  find-all solutions solver)

solutions = Solver().solve(r, 0)
assert len(solutions) == 64

Extending the solver machinery

The core of the solving system is written in pure RPython and resides in the rlib/cslib library. It should be quite easy to subclass the provided elements to get specialized, special-case optimized variants. On top of this library is built a PyPy module that exports up to application level the low-level engine functionality.