** WORK IN PROGRESS **
This document tries to shed some light on the integration of logic and constraint programming into Python, using the PyPy framework.
This takes place in Work Packages 09 and 10 of the EU PyPy funding project. The logic and constraint programming features are to be added to PyPy (WP9). An ontology library will be provided and will serve as our first use case for logic programming.
PyPy has been progressively equipped with a parser and compiler flexible enough that, it is hoped, developers can leverage them to extend the language at runtime. This is quite in the spirit of Lisp macros, if not the exact manner. It is expected that an aspect-oriented programming toolkit will be built using the compiler and parser infrastructure (WP10). This will serve the needs of WP9.
This work was described as the integration of logic programming and constraint programming into PyPy. Both are obviously related, and we have settled on the concurrent logic and constraint programming (CCLP) model present in the Oz programming language. It allows writing logic (Prolog-style) programs and using constraint solving techniques in an integrated manner (as opposed to using an external toolkit with a high impedance mismatch between the language runtime and the constraint solving package). The relational way will be built on the constraint solving machinery (much like, in Oz, the choice operator is built on top of choose).
This will allow
Lastly, we mainly discuss syntactical issues here: those are probably the least difficult aspects of getting CLP into Python; getting an efficient implementation of the canonical algorithms into PyPy will be the bulk of the work.
In constraint programming, a 'problem' is a set of variables, their (finite discrete) domains, and the constraints that restrict their possible values (or define the relations between the values). When all these have been given to a constraint solver, it is possible to find all possible solutions, that is, the sets of valuations that simultaneously satisfy all constraints. The solver is solely responsible for finding solutions (or the lack thereof).
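To make these notions concrete, here is a minimal brute-force solver sketch. It enumerates the whole cartesian product of the domains, which no real solver would do (real solvers interleave constraint propagation with search), but it shows exactly what a 'solution' is:

```python
from itertools import product

def brute_force_solve(variables, domains, constraints):
    """Enumerate every combination of values and keep those that
    satisfy all constraints.  Real solvers prune the search space
    instead of enumerating it exhaustively."""
    solutions = []
    for combo in product(*(domains[v] for v in variables)):
        valuation = dict(zip(variables, combo))
        if all(check(valuation) for check in constraints):
            solutions.append(valuation)
    return solutions

# a toy problem: x < y, both ranging over {1, 2, 3}
variables = ('x', 'y')
domains = {'x': [1, 2, 3], 'y': [1, 2, 3]}
constraints = [lambda val: val['x'] < val['y']]
print(brute_force_solve(variables, domains, constraints))
# three solutions: x=1 y=2, x=1 y=3, x=2 y=3
```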
At the time of writing, there exists a constraint package made by Logilab, written in pure Python, which implements some parts of the solver found in Mozart (the reference Oz implementation). We use it to illustrate where we want to go, syntax-wise.
Let's start with a quite standard example (the problem being solved here is fully described at http://www.logilab.org/projects/constraint/documentation):
    # import Repository class and fd module
    from logilab.constraint import *

    variables = ('c01','c02','c03','c04','c05','c06','c07','c08','c09','c10')
Variables are represented by plain string objects:
    values = [(room, slot)
              for room in ('room A', 'room B', 'room C')
              for slot in ('day 1 AM', 'day 1 PM', 'day 2 AM', 'day 2 PM')]
Values can be freely pre-computed using standard Python constructs; they can be any object; here, tuples of strings:
    domains = {}
    for v in variables:
        domains[v] = fd.FiniteDomain(values)
The relationship between variables and their possible values is set in a dictionary whose keys are variable designators (strings). Values are wrapped into FiniteDomain instances (FiniteDomain has set behaviour, plus some implementation subtleties):
    constraints = []
    groups = (('c01','c02','c03','c10'),
              ('c02','c06','c08','c09'),
              ('c03','c05','c06','c07'),
              ('c01','c03','c07','c08'))
    for g in groups:
        for conf1 in g:
            for conf2 in g:
                if conf2 > conf1:
                    constraints.append(fd.make_expression((conf1, conf2),
                                                          '%s[1] != %s[1]' % (conf1, conf2)))
Constraints are built by make_expression, which takes a tuple of one or two variables and a string representing a unary or binary relationship. The example, complete with all constraints, is provided at the URL mentioned above.
Then, when everything has been settled, comes the last step:
    r = Repository(variables, domains, constraints)
    solutions = Solver().solve(r)
    print solutions
Due to the compactness of Python syntax, this sample problem specification remains quite small and readable. It is not obvious what could be done to make it smaller and still readable.
Variables are not first-class (but close ...) and have nothing to do with standard Python variables. The good side of this is that we cannot confuse a CSP variable with an ordinary one.
Specifying a constraint is clunky: the variables and the operator have to be provided separately, and the operator has to be a string. This last restriction exists because Python does not allow passing builtin infix operators as functional parameters (although the operator module provides functional equivalents).
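For illustration, a hypothetical make_expression variant accepting callables (this is not the actual logilab.constraint API) would sidestep the string requirement, since the operator module already exposes functional versions of the infix operators:

```python
import operator

def make_expression(variables, relation):
    """Hypothetical variant: 'relation' is any callable applied to the
    values of 'variables', instead of a string to be evaluated."""
    def check(valuation):
        return relation(*(valuation[v] for v in variables))
    return check

# builtin operators can then be passed directly...
c1 = make_expression(('a', 'b'), operator.ne)
# ...as can lambdas for ad-hoc relations
c2 = make_expression(('a', 'b'), lambda a, b: a[1] != b[1])

print(c1({'a': 1, 'b': 2}))                    # True: 1 != 2
print(c2({'a': ('r', 's'), 'b': ('q', 's')}))  # False: 's' == 's'
```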
(the following sub-chapters are considered deprecated)
First, we could promote variables to second-class citizenry, and be able to write something like:
    domain = [(room, slot)
              for room in ('room A', 'room B', 'room C')
              for slot in ('day 1 AM', 'day 1 PM', 'day 2 AM', 'day 2 PM')]
    c01 := domain
    c02 := domain
This introduces a special operator := which binds a logical variable to a domain. More generally:
    var := <any iterable>
With respect to normal assignment, we can imagine the following:
    c01 = 'foo'   # raises a NotAssignable or ReadOnly exception
    bar = c01     # takes a reference to the current value of c01 into bar
                  # also, meaningless (so ... None) before the solver has run
Problem: we can no longer do:
    for conf in ('c01', 'c05', 'c10'):
        ...
It would be good to define a kind of first-class designator for this kind of variable. A specially-crafted class representing variables (in the manner of Lisp's symbols) would suffice:
    for conf in (c01, c05, c10):
        ...
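Such a designator class is easy to sketch. The following hypothetical Symbol class interns its instances by name, in the manner of Lisp symbols:

```python
class Symbol(object):
    """An interned, named designator: two Symbols with the same
    name are the very same object, as with Lisp symbols."""
    _table = {}

    def __new__(cls, name):
        if name not in cls._table:
            sym = object.__new__(cls)
            sym.name = name
            cls._table[name] = sym
        return cls._table[name]

    def __repr__(self):
        return self.name

c01, c05, c10 = Symbol('c01'), Symbol('c05'), Symbol('c10')
print(c01 is Symbol('c01'))   # True: interned, safely usable as dict keys
for conf in (c01, c05, c10):
    print(conf)
```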
Is it worth the price? We are quite unsure.
An alternative which avoids the special operator and uses a keyword instead could be:
    domain:
        c01 = <iterable>
        c02 = <iterable>
It makes us reuse =, with twisted (non-standard Python) semantics but under a clear lexical umbrella (a domain: block).
It is possible to get further in this direction:
    problem toto:
        D1 = <domain definition>
        a, b, c in D1
        def constraint1(a, b, c):
            a == b

    for sol in toto:
        print sol
There, we put a full constraint mini-language under a named 'problem' block. The problem becomes a first-class object (in the manner of Python classes) and we can (lazily) extract solutions from it.
The ugly aspect of py-constraints is the definition of custom unary/binary constraints through make_expression, as in:
    fd.make_expression(('var1', 'var2'), 'frob(var1, var2)')
One solution might be to parse the string at runtime to recover the variable names:
    fd.make_expression('frob(var1, var2)')
A simple hand-written parser could be sufficient for this. On the other hand, the lexically delimited mini-language proposed above helps solve this more uniformly.
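In fact, recovering the variable names would not even require a hand-written parser. A sketch using the stdlib ast module, assuming any called name (like frob) denotes a function rather than a constraint variable:

```python
import ast

def free_names(expression):
    """Collect the names referenced in an expression string,
    excluding names that are called (presumably functions
    rather than constraint variables)."""
    tree = ast.parse(expression, mode='eval')
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)}
    names = {node.id for node in ast.walk(tree)
             if isinstance(node, ast.Name)}
    return sorted(names - called)

print(free_names('frob(var1, var2)'))   # ['var1', 'var2']
```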
Integrating search seamlessly into an already grown-up imperative programming language might cause some headaches. For instance, in the perspective of benefiting from the Prolog way of doing logic programming, we considered embedding a specific mini-language into Python, with strong and well-defined borders between the logic world and the 'normal', or usual, imperative world of standard Python.
Fortunately, such a twist might not be needed. The designers of Oz have devised another way of doing logic programming that is certainly much more easily integrated into current Python than bolting Prolog onto it.
The Prolog style of logic programming, while successfully applied to many real-world problems, is not without defects. These can be summarized as follows:
From the PyPy sprint in Belgium that was focused on constraint and logic programming emerged an implementation of a so-called 'Logic Objectspace', which extends PyPy's standard object space (the one implementing standard Python operations) with two things: logic variables and dataflow synchronization.
Logic variables have two states: free and bound. A bound logic variable is indistinguishable from the normal Python value it wraps. A free variable can only be bound once (it is also said to be a single-assignment variable).
The operation that binds a logic variable is known as "unification". Unify is an operator that takes two arbitrary data structures and tries to check if they are the same, much in the sense of the == operator, but with one twist: unify is a "destructive" operator when it comes to logic variables.
Unifying an unbound variable with some value means assigning the value to the variable (which then satisfies equality); unifying two unbound variables aliases them (they are constrained to reference the same, future, value). Thus unify can change the state of the world, and it raises a UnificationError exception whenever it fails, instead of returning False like an equality predicate.
Assignment or aliasing of variables is provided by the 'bind' operator.
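A rough sketch of these semantics in plain Python (the names Var and unify are illustrative, not the actual Logic Objectspace API, and dataflow suspension is left out):

```python
class UnificationError(Exception):
    pass

class Var(object):
    """A single-assignment logic variable: free until bound."""
    def __init__(self):
        self.bound = False
        self.value = None

    def deref(self):
        # follow aliasing chains down to a value (or a free variable)
        v = self
        while isinstance(v, Var) and v.bound:
            v = v.value
        return v

def unify(a, b):
    a = a.deref() if isinstance(a, Var) else a
    b = b.deref() if isinstance(b, Var) else b
    if isinstance(a, Var):        # free variable: bind it (or alias it to b)
        a.bound, a.value = True, b
    elif isinstance(b, Var):
        b.bound, b.value = True, a
    elif a != b:                  # two plain values: they must be equal
        raise UnificationError((a, b))

x, y = Var(), Var()
unify(x, y)          # aliases the two free variables
unify(y, 42)         # binding one binds both
print(x.deref())     # 42
```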
When a piece of code tries to access a free logic variable, the thread in which it runs is blocked (suspended) until the variable becomes bound. This behaviour is known as "dataflow synchronization" and mimics exactly the dataflow variables of Oz. With respect to behaviour under concurrency, logic variables come with two operators: wait and wait_needed.
The wait operator blocks until the variable is bound; wait_needed blocks until some thread waits on the variable. Together, they allow writing efficient lazily evaluated code.
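The wait behaviour can be emulated with standard threads; a minimal sketch using threading.Event (wait_needed, which would suspend the producer instead, is omitted for brevity):

```python
import threading

class DataflowVar(object):
    """Reading blocks until some other thread binds the variable."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        self._value = value
        self._bound.set()       # wake up all waiting readers

    def wait(self):
        self._bound.wait()      # block until bound
        return self._value

v = DataflowVar()
results = []

def consumer():
    results.append(v.wait())    # suspends: v is still free

t = threading.Thread(target=consumer)
t.start()
v.bind('hello')                 # binding resumes the consumer
t.join()
print(results)                  # ['hello']
```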
All of this is not sufficient without a specific non-deterministic primitive operator added to the language. In Oz, the choice operator statically enumerates a set of possible actions, leaving the actual decision between the branches to the solver.
Let us look at a small relational program written respectively in Prolog, Oz and extended Python.
Prolog
    soft(beige).
    soft(coral).
    hard(mauve).
    hard(ochre).

    contrast(C1, C2) :- soft(C1), hard(C2).
    contrast(C1, C2) :- hard(C1), soft(C2).

    suit(Shirt, Pants, Socks) :-
        contrast(Shirt, Pants),
        contrast(Pants, Socks),
        Shirt \= Socks.
Oz
    fun {Soft} choice beige [] coral end end
    fun {Hard} choice mauve [] ochre end end

    proc {Contrast C1 C2}
       choice C1={Soft} C2={Hard}
       []     C1={Hard} C2={Soft}
       end
    end

    fun {Suit}
       Shirt Pants Socks in
       {Contrast Shirt Pants}
       {Contrast Pants Socks}
       if Shirt==Socks then fail end
       suit(Shirt Pants Socks)
    end
Python
    def soft():
        choice:
            'beige'
        or:
            'coral'

    def hard():
        choice:
            'mauve'
        or:
            'ochre'

    def contrast(C1, C2):
        choice:
            unify(C1, soft())
            unify(C2, hard())
        or:
            unify(C1, hard())
            unify(C2, soft())

    def suit():
        let Shirt, Pants, Socks:
            contrast(Shirt, Pants)
            contrast(Pants, Socks)
            if Shirt == Socks:
                raise UnificationError
            return (Shirt, Pants, Socks)
Since our variables (those created by the let declaration) really are logic variables, and can thus be assigned only once, the solver must take special measures to get all the solutions. The trick is that the solver uses the Computation Space machinery for constraint solving. Basically, a computation space is like an independent world in which one specific, unique combination of choices is tried, eventually producing a (locally) unique solution. The solver uses as many computation spaces as necessary to enumerate all possible solutions.
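The overall search behaviour, though not the computation spaces themselves, can be approximated in today's Python with generators: each alternative of choice becomes a yielded value, and failure becomes a pruned branch. A sketch of the suit problem in this style:

```python
from itertools import product

def soft():
    yield 'beige'     # the two branches of 'choice'
    yield 'coral'

def hard():
    yield 'mauve'
    yield 'ochre'

def contrast():
    # first alternative: soft then hard; second alternative: hard then soft
    for c1, c2 in product(soft(), hard()):
        yield c1, c2
    for c1, c2 in product(hard(), soft()):
        yield c1, c2

def suit():
    for shirt, pants in contrast():
        for pants2, socks in contrast():
            # keep only chains where the middle colour matches,
            # and fail (prune) when shirt and socks coincide
            if pants2 == pants and shirt != socks:
                yield shirt, pants, socks

for solution in suit():
    print(solution)
```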
For constraint programming:
For logic programming:
For constraint programming:
Logic programming:
For both: