Non-deterministic monad: a tutorial

Introduction

This is a tutorial on how to use lazy lists in Haskell in order to implement some non-deterministic algorithms, ubiquitous in Prolog, e.g. some combinatorics: permutations, partitions, powerset, etc., how to solve the "8 queens problem" using backtracking in an intuitive way without having the Prolog backtracking by hand, etc. All this might be useful in solving some AI problems in a functional language.

The reader should have read something about monads in order not to get lost, to keep some contact with the theory. But we shall cover the essential, omitting altogether the "true monads", such as IO from the discussion. The idea of the monad in the context we want to treat, is the following: this is an abstraction of a computation, which replaces the piece of data. In "normal" functional programming we have pieces of data, and we apply functions to them, this is all. Expressions are built from nested applications of functions to some internal data. And we call it pompously "the trivial monad". Now, the remaining of the discussion is a bit limited and incomplete. But, the programs included here are to be read and understood, not just to put them in your personal library.

Two words about monads

Imagine now that we want do do something more... Finally almost everything will be boiled down to this trivial monad, but conceptually we might think of the following:

The non-determinism. Examples.

This has nothing to do with anything random, or beyond control (such as the queue of external events). The logical non-determinism is a way of thinking. It is an art of posing questions in such a way that there are many possible answers (sometimes just one, but this is a special case), or none. This last case is called failure. Let's see some examples:

  1. Give me an element of this list. Or, give me this list with an element removed.
  2. Give me a permutation of this list.
  3. Give me a combination of M elements of this set, which contains N items.
  4. Find a way of placing 8 queens on a chess-board in such a way that no queen threatens another one.
  5. Find a path in this graph; it should fulfil the following constraints: /.../.
Now, the idea which we shall implement in Haskell is the following: return the list of all solutions to such a problem. Take into account the following: We will show now how to do this. We shall concentrate on the intuitive approach, without touching monads anymore. But when you are done with this text, perhaps you may wish to read another one, which describes the process more formally. Don't read it too soon...

The implementation. Examples.

From now on, pieces of code in green are not legal Haskell!; they are "intuitive pseudo-codes", which should be converted into genuine functional programs.

The easiest problem is the instruction: pick any element of the list l. The answer is obviously... l itself. Any questions?

Let's pass to a more difficult problem, how to return a list without one, chosen arbitrarily, element. So, if l=[1,2,3,4], the non-det answer will be a list of lists

[[2,3,4],[1,3,4],[1,2,4],[1,2,3]]   -- or any other order
Let's begin with a model in which the non-determinism is a kind of primitive. Suppose that we have at our disposal a mythical language, which has an operator of logical alternative, OR. We may formulate the answer to the original question in the following manner: either remove the head of the list, or keep the head, but remove an element from the tail:

remove (x:xq) = xq OR x:(remove xq)
remove [] = FAIL
where FAIL means that this is impossible. No answer. Now, recall what we have said: the system should yield all answers. If there are none, it should return an empty list. On the other hand, OR means, the answer at the left or the answer at the right, so, both of them. But this is not trivial. At the left we have one concrete answer. We may put it into a list, and replace OR by (++), but at the right we have an abomination: the form (x:) adds an element to one concrete list, one possible answer, not to all of them.

So, the conclusion of this experience may be boiled down to the following observation:

If we need to apply a "normal" function to a non-det value, we map the function through it (the list of all solutions).
We code thus
remove (x:xq) = [xq] ++ map (x :) (remove xq)
remove [] = []
It is silly to write [xq]++ .... We have thus finally, with a bit of real, but also of some dubious optimization
remove [_] = [[]]
remove (x:xq) = xq : map (x :) (remove xq)
Why this "optimization", which make the expression remove [] bomb (launch an exception) instead of failing (returning [])?

In fact there is no 'real' reason, just a reminder of some useful truths:

  1. Of course, if your program misses one possible pattern-case, it may bomb. This may be a disgrace, but sometimes you may wish it. It is primordial to distinguish a failure which comes from a logical condition which is not satisfied, and a failure which is an error in disguise, for example a bad structure of some data. Here there is no "official" best way, you may have empty lists as your input, or you may forbid them.
  2. Don't forget that if you want to return one answer, it must be put on a list, so returning the empty list should be written as [[]].

Another example, the permutations. We begin with the following sub-problem: how to insert an item non-deterministically, in any place, of a list, e.g., to insert 99 in [1,2,3,4], returning [99,1,2,3,4], [1,99,2,3,4], [1,2,99,3,4], [1,2,3,99,4], [1,2,3,4,99]]. The algorthm goes as follows: either insert it at the beginning, or keep the head, and insert it into the tail, restoring the head after.


ndinsert x l@(y:yq) = (x:l) OR y : ndinsert x yq
ndinsert x [] = [x]

Now, we know the song:
ndinsert x [] = [[x]]    -- Yes! Don't forget additional []
ndinsert x l@(y:yq) = (x:l) : map (y :) (ndinsert x yq)
Now, the permutation problem itself. The idea is: detach the head, find a permutation of the tail; put the detached head somewhere in the permuted list. Non-deterministically:

permut [] = []
permut (x:xq) = ndinsert x (permut xq)
... and we notice another structural problem. This time we apply a non-det function ndinsert to a non-det argument, produced by permut.This last, makes a list of answers, so ndinsert must be mapped. But this does not suffice, since for each concrete instance of the response, ndinsert produces a new non-det collection. With just a map we would obtain a list of list of answers, not a list of answers. So, it is necessary to flatten the resulting list, using concat.
permut [] = [[]]
permut (x:xq) = concat (map (ndinsert x) (permut xq))
For permut [0,1,2,3] Haskell responds joyfully:
[[0,1,2,3],[1,0,2,3],[1,2,0,3],[1,2,3,0],[0,2,1,3],[2,0,1,3],
 [2,1,0,3],[2,1,3,0],[0,2,3,1],[2,0,3,1],[2,3,0,1],[2,3,1,0],
 [0,1,3,2],[1,0,3,2],[1,3,0,2],[1,3,2,0],[0,3,1,2],[3,0,1,2],
 [3,1,0,2],[3,1,2,0],[0,3,2,1],[3,0,2,1],[3,2,0,1],[3,2,1,0]]
and that's it. Exercice. Make another variant of the permutation algorithm, where first one removes an element non-deterministically from the list, and keeps this element separately. Then, generate a permutation of a truncated list, and re-insert the stored element at the front.

More examples

Powerset

How to construct a subset (any) of a set (list)? It suffices to traverse this list once, and for each element take a binary decision: we keep / we don't keep this element.

subset (x:xq) = (subset xq) OR x:(subset xq)
So, this is easy. Concatenate the two alternatives, which after a mild optimisation, and completion of the empty case gives
powerset [] = [[]]
powerset (x:xq) = l ++ map (x:) l
 where l = powerset xq
Exercice. Without testing this function on a computer, find out in which order the algorithm generates the powerset of [0 .. 3].

Combinations of a given length

"Take M elements out of a list. Order irrelevant". Note that this can be considered as a filtered version of the powerset: generate all combinations, and filter only those whose length is M. This may be considered as a waste of ressources, although in a lazy framework this inefficiency is not extremely big. But, anyway, let's do it from scratch. The idea is that when you pick up an element, only (M-1) to select remain, if not - full M. If M=0 you don't select anything, obviously.

comb 0 l = []
comb m (x:xq) = (x : comb (m-1) xq) OR (comb m xq)
comb _ _ = FAIL
so,
comb 0 _ = [[]]
comb m (x:xq) = map (x:) (comb (m-1) xq) ++ comb m xq
comb _ _ = []
Any questions? The translation here is quite straightforward.

Partitions of an integer

This is a more elaborate combinatoric problem. A partition of N, say 7 is a splitting (or a set of all splittings) of N in k positive integers such that their sum gives N. Order does not count. So, for 7 we shall have, in an intuitive order

7
6 1
5 2
5 1 1
4 3
4 2 1
4 1 1 1
3 3 1
3 2 2
3 2 1 1
3 1 1 1 1
2 1 1 1 1 1
1 1 1 1 1 1 1
The number of partition algorithms is quite considerable... We shall analyze just one, which encourages the use of the non-deterministic reasoning, and recursivity (of course). Since the partition is ordered (this is the meaning of "order does not count"; all orderings are equivalent, so choose one representant), we may begin with the choice of the first element of the list, say, M, subtract it from N, and arbitrarily generate the partition of the remaining value N-M. There is one constraint which must be respected: further partitions must limit the choice of the first element to be inferior or equal to M. We might thus introduce an auxiliary function pmax, which takes a second argument, the maximum value of the first component:

partit n = pmax n n
and now recur. We introduce a trivial non-deterministic function choice a b which "generates" a number between a and b:

pmax n mx = let m = choice 1 mx
                k = n-m
                b = min k m 
            in m : pmax k b 
The minimum function is necessary in order to rationalize the choice - either the maximum value imposed, or just the remaining number, if it is smaller. We see that there is no OR pseudo-operator here. The non-determinism is dictated by choice, and passing from the conceptual "any", to implemented in Haskell "all", we shall map the further decisions over the values of the list representing the choice. Now we must be careful, since there will be two maps. One is related to (m :) since pmax is non-deterministic. The second is over the choice. In this latter case there is a non-deterministic embedding, so we shouldn't forget about the flattening of the list. With some experience, the coding is straightforward:
partit n = pmax n n

pmax 0 _ = [[]]
pmax n mx = 
  let seg m = map (m :) (pmax k (min k m)) where k=n-m
  in  concat (map seg [mx,mx-1 .. 1])
(where the choice list has been arranged in the decreasing order, in order to get from (partit 7) the result: [[7], [6,1], [5,2], [5,1,1], [4,3], [4,2,1], [4,1,1,1], [3,3,1], [3,2,2], [3,2,1,1], [3,1,1,1,1], [2,2,2,1], [2,2,1,1,1], [2,1,1,1,1,1], [1,1,1,1,1,1,1]]).

This will end the combinatorial examples. They may help you in solving some AI problems using the Prolog-style reasoning, but coded in a "classical", deterministic language, without tricks ad hoc. Of course, the laziness of Haskell is quite important, if the solution collection is long; in a strict language it is too easy to overflow the memory. And, needless to underline, the blind "generate-and-test" programming is usually so inefficient, that it is not useful for serious AI tasks.

The 8 Queens problem

We should generate a configuration similar to the one below (but correct!!), where 8 chess queens have been placed on board in a way that on every horizontal, every vertical, and every diagonal there is just one queen, so that no queen can capture any other. Again, the number of algorithms presented in the literature is really big. The most stupid approach is the trivial implementation of the "generate - and - test" paradigm: make all the configurations, and reject those which do not obey the imposed constraint... Doing this by hand - e.g., row by row - begins easily, but at the end it may turn out that there is no place left, as we see below.

               
               
               
               
               
               
               
               
Then the strategy consists in removing the last good queen, placing it elsewhere, and continue. In the example above we see that the seventh queen is stuck, there is no more choice (the column 4 is illegal). Thus, we backtrack further, and we remove the queen no. 6, which may be put in the column 4, and we continue until the next disaster, or until the end.

When the last queen is placed, we store the solution as correct, and we continue anyow with the same backtracking as above, in order to generate another solution. So, conceptually everything is clear. But now, take into account that while programming we cannot "remove a queen", this would be a side-effect, requiring several global variables to store temporarily the last configuration, plus the data permitting to restart the process. We are constructing a logical/functional algorithm, where we only can generate another configuration, with all the book-keeping kept local.

First thing to do is to establish the representation of the data used. It would be too inefficient to keep in the memory the full "matrix": a list of 8 lists of 8 elements (rows of columns, or columns of rows). It suffices to generate a permutation of [1 .. 8] with the interpretation: the value of the i-th element is the column of the queen placed on the i-th row. Or, more confortable convention: the i-th element of the list l is the column of the queen placed on the (length(l)-i+1)th row. I.e., the list l=[a,b,c,d] is constructed as follows: first, we put d on the first row. Then, c on the second, b on the third, and a on the fourth. In such a way, adding new elements at the head of l follows the same conventions.

Let begin with a construction of a predicate which tests whether a new (next) queen placed on the column n is safe wrt. all queens already placed, and specified by the list l

safe n l | elem n l = False                        -- same column
         | otherwise =
             let sf (x:q) m | (x-m/=n && x+m/=n) = sf q (m+1)
                            | otherwise = False    -- (same diagonal)
                 sf _ _ = True                     -- all rows tested
             in sf l 1
The cooking of the non-deterministic algorithm goes as follows. Begin. Choose a row. Verify whether the queen is safe; if yes, put another queen, otherwise fail. If the ordinal of the to-be-placed queen is bigger than 8, stop. So, symbolically:

queens = qput 1 []  where
  qput 9 l = l                -- done.
  qput k l = let n = choice 1 8
             in  if safe n l then qput (k+1) (n:l)
                             else FAIL
The failures will truncate all 'illegal' branches of the non-deterministic process. Now comes the coding, a bit different from what we have seen, but not too much.

queens = qput 1 [[]] where
  qput 9 ll = ll
  qput k ll = qput (k+1) (concat (map addq ll))
  addq l = [n:l | n <- [1 .. 8], safe n l]
It is easy to check that the length of queens is equal to 92, all the symmetry variants are considered different. The function addq adds non-determinitically one queen to one current solution (the same as (n:l) with non-deterministic n).

The function qput is, as previously, just an iterator. Note the extrapolating recursion style: we do not simplify the argument while recurring, but we progress, and finally we break the recursion by a "hard-break", k=9. Of course, generalizing the algorithm to, say, 15, is trivial, but be aware that there are 2 279 184 solutions. Ok, here is the generalization, and a test:

queens m = qput 1 [[]] where
  qput k ll | k==m+1 = ll
            | otherwise = qput (k+1) (concmap addq ll)
  addq l = [n:l | n <- [1 .. m], safe n l]

concmap f [] = []
concmap f (x:xq) = f x ++ concmap f xq

queens 7
[[6,4,2,7,5,3,1],[5,2,6,3,7,4,1],[4,7,3,6,2,5,1],[3,5,7,2,4,6,1],
 [6,3,5,7,1,4,2],[7,5,3,1,6,4,2],[6,3,7,4,1,5,2],[6,4,7,1,3,5,2],
 [6,3,1,4,7,5,2],[5,1,4,7,3,6,2],[4,6,1,3,5,7,2],[4,7,5,2,6,1,3],
 [5,7,2,4,6,1,3],[1,6,4,2,7,5,3],[7,4,1,5,2,6,3],[5,1,6,4,2,7,3],
 [6,2,5,1,4,7,3],[5,7,2,6,3,1,4],[7,3,6,2,5,1,4],[6,1,3,5,7,2,4],
 [2,7,5,3,1,6,4],[1,5,2,6,3,7,4],[3,1,6,2,5,7,4],[2,6,3,7,4,1,5],
 [3,7,2,4,6,1,5],[1,4,7,3,6,2,5],[7,2,4,6,1,3,5],[3,1,6,4,2,7,5],
 [4,1,3,6,2,7,5],[4,2,7,5,3,1,6],[3,7,4,1,5,2,6],[2,5,7,4,1,3,6],
 [2,4,1,7,5,3,6],[2,5,1,4,7,3,6],[1,3,5,7,2,4,6],[2,5,3,1,7,4,6],
 [5,3,1,6,4,2,7],[4,1,5,2,6,3,7],[3,6,2,5,1,4,7],[2,4,6,1,3,5,7]
]
The functional concmap is an extra bonus for you. We have seen already a few times the construct (concat (map ... )), and it is sufficiently frequent, that it would be useful to have it as a basic brick.

Conclusions

So, now you can transpose several programs from Prolog to Haskell. Of course, far from all non-deterministic programs, without a very thorough analysis! In Haskell we have no full unification, nor uninstantiated logical variables, so the structure of the programs may be very different, but at least you have some starting point.

The presented framework is not the only one possible. It realizes what is usually called the "data backtracking". But, there is another view of the logical non-determinism, which could be called "control backtracking", based on alternative continuations. Stay tuned, one day we shall describe it in this collection of notes.