Church numerals: a tutorial

This is a quite verbose tutorial + some exercices on the implementation of Peano-Church numerals in Haskell, written for not-entirely-beginners, who know a bit of Haskell (its syntax and basic semantics), and who need some examples showing the power of higher-order functions. But they have already learnt a bit about combinators. In fact, I wrote this for my own students, just for fun. Criticism welcome.

Introduction

We know that humans count, and perform simple arithmetic manipulations (at least on natural numbers) since the beginning of their rational thinking, before having invented the symbolic notation for numbers. The number "4" can be and was represented as "****". Or, now, as [0,1,0,0] in a computer. We may ask : is it possible to devise an arithmetic system which would be neutral vis--vis the concrete representation of numbers, which would be as abstract as possible? Which wouldn't care whether "5" means: "throw 5 pebbles into an empty bag", or an exquisite electronic signal processing result?

One such model, implemented in a functional framework, has been invented by Church, but the idea is much older, comes from Peano. (Later on, Rosser added some reasoning to the idea of Church, and even now some people work on Church numerals. Find the nice article of Ralf Hinze: "Church Numerals, Twice" on the Web!) We basically need two notions: the zero, which is an irreductible concept (whether you write it as 0, "" or [] is irrelevant), and the notion of successor, permitting to pass from a number to its ... successor, what else?... This successor passes from 4 to 5, from "****" to "******", etc., throw one pebble more... Without knowing which instance of numbers is being used, we shall implement numbers as functions over those zero and successor. A number n is instantiated as n s z, where z is (for example) "", and s is the operator ('*':), etc. Or, z=0; s=(1+)...

Basic constructions

Our functional layer begins with the definition
zero s z = z
which means that the instantiation of zero retrieves exactly what we have put in as zero; the successor is irrelevant. For people who love the combinatorial notation, there exists a parameter-less definition:
zero = flip const
where
flip f x y = f y x
const x y = x
are standard Curry combinators, the first change the order of arguments of a binary function, and the second applies to two objects and takes the first one. If you wish, you may also write: zero s = id, or zero = \s -> id We shall use one additional, extremely important combinator, the well known composition combinator (.):
(f . g) x = f (g x)
The next step is
one s z = s z
whose meaning is obvious. One can suppress the trailing args, which gives one s = s, or:
one = id
Yes, "one" is the identity. Cute, isn't it? Later we shall see that it is a natural property related to the fact that it is a neutral element for multiplication. Perhaps it would be useful to introduce here some testing interface, the function
sh n = n (1+) 0
(unless you prefer sh n = n ('*':) "") in order to visualize our numbers, since they are opaque, functional objects. We could proceed further, and define two s z = s (s z), but the whole idea is to make the model universal and constructive. We define thus an incrementor in our functional space, an instance of "s". But wait a moment, in the combinator language there is more than way to express things. While
one = \f -> f 
is obviously id, for the zero we also have a stylistically similar construct
zero = \f -> id
as we have already seen above. Of course
two = \f -> f . f
three = \f -> f . f . f
etc., ad nauseam. Forget it, remember just that one adds one pebble to the bag, and zero doesn't do anything, leaves the bag as it was...

Incrementing numbers

Well, throw n pebbles into a bag containing already one. Or, add one to n, as discussed in the next section.
incr n s z = n s (s z)             -- = s (n s z)
or
incr n s = n s . s
and now
one = incr zero
Test: one s z = incr zero s z = zero s (s z) = s z. We may define now two = incr one, but again, while this permits us to construct (and name) three=incr two, four=incr three, five, ..., we don't have any arithmetic yet.

The structure of numbers

Let's recall: two s z = s (s z) = (s . s) z, or two s = s . s as we have already seen. Also, three s z= incr two s z = two s (s z) = (s . s) (s z) = (s . s . s) z, or three s = s . s . s. Of course, five s = s.s.s.s.s, etc. It is thus obvious, from the associativity of the composition, that we might have defined
incr n s z = s (n s z)
or
incr n s = s . n s
Exercice. Prove inductively that s . n s = n s . s. It works for n==zero; check that if it works for n, then also for incr n (with any incr). The associativity of (.) is needed.

Addition

We may proceed in two different ways. First, we define
add n m = n incr m
It looks like guessing... But, if we know the semantics of a generic n : to apply n times something (a successor) to something (the bottom of the number chain), here we see a similar contraption, we shall apply n times the incrementation to m.

Or, we can reason on the instance level; we shall concatenate the chain of m successors with the chain of n successors. Or, we assemble m, and we continue n times more the instance incrementation:

(add n m) s z = n s (m s z)
or
add n m s = n s . m s
Exercice. Show that add (incr n) m = add n (incr m), and check that zero is neutral wrt. the addition. Can you prove the commutativity of the addition? Can you prove the equivalence of the two approaches to the addition? In the last case use the induction, unless you find better ways...

Anyway... We can now construct two = add one one; three = add two one; four = add two two etc.

Multiplication

Intuitively, having a s.s.s. ... s of length m, in order to multiply it by n, we should concatenate n copies of such a chain. Or, if (m s) is a "super-successor" containing m exemplars of s, what we need is
mul n m s z = n (m s) z
or: mul n m = n . m . Multiplication is functional composition! Now you see in a different context that one==id is equivalent to 1, neutral for the multiplication since f . id = id . f = f.

We may think differently. n times m is adding m to zero n times (or vice-versa). So, this should also work:

-- mul n m=(add m ...(add m (add m zero)))  -- n times
or
mul n m = n (add m) zero
Exercice. Once again, try to demonstrate the equivalence of the two variants, the commutativity, the associativity, and also the distributivity: a*(b+c) = a*b + a*c. Quite a task... The point is that all those equivalences are not universal properties of the concerned unknown entities "n", "m", etc, but should be valid for Church numerals, i.e., for zero and all its incrs, so the induction seems the only easy way, unless you can detach some relevant properties out of the inductive reasoning chain.

Anyway, there is one property of all those constructs which should be acknowledged: none of the definition above is recursive! Think a little bit: could it be recursive? Recurring on WHAT? Compare this with the more popular "naive" (not so...) construction of addition based on patterns:

0 + x = x
(incr n) + x = incr (n+x)
etc. This is adapted to sequential data-like number constructions:
data Nat = Zero | Succ Nat
for which obviously: incr n = Succ n, and
add Zero n = n
add n Zero = n
add (Succ n) m = incr (add n m)   -- + commut., not needed (?)
works well. This is the original Peano numeral system. I have seen somewhere somebody calling this model "Peano-Church numerals". Well... This is a different model. The relation between the two can be found in the Ralf Hinze 'Theoretical Pearl' article: "Church Numerals Twice". Of course, in the true P-Ch model the additions, the multiplications, etc. reduce to the applications of the successor, so constructing a number 765176596197659 (by any means...) will take quite a time, even without any explicit primitive recursivity. This is intriguing, where the computer wastes its time?... Note that while the addition in the recursive Peano model has the complexity n (linear), in the Church/Rosser formulation it is - apparently - constant. Multiplication idem, while in the recursive model it goes like n*m, as to be expected...

Exponentiation

You did realize that the Rosser version of the multiplication, mul n m = n . m is simpler than the addition: add n m = \s -> n s . m s. Would it be possible that the exponentiation is even simpler? Yes! In fact pow n m = m n (where pow n m means nm. This is inverse application. It is not easy to derive this definition ab ovo. What we can do easily is to apply the same reasoning as for the multiplication:
-- pow n m = (mul n ( ... (mul n (mul n one))))  -- m times
or
pow n m = m (mul n) one
This is OK, but a bit boring. For the inverted application formula we may prove easily that pow n zero == one, since zero n = (\s->id) n = id. Exercice. Analyze the form pow zero m = m zero. Prove that if m is not zero, the result is zero.

On the other hand

pow n (incr m)  = incr m n = m n . n   
Enough? Don't forget that the composition is the multiplication, so we got nm+1 = nm * n. BTW, from the formula pow n m = m (mul n) one we see also that this reduces to n . n . n . .... n (m times), and the Rosser formula can be re-deduced from it, although doing it formally is not so straightforward.

The Rosser formula for the exponentiation has an interesting set-theoretical face. You should/might know that in the theory of sets, for any sets A and B, the notation BA denotes the set of all functions from A to B. Typically, when teaching this stuff, one applies the argument based on cardinality. Adding one element to A, permits to find |B| more mappings, from the additional element to all of B. So, the number of mappings is multiplied by |B|, in agreement with : BA+1 = BA * B. Here we see naturally that "exponentiation = inverse application".

A warning... You don't see any recursivity, but, please, try: pow ten ten or something not too far. On my GHCi I get immediately the stack overflow message, already for 77, unless I change the stack length. Well, I lied. Not immediately. But, eventually. So somebody stacks things. Who and where? You won't pass from the kindergarten to the primary FP school unless you answer this question...

From Peano to Church and back

We could think a little bit on the decrementation/subtraction problem, but let's think a little bit on the relation between Nat and the Church numbers. It may help you to solve the puzzle above...

The Peano numerals with their arithmetics can be expressed in terms of a specific 'fold' function (all below is inspired by the Ralf Hinze paper), defined as

fold s z Zero = z
fold s z (Succ n) = s (fold s z n)
If you have some acquaintance with standard Haskell foldl/r functionals over lists, you should understand the above. There are no binary operators here, just the transition between Peano, and our abstract functional instance, which permits to convert any Nat to z or some application of s to the converted predecessor..

We see that

fold Succ Zero Zero = Zero
fold Succ Zero (Succ Zero) = Succ (fold Succ Zero Zero) = Succ Zero
and if fold Succ Zero n = m, then
fold Succ Zero (Succ n) = Succ m
which implies: fold Succ Zero = id as is to be expected, this converts Nat to Nat. All the arithmetic operators are instances of fold:
zero = flip const    -- keep these
one =  id

add n m = fold Succ m n
mul n m = fold (add m) zero n
pow n m = fold (mult n) one m
Prove this!. Also, simplify the above using commutativity.

Now "re-derive" the more abstract Church numerals from the Peano model:

type Church = (a->a) -> a -> a
nat :: Church -> Nat
nat c = c Succ Zero
church :: Nat -> Church
church n = \s z -> fold s z n
We see that
(nat . church) x = nat (\s z -> fold s z x) =
  fold Succ Zero x = x
so nat . church = id from the universal property of fold. But, bad surprize,
(church . nat) x = \s z -> fold s z (x Succ Zero)     -- [1]
cannot be reduced, since s,z remain abstract...

Digression on Parametricity

Attention, this section is difficult. Skip it, if you must... In 1989 Philip Wadler wrote a paper "Theorems for free", in which he shows that any polymorphic type has non-trivial properties. For example imagine that r is a polymorphic function of the type X* -> X*, for any X. The notation a* denotes "list of a". The notation rA denotes the instantiation of the type of r to X=A.

Wadler claims that for any function a : A -> A' we have

a* . rA = rA' . a*
where the functorial counterpart of A*, a function a* means - naturally: (map a). The theorem says that if you apply a map, and then rearrange the elements of a list, you get the same as if you had rearranged first, and mapped then. The essential is the fact that r is polymorphic. If it is applicable to any list type, this function cannot do anything "concrete" to the list elements, cannot, say, apply a binary arithmetic operator, nor anything similar. It may only rearrange them, e.g., invert the list, or embed the elements in another list, or a tuple, etc., do something which never needs the values of the elements of the list. Such manipulations as sorting, won't do, since they require a specific comparison operator.

We cannot spend more time on that, but we shall know that the theorem is not restricted to lists, but to all polymorphic types. For Church, the parametricity condition can be formulated/used as follows. Let x : Church. Let A,A' be arbitrary types. Then for all f : A->A; f' : A'->A', and h : A->A' we have

h . xA f = xA' f' . h  <<== h . f = f' . h
Intuitively it is clear. x being polymorphic, what can it do to its argument f? It can only compose it with itself some number of times! We shall have (convince yourself or prove): h . (f . f . ... f) = (f' . f' . ... f') . h. Thus, for f=Succ, f'=s, h=fold s z, and using, from the definition of fold:
fold s z . Succ = s . fold s z      -- Check this!
we obtain
fold s z . x Succ = x s . fold s z  -- Prove it!
So:
[1] = \s z -> fold s z (x Succ Zero) = 
  \s z -> (fold s z . (x Succ)) Zero = \s z -> (x s . fold s z) Zero =
  \s z -> x s (fold s z Zero) = x s z  -- (from the def of fold)
This completes the proof, (church . nat) x = x, or church . nat= id. So, one may say that Church numerals are "folds in disguise". And we can encode everything: zero = church Zero; s(church n) = church(Succ n). Obviously zero = \s z -> fold s zero Zero = \s z -> z as we know. We can derive the addition, multiplication, etc. But is it not enough to reconstruct the Rosser forms. Anyway, you should realize that even if for the addition and the multiplication, the arguments are of the same type, for the exponentiation pow n m = m n the args instantiate to different types, otherwise the type would be infinite. So, the polymorphism is here really essential.

Decrementation and subtraction

This is a disturbing problem... In the Peano version, in order to decrement a number we may use the pattern matching
pdecr (Succ n) = n
pdecr Zero = Zero   -- sorry, no other value available...
The subtraction iterates the decrementation:
psub n Zero = n
psub n (Succ m) = psub (pdec n) m
But in the functional version we don't have patterns, since we don't have any constant data, everything is parametric. For some time Church himself thought that the subtraction is not possible in this model. But it is, although its complexity is awful. The reasoning goes as follows: begin with zero, and keep counting until n, but storing the last two numbers. Then when you get n, the previous one is its predecessor.
suc (_,b) = (b,incr b)
dec n = x where (x,_) = n suc (undefined,zero)
There is no recursivity either, but we are not dupe, we know that applying n produces a long set of successors which all will be executed (some day. When?). So, the complexity is linear in n, although there is no recursivity involved. We may test it manually, say, demanding to print (dec (dec (dec eight))), which yields 5. So, full of hope we define
sub n m = m dec n
but all attempt to use it, e.g. demanding to show sun six two yields a type error! The system refuses to type two, saying that it has an infinite type (like the Y combinator or other abomination perfectly legal in the untyped lambda calculus). Discussing this phenomenon is beyond the ambitions of this text, it suffices to know that Haskell tries to give to m two different types which are not unifiable because of the occur check...

The solution of the dilemma exists, but it is a bit ugly, and formally a bit advanced, using the s.c. existential types. The idea is to hide the type of the Church numeral in such a way that it can be instantiated differently in the same construct in which it appears several times. We remind that a Church numeral - in general - is a function which iterates its 'successor' argument, producing the 'iterated successor', so the generic type is (a->a)->(a->a). We may declare thus

type Church = forall a . (a->a)->a->a  -- and then

zero :: Church
zero = flip const
one :: Church
...

incr :: Church->Church 
and the last declaration turns out to be faulty. The Rosser forms for the addition, multiplication and power are OK. But this definition: add' n m = n incr m refuses to be typed, and the definition of suc fails as well with the diagnosis:
Inferred type is less polymorphic than expected
Quantified type variable `a' escapes
The solution is to hide the forall a . as well:
newtype Church = Ch (forall a . (a->a)->(a->a))
etc. Let's recall the definitions in order to keep them in one place, with two variants of some entities, Rosser, and "high level" (primed):
sh (Ch n) = n (1 +) 0 :: Integer    -- For testing

zero :: Church
zero = Ch (flip const)
one :: Church
one = Ch id

incr :: Church->Church
incr (Ch n) = Ch (\s -> n s . s)

one' = incr zero
add' (Ch n) m = n incr m
add (Ch n) (Ch m) = Ch (\s -> n s . m s)

two = add one one
two' = incr one
three = add one two
three' = add' one' two'
-- ===================================

mul (Ch n) (Ch m) = Ch (n . m)
mul' (Ch n) m = n (add m) zero

pow (Ch n) (Ch m) = Ch (m n)

four = mul two two
five = add two three
six = mul two three
six' = mul' two' three'
seven = add three four
eight=pow two three
nine = pow three two
ten = mul two five
Also, the decrementors and the subtraction:
dec (Ch n) = 
  Ch (\s z -> fst (n (\(_,x)->(x,s x)) (undefined,z)))

dec' (Ch n) = fst (n (\(_,x)->(x,incr x)) (undefined,zero))

sub n (Ch m) = m dec n
sub' n (Ch m) = m dec' n
work (apparently) correctly. Recall that newtype is a type synonym which pretends to be data: it introduces a unique constructor, but during the compilation this constructor disappears. Church numerals are functions, not "boxed" functions. But the model with newtype is a bit uglier that the original...

Digression on pairs (x,y)

If you don't know this yet, the power of the lambda-calculus is sufficient to provide the equivalent of Cartesian product, we don't need tuples. Look at these definitions:
cons x y f = f x y

car f = f const
cdr f =f (flip const)
The partial application a = cons 1 2.5 yields an opaque functional object, a closure which stores inside the values of the arguments. This may be used as pair, we need only to be able to extract them. Well, let's try car a = a const = cons 1 2.5 const = const 1 2.5 = 1. In the same way cdr retrieves the second argument, and everything can be nested. So, if you feel inclined, replace in the previous section the occurrences of (x,y) by cons x y, and substitute car for fst.

How to compare Church numerals

Let's assume the existence of constants LT,EQ,GT denoting the arithmetic order relation between entities. For the Peano model we have
data Rel = LT | EQ | GT    -- GHC Base defines them.

p_ord Zero Zero = EQ
p_ord (Succ a) (Succ b) = p_ord a b
p_ord _ Zero = GT
p_ord _ _    = LT
But for the Church numerals? We cannot compare functions, even "boxed" in a constructor Ch. But we can define
iszero (Ch n) = n (\z -> False) True
which doesn't need a conditional if to distinguish between zero and something else. Now, in order to compare two numbers, subtract them, and compare with zero. Beware, there aren't any negative numbers here, 5-8=0, so this is not entirely trivial, you have to compute both differences, n-m and m-n. This is an exercice for you...

We end these notes here, leaving to the reader the functional construction of Booleans, of Rel, of a conditional if (or something equivalent), and of other goodies which show that the lambda calculus gives you the Universe...