Haskell CS 421 LogoCS 421 — Programming Languages

Lambda Calculus

Introduction and Objectives

There are three major paradigms of programming languages that are popular today (and a couple more worth your consideration). Each of these paradigms can be reduced to a specific model of computation. In this chapter we will consider the λ\lambda-calculus, which is the model of computation for the functional programming languages.

When you are done reading this and working through the exercises, you should be able to explain the three constructs that are part of λ\lambda-calculus, show how to simulate arithmetic and recursion, and explain why a computer scientist should be concerned with a theoretical language that can’t even be bothered to have integers or booleans.

The Language

Alonzo Church and Stephen Kleene developed λ\lambda-calculus in the 1930’s to reason about the properties of functions. By giving a fully general set of rules for evaluating a function we are able to reason about the computational aspects of functions. (An alternative is to think of functions as a set of pairs of arguments and values. In this case we emphasize the relation between an argument and a value rather than the computation necessary to convert an argument to a value.)

To this day λ\lambda-calculus is very popular with programming language researchers. This is for two reasons. First, λ\lambda-calculus is Turing complete---any computation that can be done in a “real” language can be done in λ\lambda-calculus. Second, the language is very small, allowing us to focus on the features we want to study without being distracted by the mechanics of the language itself. It is sometimes called “the little white mouse” of programming language research.

The set of lambda terms Λ\Lambda consists of variables, functions, and function applications. Let MM and NN represent an arbitrary lambda terms and xx represent an arbitrary variable. Then we can define the syntax of Λ\Lambda as:

Λ::=xvariablesMNfunction applicationλx.Mfunctions, a.k.a. λ abstractions\begin{array}{lll} \Lambda ::= & x & \textrm{variables} \\ & M N & \textrm{function application} \\ & \lambda x. M & \textrm{functions, a.k.a. $\lambda$ abstractions} \\ \end{array}

A variable in our presentation of λ\lambda-calculus will be a symbol of one letter, possibly with “decorations” such as superscripts, subscripts, primes, etc. So, xx, y3y^3, xnx_n, aa', z^\hat z are all examples of valid variable names. a3a3, foofoo, and barbar are not valid, because their names are more than one symbol long.

A function application is denoted by juxtaposition. We will usually not separate terms with spaces, and instead “run them together.” This works because all variable names will be one letter long.

Here are some example function applications: fxfx, a(mn)a (mn), fxyfxy The first example is ff applied to xx. The second example applies mm to nn, and then applies aa to that result. The third example applies ff to two variables xx and yy. Function application associates to the left, so abcdabcd is the same as ((ab)c)d((ab)c)d.

A function is denoted by the Greek letter λ\lambda (pronounced LAM-duh), followed by a parameter name(s), then a period, and then the body of the function. The usual convention is that the body of the function extends as far as it can go. Functions are also called λ\lambda-abstractions or just abstractions.

Here are some example functions:

λx.x\lambda x. xThe identity function.
λa.λb.a\lambda a . \lambda b . aTakes two arguments and returns the first
λf.λx.f(fx)\lambda f . \lambda x . f(fx)Takes two arguments and applies the first one to the second one twice.

It is common to “stack up” parameters of adjacent λ\lambda‘s. So

λa.λb.a\lambda a . \lambda b . a\equivλab.a\lambda a b.a
λf.λx.f(fx)\lambda f . \lambda x . f (f x)\equivλfx.f(fx)\lambda f x.f (f x)

In longer expressions you may need parentheses to be sure that they are interpreted correctly.

λx.xλy.y\lambda x. x \lambda y.yThis is a function that applies xx to the identity.
(λx.x)λy.y(\lambda x. x) \lambda y.yThis is the identity function applied to the identity function.


Which of the following are valid λ\lambda expressions?

1λx.xyz\lambda x.xyz
2λx.λy.\lambda x.\lambda y.
4xλwy.yx\lambda wy.y
6λλxz.zx\lambda \lambda xz.zx
7(abcd)(efgh)ijλklm.mkl(abcd)(efgh)ij\lambda klm.mkl


Validλx.xyz\lambda x.xyz
Not Validλx.λy.\lambda x.\lambda y.
Validxλwy.yx\lambda wy.y
Not Validxλx\lambda
Not Validλλxz.zx\lambda \lambda xz.zx
Valid(abcd)(efgh)ijλklm.mkl(abcd)(efgh)ij\lambda klm.mkl

Free and Bound Variables

When a variable is listed between the λ\lambda and the dot, i.e., it is a parameter of a function, then we say that the variable is \emph{bound} to that particular λ\lambda. It is possible for there to be more than one λ\lambda-binding a variable with the same name. These are counted as two separate variables. \sidenote{Many authors forbid the use of the same variable name in more than one λ\lambda to avoid this confusion.} In such a case, a particular variable in an expression is bound by the nearest enclosing λ\lambda. If a variable is not bound by any λ\lambda we say that it is \emph{free}.

λx.x λy.y\lambda x. x\ \lambda y.yThe xx is bound to the first λ\lambda.
(λx.(λx.x)) x(\lambda x. (\lambda x.x))\ xThe first xx is bound to the second λ\lambda, and the second xx
is bound to the first λ\lambda.
λy.λz.x\lambda y. \lambda z.xThe variable xx is free.

A free variable will not be given a value by the enclosing expression. You can think of them as being “global.”


In the following expressions, first use parenthesis to make clear the extent of each λ\lambda abstraction, and then indicate to which λ\lambda the variable xx is bound.

For example, in λx.λz.x\lambda x.\lambda z.x, we could parenthesize it as (λx.(λz.x))(\lambda x.(\lambda z.x)), and the xx is bound to the first λ\lambda.

λx.λy.x\lambda x. \lambda y . x
λx.λx.x\lambda x. \lambda x . x
λx.xλy.x\lambda x. x \lambda y . x
λx.xλx.x\lambda x. x \lambda x . x
λz.xλy.x\lambda z. x \lambda y . x
λz.xλx.x\lambda z. x \lambda x . x


(λx.(λy.x))(\lambda x. (\lambda y . x))The first λ\lambda.
(λx.(λx.x))(\lambda x. (\lambda x . x))The second λ\lambda.
(λx.x(λy.x))(\lambda x. x (\lambda y . x))Both are bound to the first λ\lambda.
(λx.x(λx.x))(\lambda x. x (\lambda x . x))The first xx is bound to the first λ\lambda, the second xx is bound to the second λ\lambda.
(λz.x(λy.x))(\lambda z. x (\lambda y . x))Both xx’s are free.
(λz.x(λx.x))(\lambda z. x (\lambda x . x))The first xx is free but the second is bound to the second λ\lambda.

Evaluating Expressions using β\beta-reductions

There is only one operation in λ\lambda-calculus: β\beta-reduction (sometimes spelled as \emph{beta-reduction}). A β\beta-reduction occurs when you have a λ\lambda abstraction applied to another λ\lambda term. For example, (λx.x)y(\lambda x.x)y is reducible to yy. The terms (λx.xy)(\lambda x.xy) and xλy.yx\lambda y.y are not reducible. The first because there is only an abstraction, but nothing after it; the second because the xx is not an abstraction.

Be careful when the λ\lambda abstraction and the term following it are both arguments to another term. In the term x(λy.y)zx (\lambda y.y) z, the (λy.y)(\lambda y.y) and zz are both arguments to xx. The λy.y\lambda y.y is not being applied to zz.

Here’s how to perform a β\beta-reduction. Remove the initial λ\lambda and its parameter. Then, in the body of that function, replace all occurrences of that variable with the argument. So, any variable that was bound the that lambda gets replaced. Mathematically, we would say

[(\lambda x.M) N \rightarrow [N/x]M]

Where [N/x]M[N/x]M mean “replace all xx’s in MM that were bound to the lambdalambda by NN.” It might help to think of using an editor to do a search and replace.

Another notation you might see is this one:

[(\lambda x.M) N \rightarrow M[x:=N]]

It has the same meaning.

Here are a few examples of β\beta reductions. Sometimes one reduction is followed by another one.

\begin{tabular}{l} (λx.x)yy(\lambda x.x)y \rightarrow y \ (λz.x)yx(\lambda z.x)y \rightarrow x \ (λz.azbz)yayby(\lambda z.azbz)y \rightarrow ayby \ (λx.(λz.z)x)y(λz.z)yy(\lambda x.(\lambda z.z)x)y \rightarrow (\lambda z.z)y \rightarrow y \ (λx.x(λz.ax)(λx.bx))yy(λz.ay)(λx.bx)(\lambda x.x(\lambda z.ax)(\lambda x.bx))y \rightarrow y (\lambda z.ay)(\lambda x.bx) \ (λx.(λz.zx)(λx.bx))y(λz.zy)(λx.bx)(λx.bx)yby(\lambda x.(\lambda z.zx)(\lambda x.bx))y \rightarrow (\lambda z.zy)(\lambda x.bx) \rightarrow (\lambda x.bx)y \rightarrow by\ \end{tabular}

Try doing these reductions. Reduce each expression as much as you can.

| (λx.xx)y(\lambda x.x x)y | | (λx.axxa)y(\lambda x.axxa)y | | (λx.(λz.zx)q)y(\lambda x.(\lambda z.zx)q)y | | (λx.x((λz.zx)(λx.bx)))y(\lambda x.x((\lambda z.zx)(\lambda x.bx)))y | | (λa.a)(λb.b)(λc.cc)(λd.d)(\lambda a.a) (\lambda b.b) (\lambda c.c c) (\lambda d.d) |


| (λx.xx)yyy(\lambda x.x x)y \rightarrow yy | | (λx.axxa)yayya(\lambda x.axxa)y \rightarrow ayya | | (λx.(λz.zx)q)y(λz.zy)qqy(\lambda x.(\lambda z.zx)q)y \rightarrow (\lambda z.zy)q \rightarrow qy | | (λx.x((λz.zx)(λx.bx)))yy((λz.zy)(λx.bx))y((λx.bx)y)y(by)(\lambda x.x((\lambda z.zx)(\lambda x.bx)))y \rightarrow y((\lambda z.zy)(\lambda x.bx)) \rightarrow y((\lambda x.bx)y) \rightarrow y(by) | | (λa.a)(λb.b)(λc.cc)(λd.d)(λb.b)(λc.cc)(λd.d)(λc.cc)(λd.d)(λd.d)(λd.d)(λd.d)(\lambda a.a) (\lambda b.b) (\lambda c.c c) (\lambda d.d) | | \rightarrow (\lambda b.b) (\lambda c.c c) (\lambda d.d) | | \rightarrow (\lambda c.c c) (\lambda d.d) | | \rightarrow (\lambda d.d)(\lambda d.d) | | \rightarrow (\lambda d.d) |

A note about reduction order

You might be wondering what to do if you have a choice about which application to make. Consider this example:

(λx.x((λy.y)x))((λa.a)(λb.b))(\lambda x . x ((\lambda y .y) x)) ((\lambda a.a)(\lambda b.b))

We will be using \emph{normal order reduction}, which says that the leftmost, outermost reduction is performed first. In this case, the λx\lambda x is applied first.

Another option is applicative order. In that system, the leftmost, innermost reduction is performed first. This would be the λa\lambda a reduction. Essentially, this is “call by value,” as the arguments to a function are evaluated before the function is called.

For now, we are not going to emphasize the different reduction orders. Use normal order reduction for everything, i.e., don’t reduce the arguments before calling the function.

Finally, consider this example:

(λx.x((λy.y)x))(\lambda x . x ((\lambda y .y) x))

Do we do the λy\lambda y reduction or not? In other words, do we do computation inside of a function before the function has been called? Most languages say no.'' (This is known as *weak head normal form*, but we are not going to talk about that in this course.) If we say yes,” then we get λx.xx\lambda x.xx, and say that the result is in normal form. This is the style we will use.

Alpha Capture and Renaming

Consider the following two λ\lambda terms:

X(λa.λb.ab)andY(λw.λx.wx) X \equiv (\lambda a.\lambda b.ab) \qquad \textrm{and} Y \equiv (\lambda w.\lambda x.wx)

Clearly (we hope!) terms XX and YY do not differ at all in meaning. The names of the variables are different, but what these functions actually do when you apply them is identical. When two terms have the same structure and differ only in the names of the variables, they are said to be α\alpha-equivalent. If we rename the variables in a term in such a way that the structure is preserved, is is called α\alpha-renaming.

Here is an example of two terms that are not α\alpha-equivalent.

X(λa.λb.ab)andY(λw.λx.xw) X \equiv (\lambda a. \lambda b. ab) \qquad \textrm{and} Y \equiv (\lambda w. \lambda x. xw)

Notice how the second term YY applies its arguments in the reverse order compared to XX. The structure of the terms are different, so they are not α\alpha-equivalent.

Which of the following pairs are α\alpha-equivalent?

λa.λb.abbλb.λa.baaλa.λb.abbλi.λj.jjiλx.xλy.xλe.eλf.fλx.xλy.xλe.eλf.e \begin{array}{ll} \lambda a. \lambda b. a b b & \lambda b. \lambda a. b a a \\ \lambda a. \lambda b. a b b & \lambda i. \lambda j. j j i \\ \lambda x. x \lambda y. x & \lambda e. e \lambda f. f \\ \lambda x. x \lambda y. x & \lambda e. e \lambda f. e \\ \end{array}


Yesλa.λb.abbλb.λa.baaNoλa.λb.abbλi.λj.jjiNoλx.xλy.xλe.eλf.fYesλx.xλy.xλe.eλf.e \begin{array}{lll} \textrm{Yes} & \lambda a. \lambda b. a b b & \lambda b. \lambda a. b a a \\ \textrm{No} & \lambda a. \lambda b. a b b & \lambda i. \lambda j. j j i \\ \textrm{No} & \lambda x. x \lambda y. x & \lambda e. e \lambda f. f \\ \textrm{Yes} & \lambda x. x \lambda y. x & \lambda e. e \lambda f. e \\ \end{array}

Sometimes it happens that a free variable or a variable that is bound to one λ\lambda ends up being moved around the expression in such a way that it gets “captured” by another λ\lambda. This is known as α\alpha-capture, and is almost always a bad thing. Here are some examples.

(λx.λy.yx)yλy.yy (\lambda x .\lambda y. y x) y \rightarrow \lambda y. y y

The free variable yy has been captured by the λy\lambda y. Contrast this example, where we have α\alpha-renamed the λy\lambda y term to λz\lambda z:

(λx.λz.zx)yλz.zy(\lambda x .\lambda z. z x) y \rightarrow \lambda z. z y

In general, if you are performing a reduction and find that you are going to capture a free variable, you need to α\alpha-rename the capturing lambda. You cannot rename the free variable!

You can see that the non-capturing version has a different structure than the capturing version.

It is easy to write examples that cause capture when you use free variables, but you can do it with only bound variables too, if you reduce to normal form.

(λf.λx.fx)(λy.λx.y)λx.(λy.λx.y)xλx.λx.x (\lambda f . \lambda x . f x) (\lambda y.\lambda x.y) \rightarrow \lambda x. (\lambda y.\lambda x. y) x \rightarrow \lambda x. \lambda x.x

We can prevent this by α\alpha-renaming the second term.

(λf.λx.fx)(λy.λz.y)λx.(λy.λz.y)xλx.λz.x (\lambda f . \lambda x . f x) (\lambda y.\lambda z.y) \rightarrow \lambda x. (\lambda y.\lambda z. y) x \rightarrow \lambda x. \lambda z.x

For this reason, it is best that you always use distinct variable names for your lambdas.

Church Numerals

The λ\lambda-calculus doesn’t have numbers, but we can model them using functions. We need to know what a number actually is in order to model it. For our model, we will say that a number nn can be thought of as a potential: someday we are going to do something nn times. If we are going to do something, we also need something to do it to.

Therefore, a numeral1 in λ\lambda calculus will be a function that takes two arguments, an action and a target, and performs the action a certain number of times to the target. These are called Church Numerals, after Alonso Church.