Using Pattern Matching to Build a Small Lazy Language in Racket

Article Written by David Graunke

A few months ago I was looking for a new side project, and I ended up building a very small compiler and runtime for a lazily-evaluated language.

This system compiles to SKI combinators and then executes them via graph reduction, using both the compilation and execution technique from David Turner’s 1984 paper on the subject.

Before compiling to combinators, the little Scheme-like language is compiled to lambda calculus expressions using the system described in this excellent blog post about compiling to lambdas by Matt Might. The techniques in that post gave me the courage to try to build the SKI compiler, and if you’re interested in this stuff, I really recommend reading the whole thing.

The system uses Racket pattern-matching throughout. I don’t think I would’ve been able to stick through this without that — pattern matching makes it super simple to translate one structure to another. Thanks to the expressiveness of the pattern matching rule, the entire system is less than 50 lines.

The system can run programs like this (this factorial function is also from the Matt Might post!):

(λ (x)
     (letrec [(fac (λ (y) 
                     (cond (= y 0)
                           1
                           (mult y (fac (minus y 1))))))]
       (fac x))))

which calculates factorials, and

(let ([cons (λ (head tail) (λ (f) (f head tail)))]
      [car  (λ (cell) (cell (λ (head tail) head)))]
      [cdr  (λ (cell) (cell (λ (head tail) tail)))])
      (letrec ([count-up (λ (n) (cons n (count-up (plus n 1))))])
        (car (cdr (cdr (cdr (cdr (count-up 0))))))))

which defines a purely-functional linked list interface, and creates an infinite lazy list.

It can also handle partial application:

(let ([add-one (plus 1)])
              (add-one 2)))

The input language is a lazy subset of Scheme. It uses S-Expressions, Scheme lambda syntax, and Scheme let-binding forms. Unlike Scheme, programs in this language execute lazily — which means we can define infinite sequences the same way we defined linked lists — and automatically allows for automatic partial application of functions. Lazy-scheme to Lambdas

From a lazy Lisp to Lambdas

The compilation of lambda expressions here uses a subset of Matt Might’s compilation rules from his Compiling up to the Lambda Calculus post. Some important differences: these rules don’t implement Church encodings for numbers and booleans/conditionals; instead, those are handled at combinator-evaluation time, following Turner’s 1984 paper.

The first step is to convert input programs into the lambda calculus. We can start by looking at what our lazy-scheme has that lambda calculus doesn’t: it has let and letrec bindings and multi-argument functions. We need to translate those forms into equivalent lambda expressions.

To remove let bindings, we’re going to turn them into bound variables of a function. For example, the expression:

(let ([x (+ 2  2)])
    (+ x 5))

is turned into:

((λ (x)(+ x 5)) (+ 2 2))

We’re going to do this using Racket’s pattern matcher. We’ll call this function ‘desugar’, since it’s projecting expressions into a subset of the language (limited to anonymous functions). Our desugar function with our first rule looks like this:

(define (desugar expr)
  (match expr
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [_ expr]))

Racket’s ‘match’ macro takes a list of rules with a pattern on the left-hand side, and a rearrangement on the right-hand side. It takes the input (called expr here), and tries to match to a structure in the left-hand side of a rule. In our case, if we pass in it would start by checking it against the first (and for now only) rule. First it would check if it’s a list, because the left-hand side is a list. Then it would check if the first element is the literal ‘let’ — which it is. Then it would look for another list that has a two-item list as its first element, followed by some expression, corresponding to ((,v ,exp)) ,body). It then would bind the variables v, exp, and body to the corresponding elements in the input expression. Then, following the right-hand side, it will rearrange them into the form ((λ (,v) ,body) ,exp)) and call desugar on the result.

We also have a fall-through rule, [_ expr]. The pattern ‘_’ matches anything, and we use it to return the input element if we can’t match it against any of the previous rules.

This rule only matches let bindings with one binding, like (let ([x (+ 2  2)]) …). We can add another rule to match let bindings with multiple bindings. Now our desugar function looks like this:

(define (desugar expr)
  (match expr
    ; let-bindings
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [`(let ((,v ,exp) ,rest ...) ,body) (desugar `((λ (,v) (let ,rest ,body)) ,exp))]
    [_ expr]))

We also need to add a special form for bindings that refer to themselves—letrec.

(define (desugar expr)
  (match expr
    ; let-bindings
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [`(let ((,v ,exp) ,rest ...) ,body) (desugar `((λ (,v) (let ,rest ,body)) ,exp))]
    
    ; add letrec
    [`(letrec [(,f ,lam)] ,body)  (desugar `(let ((,f (Y (λ (,f) ,lam)))) ,body))]

    [_ expr]))

This introduces the Y combinator. The Y combinator can be hard to wrap your head around, but all you need to know here is that it allows for recursive functions in the lambda calculus.

Next, we need to get rid of multi-argument functions. We take a function f like:

(λ (x y) (+ x y))

and turn it into:

(λ (x)
   (λ (y)
      (+ x y)))

Then, rather than calling it like (f x y), you call it like ((f x) y). This transformation is called currying. To do this, we match function definitions with multiple arguments:

(define (desugar expr)
  (match expr

    ; curry multi-arg lambdas
    [`(λ (,v ,vs ...) ,exp) `(λ (,v) ,(desugar `(λ (,@vs) ,exp)))]

    ; let-bindings
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [`(let ((,v ,exp) ,rest ...) ,body) (desugar `((λ (,v) (let ,rest ,body)) ,exp))]
    [`(letrec [(,f ,lam)] ,body)  (desugar `(let ((,f (Y (λ (,f) ,lam)))) ,body))]

    [_ expr]))

We also need transform calls like (f x y) into ((f x) y). We’ll put this rule at the end, because we don’t want it to accidently treat a let as an f:

(define (desugar expr)
  (match expr

    ; curry multi-arg lambdas
    [`(λ (,v ,vs ...) ,exp) `(λ (,v) ,(desugar `(λ (,@vs) ,exp)))]
    

    ; let-bindings
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [`(let ((,v ,exp) ,rest ...) ,body) (desugar `((λ (,v) (let ,rest ,body)) ,exp))]
    [`(letrec [(,f ,lam)] ,body)  (desugar `(let ((,f (Y (λ (,f) ,lam)))) ,body))]

    ; curry multi-arg function calls
    [`(,f ,exp ,rest ...) (desugar `((,f ,exp) ,@rest))]

    [_ expr]))

Next, we need to add two patterns for traversing the lambda expressions to recursively apply the desugaring function.

(define (desugar expr)
  (match expr
    
    ; compile the body of lambda expressions
    [`(λ (,x) ,body) `(λ (,x) ,(desugar body))]

    ; curry multi-arg lambdas
    [`(λ (,v ,vs ...) ,exp) `(λ (,v) ,(desugar `(λ (,@vs) ,exp)))]

    ; let-bindings
    [`(let ((,v ,exp)) ,body) (desugar `((λ (,v) ,body) ,exp))]
    [`(let ((,v ,exp) ,rest ...) ,body) (desugar `((λ (,v) (let ,rest ,body)) ,exp))]
    [`(letrec [(,f ,lam)] ,body)  (desugar `(let ((,f (Y (λ (,f) ,lam)))) ,body))]

    ; compile the left and right-hand sides of application expressions
    [`(,f ,exp) `(,(desugar f) ,(desugar exp))]

    ; curry multi-arg function calls
    [`(,f ,exp ,rest ...) (desugar `((,f ,exp) ,@rest))]

    [_ expr]))

That’s the complete desugar function!

We can apply it to one of the example functions above:

(desugar   '(λ (x)
               (letrec [(fac (λ (y) 
                              (cond (= y 0)
                               1
                               (mult y (fac (minus y 1))))))]
              (fac x)))

produces:

'(λ (x)
   ((λ (fac) (fac x))
    (Y (λ (fac) (λ (y) (((cond ((= y) 0)) 1) ((mult y) (fac ((minus y) 1)))))))))

Compiling lambdas to combinators

Now we have a method for turning our higher-level language into lambdas. We could write an interpreter for lambdas directly — using something like eval/apply, or straight beta reduction. This implementation uses a slightly different technique — graph reduction via combinators.

We translate the lambda calculus expressions into combinator expressions using three combinators, S, K, and I. A combinator is a function that recombines its arguments in some way. Our three combinators are defined as follows:

    S x y z = (x z) (y z)
    K x y   =  x
  I x     =  x

These functions are curried just like the lambda expressions: to call S, we would write (((S x) y) z). If we call just ((S x) y), we end up with partially applied function. The result is a function of one parameter, z. When we provide the z, the combinator computes   ((x z) (y z)).

Unlike lambda calculus, expressions in the combinator calculus don’t have any bound variables. In fact, you can think of the translation from lambdas to combinators as the process of eliminating bound variables.

As an example, here’s the factorial function above compiled to S, K, and I combinators:

'((S ((S ((S (K S)) (K I))) ((S (K K)) I)))
  ((S (K Y))
   ((S
     ((S (K S))
      ((S ((S (K S)) ((S (K K)) (K S))))
       ((S
         ((S (K S))
          ((S ((S (K S)) ((S (K K)) (K S))))
           ((S
             ((S (K S))
              ((S ((S (K S)) ((S (K K)) (K S))))
               ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K cond))))))
            ((S
              ((S (K S))
               ((S ((S (K S)) ((S (K K)) (K S))))
                ((S
                  ((S (K S))
                   ((S ((S (K S)) ((S (K K)) (K S))))
                    ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K =))))))
                 ((S (K K)) (K I))))))
             ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K 0))))))))
        ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K 1)))))))
    ((S
      ((S (K S))
       ((S ((S (K S)) ((S (K K)) (K S))))
        ((S
          ((S (K S))
           ((S ((S (K S)) ((S (K K)) (K S))))
            ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K mult))))))
         ((S (K K)) (K I))))))
     ((S
       ((S (K S))
        ((S ((S (K S)) ((S (K K)) (K S))))
         ((S ((S (K S)) ((S (K K)) (K K)))) (K I)))))
      ((S
        ((S (K S))
         ((S ((S (K S)) ((S (K K)) (K S))))
          ((S
            ((S (K S))
             ((S ((S (K S)) ((S (K K)) (K S))))
              ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K minus))))))
           ((S (K K)) (K I))))))
       ((S ((S (K S)) ((S (K K)) (K K)))) ((S (K K)) (K 1)))))))))

Totally readable, right? You can see, though, that there are no bound variables — no function definitions introducing new bindings. There are just S, K, and I, along with numbers like 1 arithmetic operators like mult and minus.

One thing that should be obvious — translations to SKI combinators blows up the size of the expression. In fact, the length of the SKI expression grows quadratically to length of the lambda input. For that reason S, K, and I aren’t a particularly popular combinator scheme for real-world graph reducers — other collections of combinators have better performance — but they’re among the simplest, which is the motivating factor for this project.

We can map lambdas to S, K, and I combinators with the following rules:

    (λ (x) x)       = I
    (λ (x) y)       = (K y)
  (λ (x) (e1 e2)) = (S (λ (x) e1) 
                       (λ (x) e2))

That’s it!  By applying those rules recursively through the tree of lambda expressions, you convert everything to S, K, I expressions, eliminating bound variables in the process. The function to do this looks a lot like lazy-scheme to lambdas function:  

(define (lambda-to-combinators expr)
  (match expr
    
    ; traverse
    [`(λ (,x) (λ (,y) ,z)) (lambda-to-combinators `(λ (,x)
                          ,(lambda-to-combinators `(λ (,y) ,z))))]
    
    ; eta reduction
    [`(λ (,x) (,y . ,x) ) #:when (not (equal? x y)) (lambda-to-combinators y)]
    
    ; λ x. e1 e2 = S (λ x. e1) (λ x. e2)
    [`(λ (,x) (,e1 . ,e2)) `((S . ,(lambda-to-combinators `(λ (,x) ,e1))) .
                                  ,(lambda-to-combinators `(λ (,x) ,e2)))]

    ; λ x. x = I
    [`(λ (,x) ,x) 'I]

    ; λ x. y = K y
    [`(λ (,x) ,y) `(K . ,(lambda-to-combinators y))]

    ; traverse
    [`(,e1 . ,e2) `(,(lambda-to-combinators e1) . ,(lambda-to-combinators e2))]

    ; return unmatched terms
    [_ expr]))

That’s it! Between this and the desugar function above, we can turn any expression in our lazy scheme language into S, K, I combinators. That means we can go from a relatively high-level language with variable bindings, recursion, and multiparameter functions, and convert them to programs restricted to S, K, and I, plus arithmetic and control-flow operators, with no bound variables.

That means our interpreter for the resulting expressions can be very simple. There’s no need to handle environments, stack frames, capture-avoiding substitution, or any of the other features of alternative implementations.

The graph reducer

The interpreter for the S, K, I expressions uses term-rewriting graph reduction. It operates as a pattern matcher: when it sees structures like (((S x) y) z), it rewrites them as ((x z) (y z)). The implementation follows David Turner’s approach from his 1984 paper that popularized SKI graph reduction.

Here’s it’s useful to think of the concrete implementation of the SKI expression data structures. We’re using scheme dotted pairs, meaning each expression is a two element linked-cell (and because we’re using dotted pairs, the last element of the list is just a value, rather than nil).

The K and I combinators operate similarly. The graph reducer walks the tree of combinator expressions and continually performs the rewrites defined by S, K, and I. Additionally, it has logic for handling addition, subtraction, multiplication, and conditionals.

I won’t get too deep in the actual execution of the graph reducer. But it uses a stack to keep track of the expressions it encounters while it walks down the left-hand terms of the graph tree. It only walks the left-hand because functions that need to be applied only appear in the left-hand position. Sometimes terms will start in a right position, like x in (((S x) y) z) but be moved to the left-hand position by the application of S.

It also recognizes special functions for arithmetic and control flow. Unlike every other expression, the operands to these functions are executed eagerly so that concrete values can be provided the functions.

With the graph reducer, we can think of the S,K,I expressions as machine code for a virtual graph reducing machine. For example, we could right a graph reducer in Javascript, and use S,K,I programs generated by the racket code.

(define (reduce [spine '()])
  (match spine
    [(list-rest (cons 'S x) (cons _ y) (cons _ z) tail) (cons (cons (cons x z) (cons y z)) tail)]
    [(list-rest (cons 'K x) (cons _ y) tail)            (cons x tail)]
    [(list-rest (cons 'I x) tail)                       (cons x tail)]
    
    [(list-rest (cons 'Y x) tail)                    (cons (cons x (cons 'Y x)) tail)]
    
    [(list-rest (cons 'cond #t) (cons _ x) (cons _ y) w) (cons x w)]
    [(list-rest (cons 'cond #f) (cons _ x) (cons _ y) w) (cons y w)]
    
    [(list-rest (cons 'cond x) tail) (cons (cons 'cond (evaluate x)) tail)]
    
    [(list-rest (cons 'plus x) (cons _ y) w)  (cons (+ (evaluate x) (evaluate y)) w)]
    [(list-rest (cons 'minus x) (cons _ y) w) (cons (- (evaluate x) (evaluate y)) w)]
    [(list-rest (cons 'mult x) (cons _ y) w)  (cons (* (evaluate x) (evaluate y)) w)]
    
    [(list-rest (cons '= x) (cons _ y) w) (cons (= (evaluate x) (evaluate y)) w)]
    
    [(list-rest (cons 'print x) y) (begin (print (evaluate x)) (cons x y))]

    
    [(list-rest (? symbol? x) (cons _ y) z) (cons (cons x y) z)]
    [(list-rest (cons (? symbol? x) _) _) spine]
    
    [(list (cons head _) _ ...) (cons head spine)]
    [_ spine]))

(define (evaluate-spine spine)
  (let ([reduction (reduce spine)])
    (if (eq? reduction spine)
        spine
        (evaluate-spine reduction))))

(define (evaluate expr)
  (car (evaluate-spine (list expr))))

One thing to note: real-world FP systems today are not built like this. Other combinator sets besides S, K, and I generate much smaller intermediate representations, and the graph reduction technique from Turner’s 1984 paper that I implement has been surpassed by much more efficient systems.

You can read a little about why in this Lambda the Ultimate thread. Even so, I picked a SKI-combinator-based system for it’s simplicity, and the ability to work from a small number of references.