Continuation Passing Style

Continuations: rest of the program²²The term Continuations is credited to Wadsworth and Strachey.

Wikipedia describes continuations as

… an abstract representation of the control state of a computer program.

That’s a mouthful. There’s a simpler, more intuitive, definition:³³ Strachey, C. and Wadsworth, C.P. 1974. Continuations: A mathematical semantics for handling full jumps. (no title). (1974).

[A continuation is] [t]he (meaning of the) “rest of the program” …

Let’s say, our program is as follows:

1: x = "hello world!"
2: print(x)

Then, the continuation at the point where x is defined, is: lambda x: print(x). After x gets defined, the Python interpreter continues with the rest of the program. In other words, it continues with … 🥁 … the continuation.

For a given point in the execution of a program, the corresponding continuation is whatever will happen from that point to the end of the corresponding program’s “main” function.

One thing to note is that continuations can be represented by functions. As functions they take as input the current state of the program⁴⁴In the code example, this is simply x. and produce as output the output of the program.

Another thing to note is that continuations of a program do not compose.⁵⁵Composable continuations, aka delimited continuations, do exist and are distinct. Intuitively, they represent the meaning of the “rest of the program” up to a point (i.e., they are delimited). Why is that? Continuations extend till the end of the program. If they could be composed with something, that something would have to be after the end of the program and as such wouldn’t be a continuation of that program.

Continuation Passing Style

Having defined continuations, continuation passing style (CPS) is a program transformation that transforms functions such that they accept an extra argument which represents the continuation at the call-site.

We begin our exploration with the following recursive function. For the purposes of this blog post we will ignore what might happen when subtractorial is evaluated with a sufficiently large integer as input.

from __future__ import annotations

def subtractorial(n: int) -> int:
  match n:
    case 0:
      return 0
    case _:
      return (n - subtractorial(n-1))

With a little algebra it can be demonstrated that subtractorial(x)⁶⁶For a non-negative number. is equivalent to \(\lceil \frac{x}{2} \rceil\). We can observe this below:

for x in range(10):
  print(f"{x} => {subtractorial(x)}")

0 => 0
1 => 1
2 => 1
3 => 2
4 => 2
5 => 3
6 => 3
7 => 4
8 => 4
9 => 5

The CPS transformation of subtractorial is subtractorial1, which is defined as:

1: from typing import Callable, TypeVar
2: 
3: R = TypeVar('R')
4: 
5: def subtractorial1(n: int, c: Callable[[int], R]) -> R:
6:   return c(subtractorial(n))

Our previously defined subtractorial is what you get when you supply the identity function⁷⁷lambda x: x as the continuation:

print(subtractorial1(11, lambda x: x))

CPS as a tail-recursive program transformation

Using the above definitions of subtractorial1 and subtractorial we can synthesize an equivalent definition of subtractorial1 that no longer depends on subtractorial.

 1: from __future__ import annotations
 2: from typing     import Callable, TypeVar
 3: 
 4: R = TypeVar('R')
 5: 
 6: def subtractorial2(n: int, c: Callable[[int], R]) -> R:
 7:   match n:
 8:     case 0:
 9:       return c(0)
10:     case _:
11:       return subtractorial2((n - 1), c(lambda x: n - x))

CPS is implemented by having the last call in a function be the call to the continuation. And since subtractorial2 is recursive and in CPS, it is tail-recursive. Additionally, this form of subtractorial2 make clear that the continuation is acting as an accumulating parameter. During the tail-call, the continuation being passed to the recursive call is the composition of (i.e., the accumulation of) c and lambda x: n - x.

For completeness, an equivalent formulation for the tail-call is:

return subtractorial2((n - 1), lambda x: c(n - x))

CPS and defunctionalization

Functions aren’t the only representation for continuations. Specifically, CPS can be defunctionalized. Defunctionalization⁸⁸ Reynolds, J.C. 1972. Definitional interpreters for higher-order programming languages. Proceedings of the acm annual conference-volume 2 (1972), 717–740. proceeds by choosing an appropriate representation to represent and distinguish all the higher-order calls in a program.

All the continuations in subtractorial2 are of the form int -> int.⁹⁹I.e., taking a single int as an input and producing an int. Even more specifically, the continuations are a result of composing functions of the form lambda x: n - x for some integer n. One way to represent compositions of such functions is as a list of integers. The abstraction function subtabs3 converts from the “concrete” representation (list of integers) to the previous “abstract” (functional) representation.

1: from __future__ import annotations
2: from typing     import Callable
3: from functools  import reduce
4: 
5: def subtabs3(xs: [int]) -> Callable[[int], int]:
6:   return reduce(lambda x, y: lambda z: x(y - z), xs, lambda x: x)

Note the similarity between lambda z: x(y - z) in subtabs3 and lambda x: c(n - x) in subtractorial.

Using the above we can now define a new implementation that is both tail-recursive and first-order.

 7: def subtractorial3(x: int) -> int:
 8:   return _subtractorial3(x, [])
 9: 
10: def _subtractorial3(x: int, xs: [int]) -> int:
11:   match x:
12:     case 0:
13:       return subtabs3(xs)(0)
14:     case _:
15:       xs.append(x)
16:       return _subtractorial3((x - 1), xs)

And we can confirm that it computes the same results as subtractorial above.

for x in range(10):
  print(f"{x} => {subtractorial3(x)}")

0 => 0
1 => 1
2 => 1
3 => 2
4 => 2
5 => 3
6 => 3
7 => 4
8 => 4
9 => 5

The significance of the choice of representation

A list of integers isn’t the only way to represent the continuations in subtractorial2. Another way is to represent the composition of functions as a single integer.

1: from __future__ import annotations
2: from typing     import Callable
3: from functools  import reduce
4: 
5: def subtabs4(x: int) -> Callable[[int], int]:
6:   return lambda y: abs(x - y)

If, starting with the initial argument to subtractorial4, we club every pair of integers we observe that every pair contributes 1¹⁰¹⁰x - (x - 1) == 1 to the final result. Thus, the key is to pick a representation that focuses on the absolute value of the difference, rather than the difference itself. Having done so, the definition of subtractorial4 is straightforward.

 7: def subtractorial4(x: int) -> int:
 8:   return _subtractorial4(x, 0)
 9: 
10: def _subtractorial4(x: int, y: int) -> int:
11:   match x:
12:     case 0:
13:       return subtabs4(y)(0)
14:     case _:
15:       return _subtractorial4((x - 1), (x - y))

Having defined subtractorial4 as above, we can indeed confirm that it computes the same results as subtractorial and subtractorial3 above.

for x in range(10):
  print(f"{x} => {subtractorial4(x)}")

0 => 0
1 => 1
2 => 1
3 => 2
4 => 2
5 => 3
6 => 3
7 => 4
8 => 4
9 => 5

As an optimization aid

Let’s consider the example of the singly-linked lists and the naive implementation of reverse¹¹¹¹The departure to Haskell for this section is a didactic choice. Haskell lists have less magic in their implementation than Python’s list. which reverses the list:

The reverse of an empty list is the empty list.
The reverse of a list with a head and a tail is the result of appending the head to the reverse of the tail.

reverse :: [a] -> [a]
reverse []    = []
reverse (h:t) = (reverse t) ++ [h]

This naive implementation is quadratic in the length of the list since append (++) is linear in the length of its left argument.

For the uninitiated, here’s a brief glossary of Haskell syntax:

:: denotes the type of an expression.
x::[a] states that the type of the variable x is that of [a].¹²¹²I.e., a list comprised of elements whose type is a, for some type a.
a -> b is the type of a function that takes a as input and produces a result of type b.
a -> b -> c is equivalent¹³¹³More accurately, it is the curried form of such a function. to a function that takes two arguments, one of type a and the other of type b, and produces a result of type c.
(h:t) deconstructs a list into a head element and the remainder of the list, i.e., the tail.

Returning to the example, we can CPS-transform reverse and get reverse2 in a manner similar to how we obtained subtractorial2 from subtractorial. Eliding the intermediate steps, we get:

reverse2 :: [a] -> [a]
reverse2 xs = reverse2' xs id where
  reverse2' :: [a] -> ([a] -> [a]) -> [a]
  reverse2' []    c = c []
  reverse2' (h:t) c = reverse2' t (\xs -> c (xs ++ [h]))

In reverse2, id stands for the identity function.

We can continue in the spirit of subtractorial3 and defunctionalize reverse2. As the continuations accumulate they form a chain of function compositions where each function appends a single element. If we¹⁴¹⁴How we choose a representation appropriately is out of scope for this post. represent this composition (++ [a]) . (++ [b]) ... (++ [c]) as the list [c, ..., b, a] of appendees in reverse order we obtain:

revabs3 :: [a] -> ([a] -> [a])
revabs3     xs  =  \ys -> (ys ++ xs)

reverse3 :: [a] -> [a]
reverse3    xs   = reverse3' xs [] where
  reverse3' :: [a] -> [a] -> [a]
  reverse3' []      c = revabs3 c []
  reverse3' (h : t) c = reverse3' t (h : c)

Inlining the definition of revabs3, we obtain the well-known, linear-time reverse function using a list accumulator.

reverse4 :: [a] -> [a]
reverse4    xs   = reverse4' xs [] where
  reverse4' :: [a] -> [a] -> [a]
  reverse4' []      c = c
  reverse4' (h : t) c = reverse4' t (h : c)

Let’s reflect for a moment. The transformation to CPS made explicit the context surrounding the recursive call(s). Performing defunctionalization forced us to consider the question of what would be an appropriate representation that can adequately address the, possibly many, recursive calls. Having picked an appropriate representation our function went from quadratic-time to linear-time.

CPS and Evaluation order

There is more than one evaluation strategy that an interpreter or compiler can follow when evaluating expressions.

As noted by Reynolds¹⁵¹⁵ Reynolds, J.C. 1998. Definitional interpreters revisited. Higher-order and symbolic computation. 11, (1998), 355–361. a program in CPS has the property that

Whenever some function calls a serious function¹⁶¹⁶A serious function is defined as a function that may not terminate (depending on the arguments passed to it)., the calling function must return the same result as the called function, without performing any further computation.

In other words, as we’ve seen in this blog post, CPS results in tail-calls which, at a particular point, explicitly represent what the meaning of the “rest of the program” is. The continuations in these tail-calls stack or nest,¹⁷¹⁷Not unlike Matryoshka dolls. with the order of nesting representing the order in which computations are to be evaluated.

This provides an additional “degree of freedom” when defining interpreters which allow sufficient flexibility¹⁸¹⁸As it turns out, CPS isn’t necessary for order-of-application independence. such that the order of evaluation of the defined language can be made independent of the order of evaluation of the defining language.⁸⁸ Reynolds, J.C. 1972. Definitional interpreters for higher-order programming languages. Proceedings of the acm annual conference-volume 2 (1972), 717–740. ^,¹⁹¹⁹At least for Call by name and Call by value evaluation orders.

And that is all we’ll say on this specific topic. Not because the topic isn’t worth elaborating, but because it’d be difficult to compete with Reynolds excellent treatment of it.¹⁵¹⁵ Reynolds, J.C. 1998. Definitional interpreters revisited. Higher-order and symbolic computation. 11, (1998), 355–361. ^,²⁰²⁰ Reynolds, J.C. 1998. Definitional interpreters for higher-order programming languages. Higher-order and symbolic computation. 11, (1998), 363–397.

Historical tidbit

Continuations and continuation passing style have been (re-)discovered multiple times since the 1960s.¹¹ Reynolds, J.C. 1993. The discoveries of continuations. Lisp and symbolic computation. 6, (1993), 233–247. The earliest description of the use of continuations (specifically, CPS) was by Adriaan van Wijngaarden in September 1964, at an IFIP Working Conference on Formal Language Description Languages held in Baden bei Wien, Austria.

Participants in the discussion of his presentation included Dijkstra, Hoare, McCarthy, McIlroy, and Strachey, and other conference attendees included Böhm, Elgot, Landin and Nivat.

van Wijngaarden used CPS transformation (before it was called as such) to compile a program with goto statements into another that didn’t need any goto statements.

Under the inspiration of the notion of the unnecessity of goto’s, Dijkstra spent the evening constructing realistic examples of programs without goto’s, which he scribbled on napkins at coffee break the next day. … That coffee-break palaver ripened into the most celebrated²¹²¹ So celebrated, in fact, that we have not one, but two posts on the topic. letter to the editor in the history of computing.²²²² Dijkstra, E.W. 1968. Letters to the editor: go to statement considered harmful. Communications of the acm. 11, 3 (1968), 147–148.

Conclusion

Continuations reify the meaning of the rest of the program. Continuation passing style transforms a program such that each function takes an additional function which is a representation of the continuation at the call-site. When these continuations are represented as functions, every function in the CPS-transformed program makes a tail-call to said continuation.²³²³Passing as an argument the output of the original function. CPS and defunctionalization complement each other. Appropriate choice of defunctionalized representations can lead to asymptotic improvements. And finally, because continuations make explicit “what to do after this step”, CPS can be used to achieve order-of-application independence in defining and defined languages.

That’s it. That’s the idea.

Comments

Comments can be left on twitter, mastodon, as well as below, so have at it.

New post!

What is the Continuation Passing Style of programming?https://t.co/KBp0IlBJsP

Reply here if you have comments.
— The Weary Travelers blog (@wearyTravlrsBlg) July 9, 2023

To view the Giscus comment thread, enable Giscus and GitHub’s JavaScript or navigate to the specific discussion on Github.

Continuations: rest of the program22The term Continuations is credited to Wadsworth and Strachey.

Continuation Passing Style

CPS as a tail-recursive program transformation

CPS and defunctionalization

The significance of the choice of representation

As an optimization aid

CPS and Evaluation order

Historical tidbit

Conclusion

Comments

Footnotes:

Continuations: rest of the program²²The term Continuations is credited to Wadsworth and Strachey.