The Weary Travelers

A blog for computer scientists


Date: 2023-06-03
Author: Suhail
Suhail.png

The challenge in writing extensible, modular code

Many software libraries are written in a way that makes it difficult for the user to extend them and easily verify their correctness. This is a pity, because their features may be underutilized as a consequence. Following some simple guidelines can dramatically improve your library’s ease of extensibility. Your libraries will be used and extended more and your set of potential collaborators will, in turn, grow.

Writing software libraries that allow the user to easily extend them1I.e., with code reuse and not by reimplementation. is challenging. Doing so while also making it easier for the user to verify correctness by maintaining static guarantees is harder still. However, it is possible to accomplish both simultaneously. By the end of the post you will understand some approaches that don’t quite work and an approach that does.2It’s not the only approach that does; we are not considering an exhaustive set of approaches.

All code examples were interpreted on this Python version:

3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]

And were type-checked with:

mypy 1.3.0 (compiled: yes)

Let’s dig in!

Introduction

Generally speaking, software libraries introduce terms as well as provide some interpretations3I.e., the behaviour that the terms are intended to denote. for those terms. For instance, a very rudimentary calculator may introduce terms corresponding to:

  • Integer literals
  • Addition
  • Negation

And may provide a function that allows the user to calculate the result of an expression involving the above terms. Such a library written in the Object-oriented way might look something like this:

 1: from abc import ABC, abstractmethod
 2: 
 3: class AbelianInt(ABC):
 4:     @abstractmethod
 5:     def calc(self) -> int: pass
 6: class Lit(AbelianInt):
 7:     def __init__(self, x: int):
 8:         self.x = x
 9:     def calc(self):
10:         return self.x
11: class Neg(AbelianInt):
12:     def __init__(self, x: AbelianInt):
13:         self.x = x
14:     def calc(self):
15:         return -self.x.calc()
16: class Add(AbelianInt):
17:     def __init__(self, x: AbelianInt, y: AbelianInt):
18:         self.x = x
19:         self.y = y
20:     def calc(self):
21:         return self.x.calc() + self.y.calc()

What happens when the user tries to extend the above library? While the user is able to add a term to, say, include multiplication, they are unable to4In a manner whose correctness can be verified by a type checker. add an interpretation that allows them to “pretty-print”5A possible application for pretty-printing might be for debugging and logging. an expression in addition to being able to calculate the value of an expression. The above library isn’t very extensible.

We will consider a few different approaches for structuring code and contrast the relative ease with which new terms can be added vs new interpretations. Finally, we’ll present one approach which allows us to do both simultaneously while also allowing the user to enforce static guarantees.6By way of a static type checker.

The Expression Problem

While Philip Wadler coined the term “The expression problem” in 19987 The Expression Problem by Philip Wadler. it was observed at least as far back as 1975 by John C. Reynolds.8 User-defined Types and Procedural Data Structures as complementary approaches to Data Abstraction
John C. Reynolds. IFIP Working Group 2.1 on Algol (1975).
In the words of Wadler:

The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

[…]

One can think of cases as rows and functions as columns in a table. In a functional language, the rows are fixed (cases in a datatype declaration) but it is easy to add new columns (functions). In an object-oriented language, the columns are fixed (methods in a class declaration) but it is easy to add new rows (subclasses). We want to make it easy to add either rows or columns.

Let’s study the example from before a little more closely.

OOP inheritance approach

In the OOP approach we:

  • denote terms via classes that inherit from a base class
  • denote interpretations via instance methods

Let’s see what this looks like.

 1: from abc import ABC, abstractmethod
 2: 
 3: class AbelianInt(ABC):
 4:     @abstractmethod
 5:     def calc(self) -> int: pass
 6: class Lit(AbelianInt):
 7:     def __init__(self, x: int):
 8:         self.x = x
 9:     def calc(self):
10:         return self.x
11: class Neg(AbelianInt):
12:     def __init__(self, x: AbelianInt):
13:         self.x = x
14:     def calc(self):
15:         return -self.x.calc()
16: class Add(AbelianInt):
17:     def __init__(self, x: AbelianInt, y: AbelianInt):
18:         self.x = x
19:         self.y = y
20:     def calc(self):
21:         return self.x.calc() + self.y.calc()

And it type-checks:

$> mypy ./lib.py
Success: no issues found in 1 source file

✅ Use the library

 1: from lib import Lit, Neg, Add
 2: 
 3: prg = Add(Neg(Lit(1)), Lit(2))
 4: 
 5: def main() -> None:
 6:     verify(prg.calc(), 1)
 7: 
 8: def verify(x, y, tag = None) -> None:
 9:     preamble = f" {tag}::" if tag else ""
10:     if x == y:
11:         print(f"SUCCESS:{preamble} {x} == {y}")
12:     else:
13:         print(f"FAIL:{preamble} {x} != {y}")
14: 
15: if __name__ == "__main__":
16:     main()
SUCCESS: 1 == 1

And it type-checks:

$> mypy ./use.py
Success: no issues found in 1 source file

✅ Extend the library: add a term

Let’s now try to extend by adding a term to represent multiplication.

 1: import lib
 2: from lib import Lit, Neg, Add
 3: 
 4: class Mul(lib.AbelianInt):
 5:     def __init__(self, x: lib.AbelianInt, y: lib.AbelianInt):
 6:         self.x = x
 7:         self.y = y
 8:     def calc(self):
 9:         return self.x.calc() * self.y.calc()
10: 
11: def main() -> None:
12:     import use
13:     prg1 = Mul(Lit(1), use.prg)
14:     verify(prg1.calc(), 1, "prg1")
15:     prg2 = Add(Lit(1), prg1)
16:     verify(prg2.calc(), 2, "prg2")
17: 
18: def verify(x, y, tag = None) -> None:
19:     preamble = f" {tag}::" if tag else ""
20:     if x == y:
21:         print(f"SUCCESS:{preamble} {x} == {y}")
22:     else:
23:         print(f"FAIL:{preamble} {x} != {y}")
24: 
25: if __name__ == "__main__":
26:     main()
SUCCESS: prg1:: 1 == 1
SUCCESS: prg2:: 2 == 2

And it type-checks:

$> mypy ./extend_terms.py
Success: no issues found in 1 source file

❌ Extend the library: add an interpretation

In statically typed OOP languages, this isn’t generally possible. However, Python allows us to “monkey patch” our classes after they have been defined. Let’s see what happens when we try to do that.

 1: from lib import Lit, Neg, Add
 2: 
 3: def lit_pprint(self):
 4:     return f"{self.x}"
 5: Lit.pprint = lit_pprint
 6: def neg_pprint(self):
 7:     return f"(-{self.x.pprint()})"
 8: Neg.pprint = neg_pprint
 9: def add_pprint(self):
10:     return f"({self.x.pprint()} + {self.y.pprint()})"
11: Add.pprint = add_pprint
12: 
13: def main() -> None:
14:     import use
15:     return print(use.prg.pprint())
16: 
17: if __name__ == "__main__":
18:     main()

We are able to successfully evaluate our code:

((-1) + 2)

But our code no longer type-checks:

$> mypy ./extend_interpretations.py
extend_interpretations.py:5: error: "Type[Lit]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:8: error: "Type[Neg]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:11: error: "Type[Add]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:15: error: "Add" has no attribute "pprint"  [attr-defined]
Found 4 errors in 1 file (checked 1 source file)

Some non-solutions

OOP composition

What if we were to no longer require that the classes that represent terms inherit from a base class, and instead were to use composition?

 1: from typing_extensions import Protocol
 2: 
 3: class IntCalcInterface(Protocol):
 4:     def calc(self) -> int: ...
 5: 
 6: class CalcInt:
 7:     @classmethod
 8:     def calc(cls, x: int) -> int:
 9:         return x
10: 
11: class Lit:
12:     calculator = CalcInt
13:     def __init__(self, x: int):
14:         self.x = x
15:     def calc(self):
16:         return self.calculator.calc(self.x)
17: class Neg:
18:     calculator = CalcInt
19:     def __init__(self, x: IntCalcInterface):
20:         self.x = x
21:     def calc(self):
22:         return self.calculator.calc(-self.x.calc())
23: class Add:
24:     calculator = CalcInt
25:     def __init__(self, x: IntCalcInterface, y: IntCalcInterface):
26:         self.x = x
27:         self.y = y
28:     def calc(self):
29:         return self.calculator.calc(self.x.calc() + self.y.calc())

We are able to define the library in a type-safe manner:

$> mypy ./lib.py
Success: no issues found in 1 source file

✅ Use the library

The usage is the same as in the case for OOP inheritance.

 1: from lib import Lit, Neg, Add
 2: 
 3: prg = Add(Neg(Lit(1)), Lit(2))
 4: 
 5: def main() -> None:
 6:     verify(prg.calc(), 1)
 7: 
 8: def verify(x, y, tag = None) -> None:
 9:     preamble = f" {tag}::" if tag else ""
10:     if x == y:
11:         print(f"SUCCESS:{preamble} {x} == {y}")
12:     else:
13:         print(f"FAIL:{preamble} {x} != {y}")
14: 
15: if __name__ == "__main__":
16:     main()

We are able to use it as expected:

SUCCESS: 1 == 1

And our usage type-checks:

$> mypy ./use.py
Success: no issues found in 1 source file

✅ Extend the library: add a term

We are also able to add a term representing multiplication as before.

 1: import lib
 2: from lib import Lit, Neg, Add
 3: 
 4: class Mul:
 5:     calculator = lib.CalcInt
 6:     def __init__(self, x: lib.IntCalcInterface, y: lib.IntCalcInterface):
 7:         self.x = x
 8:         self.y = y
 9:     def calc(self):
10:         return self.calculator.calc(self.x.calc() * self.y.calc())
11: 
12: def main() -> None:
13:     import use
14:     prg1 = Mul(Lit(1), use.prg)
15:     verify(prg1.calc(), 1, "prg1")
16:     prg2 = Add(Lit(1), prg1)
17:     verify(prg2.calc(), 2, "prg2")
18: 
19: def verify(x, y, tag = None) -> None:
20:     preamble = f" {tag}::" if tag else ""
21:     if x == y:
22:         print(f"SUCCESS:{preamble} {x} == {y}")
23:     else:
24:         print(f"FAIL:{preamble} {x} != {y}")
25: 
26: if __name__ == "__main__":
27:     main()
SUCCESS: prg1:: 1 == 1
SUCCESS: prg2:: 2 == 2

And it type-checks:

$> mypy ./extend_terms.py
Success: no issues found in 1 source file

❌ Extend the library: add an interpretation

Let’s see what happens when we try to add an interpretation corresponding to “pretty printing”.9Because of the lack of ABCMeta.register support in mypy we use Protocols.

 1: from typing_extensions import Protocol
 2: from lib import Lit, Neg, Add
 3: 
 4: class PPrintInterface(Protocol):
 5:     def pprint(self) -> str: ...
 6: 
 7: def lit_pprint(self) -> str:
 8:     return f"{self.x}"
 9: Lit.pprint = lit_pprint
10: 
11: def neg_pprint(self) -> str:
12:     return f"(-{self.x.pprint()})"
13: Neg.pprint = neg_pprint
14: 
15: def add_pprint(self) -> str:
16:     return f"({self.x.pprint()} + {self.y.pprint()})"
17: Add.pprint = add_pprint
18: 
19: def main() -> None:
20:     import use
21:     prg : PPrintInterface
22:     prg = use.prg
23:     return print(prg.pprint())
24: 
25: if __name__ == "__main__":
26:     main()

We are able to successfully evaluate our code as before:

((-1) + 2)

However, the code still doesn’t type-check. Composition doesn’t help us here.

$> mypy ./extend_interpretations.py
extend_interpretations.py:9: error: "Type[Lit]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:13: error: "Type[Neg]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:17: error: "Type[Add]" has no attribute "pprint"  [attr-defined]
extend_interpretations.py:22: error: Incompatible types in assignment (expression has type "Add", variable has type "PPrintInterface")  [assignment]
Found 4 errors in 1 file (checked 1 source file)

FP approach10Aka, the “initial” approach.

In the FP “initial” approach we:

Let’s see what this looks like.

 1: from __future__ import annotations
 2: from typing import assert_never
 3: from dataclasses import dataclass
 4: 
 5: @dataclass
 6: class Lit:
 7:     x: int
 8: @dataclass
 9: class Add:
10:     x: AbelianInt
11:     y: AbelianInt
12: @dataclass
13: class Neg:
14:     x: AbelianInt
15: 
16: AbelianInt = Lit | Add | Neg
17: 
18: def calc(prog: AbelianInt) -> int:
19:     match prog:
20:         case Lit(x):
21:             return x
22:         case Add(x, y):
23:             return calc(x) + calc(y)
24:         case Neg(x):
25:             return -calc(x)
26:         case _ as unreachable:
27:             assert_never(unreachable)

And it type-checks:

$> mypy ./lib.py
Success: no issues found in 1 source file

✅ Use the library

 1: from lib import Lit, Neg, Add, calc
 2: 
 3: prg = Add(Neg(Lit(1)), Lit(2))
 4: 
 5: def main() -> None:
 6:     verify(calc(prg), 1)
 7: 
 8: def verify(x, y, tag = None) -> None:
 9:     preamble = f" {tag}::" if tag else ""
10:     if x == y:
11:         print(f"SUCCESS:{preamble} {x} == {y}")
12:     else:
13:         print(f"FAIL:{preamble} {x} != {y}")
14: 
15: if __name__ == "__main__":
16:     main()

We are able to calculate results as expected:

SUCCESS: 1 == 1

And our usage type-checks:

$> mypy ./use.py
Success: no issues found in 1 source file

❌ Extend the library: add a term

Let’s see what happens when we try to add a term representing multiplication.

 1: from __future__ import annotations
 2: from dataclasses import dataclass
 3: import lib
 4: from lib import Lit, Neg, Add
 5: 
 6: @dataclass
 7: class Mul:
 8:     x: RingInt
 9:     y: RingInt
10: 
11: RingInt = Mul | lib.AbelianInt
12: 
13: def calc(prog: RingInt) -> int:
14:     match prog:
15:         case Mul(x, y):
16:             return calc(x) * calc(y)
17:         case _:
18:             return lib.calc(prog)

The code type-checks:

$> mypy ./extend_terms.py
Success: no issues found in 1 source file

Let’s try to mix and match the newly added term with others.

 1: import use
 2: from lib import Lit, Add
 3: from extend_terms import Mul, calc
 4: 
 5: prg1 = Mul(Lit(1), use.prg)
 6: prg2 = Add(Lit(1), prg1)
 7: 
 8: def main() -> None:
 9:     verify(calc(prg1), 1, "prg1") # works
10:     try:
11:         verify(calc(prg2), 2, "prg2") # fails
12:     except Exception as e:
13:         print(f"FAIL: prg2:: Exception {e}")
14: 
15: def verify(x, y, tag = None) -> None:
16:     preamble = f" {tag}::" if tag else ""
17:     if x == y:
18:         print(f"SUCCESS:{preamble} {x} == {y}")
19:     else:
20:         print(f"FAIL:{preamble} {x} != {y}")
21: 
22: if __name__ == "__main__":
23:     main()

We only have a limited freedom in being able to inter-operate with previously defined expressions:

SUCCESS: prg1:: 1 == 1
FAIL: prg2:: Exception Expected code to be unreachable, but got: Mul(x=Lit(x=1), y=Add(x=Neg(x=Lit(x=1)), y=Lit(x=2)))

On the bright side, the runtime failure is also caught during type-checking.

$> mypy ./extend_terms_use.py
extend_terms_use.py:6: error: Argument 2 to "Add" has incompatible type "Mul"; expected "Union[Lit, Add, Neg]"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

✅ Extend the library: add an interpretation

We are, however, able to add an interpretation corresponding to “pretty printing” with ease.

 1: from __future__ import annotations
 2: from typing import assert_never
 3: 
 4: import lib
 5: 
 6: def pprint(prog: lib.AbelianInt) -> str:
 7:     match prog:
 8:         case lib.Lit(x):
 9:             return f"{x}"
10:         case lib.Add(x, y):
11:             return f"({pprint(x)} + {pprint(y)})"
12:         case lib.Neg(x):
13:             return f"(-{pprint(x)})"
14:         case _ as unreachable:
15:             assert_never(unreachable)        

Our definition type-checks:

$> mypy ./extend_interpretations.py
Success: no issues found in 1 source file

Let’s try and use this interpretation to “pretty print” a previously defined expression.

1: import use
2: from extend_interpretations import pprint
3: 
4: def main() -> None:
5:     return print(pprint(use.prg))
6: 
7: if __name__ == "__main__":
8:     main()

The result is as expected:

((-1) + 2)

And our usage type-checks as well:

$> mypy ./extend_interpretations_use.py
Success: no issues found in 1 source file

A solution

As we observed, neither the OOP approach nor the FP “initial” approach allow for sufficient extensibility while still conforming to statically-verifiable guarantees. Below we consider one approach (the so-called “tagless-final” approach)11Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages
Carette, J., Kiselyov, O., Shan, C.c. Journal of Functional Programming 2009.
, 12Equivalently, the Object algebras approach. that does prove to be sufficient for our needs.

Tagless-final / Object algebras approach

In the Tagless-final approach we:

  • denote terms by functions of a typeclass
  • denote interpretations by typeclass instances

Let’s see what this looks like in Python.

 1: from __future__ import annotations
 2: from typing import TypeVar
 3: from typing_extensions import Protocol
 4: 
 5: R = TypeVar('R')
 6: 
 7: class AbelianInt(Protocol[R]):
 8:     @classmethod
 9:     def lit(cls, x: int) -> R: ...
10:     @classmethod
11:     def add(cls, x: R, y: R) -> R: ...
12:     @classmethod
13:     def neg(cls, x: R) -> R: ...
14: 
15: class AbelianIntCalc(AbelianInt[int]):
16:     @classmethod
17:     def lit(cls, x: int):
18:         return x
19:     @classmethod
20:     def add(cls, x: int, y: int):
21:         return (x + y)
22:     @classmethod
23:     def neg(cls, x: int):
24:         return (-x)
25: 
26: def calc(x: int) -> int:
27:     return x

The definition type-checks:

Success: no issues found in 1 source file

Above, functions within AbelianInt represent the terms and AbelianIntCalc represents the “calc” interpretation. The function calc itself can be obviated13In languages where typeclass instances are automatically selected (e.g., Haskell) functions such as calc can help the programmer guide the compiler to select the correct instance. since it’s just the identity.

✅ Use the library

Let’s try and use a library written in the tagless-final manner to write and calculate some expressions.

 1: import lib
 2: from lib import AbelianIntCalc, calc
 3: 
 4: def prg(x: lib.AbelianInt[lib.R]) -> lib.R:
 5:     return x.add(x.neg(x.lit(1)), x.lit(2))
 6: 
 7: def main() -> None:
 8:     verify(calc(prg(AbelianIntCalc)), 1)
 9: 
10: def verify(x, y, tag = None) -> None:
11:     preamble = f" {tag}::" if tag else ""
12:     if x == y:
13:         print(f"SUCCESS:{preamble} {x} == {y}")
14:     else:
15:         print(f"FAIL:{preamble} {x} != {y}")
16: 
17: if __name__ == "__main__":
18:     main()

It produces the expected result:

SUCCESS: 1 == 1

And our usage type-checks:

$> mypy ./use.py
Success: no issues found in 1 source file

✅ Extend the library: add a term

Let’s extend by adding a term to represent multiplication.

 1: from typing_extensions import Protocol
 2: from lib import AbelianInt, R, AbelianIntCalc
 3: 
 4: class RingInt(AbelianInt[R], Protocol[R]):
 5:     @classmethod
 6:     def mul(cls, x: R, y: R) -> R: ...
 7: 
 8: class RingIntCalc(RingInt[int], AbelianIntCalc):
 9:     @classmethod
10:     def mul(cls, x: int, y: int):
11:         return (x * y)

Our definition type-checks:

$> mypy ./extend_terms.py
Success: no issues found in 1 source file

Let’s mix and match the newly added term with others.

 1: import use
 2: from lib import R, calc
 3: from extend_terms import RingInt, RingIntCalc
 4: 
 5: def prg1(x: RingInt[R]) -> R:
 6:     return x.mul(x.lit(1), use.prg(x))
 7: def prg2(x: RingInt[R]) -> R:
 8:     return x.add(x.lit(1), prg1(x))
 9: 
10: def main() -> None:
11:     verify(calc(prg1(RingIntCalc)), 1, "prg1")
12:     verify(calc(prg2(RingIntCalc)), 2, "prg2")
13: 
14: def verify(x, y, tag = None) -> None:
15:     preamble = f" {tag}::" if tag else ""
16:     if x == y:
17:         print(f"SUCCESS:{preamble} {x} == {y}")
18:     else:
19:         print(f"FAIL:{preamble} {x} != {y}")
20: 
21: if __name__ == "__main__":
22:     main()

We are able to evaluate both expressions successfully:

SUCCESS: prg1:: 1 == 1
SUCCESS: prg2:: 2 == 2

And it also successfully type-checks:

$> mypy ./extend_terms_use.py
Success: no issues found in 1 source file

✅ Extend the library: add an interpretation

Let’s add an interpretation corresponding to “pretty printing”.

 1: import lib
 2: 
 3: class AbelianIntPrint(lib.AbelianInt[str]):
 4:     @classmethod
 5:     def lit(cls, x: int):
 6:         return f"{x}"
 7:     @classmethod
 8:     def add(cls, x: str, y: str):
 9:         return f"({x} + {y})"
10:     @classmethod
11:     def neg(cls, x: str):
12:         return f"(-{x})"
13: 
14: def pprint(x: str) -> str:
15:     return x

Our definition type-checks:

$> mypy ./extend_interpretations.py
Success: no issues found in 1 source file

Let’s use this interpretation to “pretty print” a previously defined expression.

1: from extend_interpretations import AbelianIntPrint, pprint
2: import use
3: 
4: def main() -> None:
5:     return print(pprint(use.prg(AbelianIntPrint)))
6: 
7: if __name__ == "__main__":
8:     main()

It evaluates successfully:

((-1) + 2)

And our use also type-checks:

$> mypy ./extend_interpretations_use.py
Success: no issues found in 1 source file

Conclusion

So how should we structure our code if we want to make our code easy for our users to extend? The steps are simple:

  1. Allow the user to describe the computation without necessarily evaluating it.14I.e., ensure that the term and interpretation distinction is retained.
  2. Map the terms (i.e., the description of the computation) 1-1 to typeclass functions.15Equivalently, static methods of an interface or protocol class in OOP.
  3. Map the interpretation(s) 1-1 to typeclass instances.16Equivalently, protocol implementations in OOP.

Following these three simple steps we blur the distinction between library authors and library users and make it easier for a library user to become a library contributor.17 Step 4: Profit.

That’s it. That’s the idea.

Comments

Comments can be left on twitter, mastodon, as well as below, so have at it.

To view the Giscus comment thread, enable Giscus and GitHub’s JavaScript or navigate to the specific discussion on Github.

Footnotes:

1

I.e., with code reuse and not by reimplementation.

2

It’s not the only approach that does; we are not considering an exhaustive set of approaches.

3

I.e., the behaviour that the terms are intended to denote.

4

In a manner whose correctness can be verified by a type checker.

5

A possible application for pretty-printing might be for debugging and logging.

6

By way of a static type checker.

11

Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages
Carette, J., Kiselyov, O., Shan, C.c. Journal of Functional Programming 2009.

12

Equivalently, the Object algebras approach.

13

In languages where typeclass instances are automatically selected (e.g., Haskell) functions such as calc can help the programmer guide the compiler to select the correct instance.

14

I.e., ensure that the term and interpretation distinction is retained.

15

Equivalently, static methods of an interface or protocol class in OOP.

16

Equivalently, protocol implementations in OOP.