The challenge in writing extensible, modular code
Many software libraries are written in a way that makes it difficult for the user to extend them and easily verify their correctness. This is a pity, because their features may be underutilized as a consequence. Following some simple guidelines can dramatically improve your library’s ease of extensibility. Your libraries will be used and extended more and your set of potential collaborators will, in turn, grow.
Writing software libraries that allow the user to easily extend them1I.e., with code reuse and not by reimplementation. is challenging. Doing so while also making it easier for the user to verify correctness by maintaining static guarantees is harder still. However, it is possible to accomplish both simultaneously. By the end of the post you will understand some approaches that don’t quite work and an approach that does.2It’s not the only approach that does; we are not considering an exhaustive set of approaches.
All code examples were interpreted on this Python version:
3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
And were type-checked with:
mypy 1.3.0 (compiled: yes)
Let’s dig in!
Introduction
Generally speaking, software libraries introduce terms as well as provide some interpretations3I.e., the behaviour that the terms are intended to denote. for those terms. For instance, a very rudimentary calculator may introduce terms corresponding to:
- Integer literals
- Addition
- Negation
And may provide a function that allows the user to calculate the result of an expression involving the above terms. Such a library written in the Object-oriented way might look something like this:
1: from abc import ABC, abstractmethod 2: 3: class AbelianInt(ABC): 4: @abstractmethod 5: def calc(self) -> int: pass 6: class Lit(AbelianInt): 7: def __init__(self, x: int): 8: self.x = x 9: def calc(self): 10: return self.x 11: class Neg(AbelianInt): 12: def __init__(self, x: AbelianInt): 13: self.x = x 14: def calc(self): 15: return -self.x.calc() 16: class Add(AbelianInt): 17: def __init__(self, x: AbelianInt, y: AbelianInt): 18: self.x = x 19: self.y = y 20: def calc(self): 21: return self.x.calc() + self.y.calc()
What happens when the user tries to extend the above library? While the user is able to add a term to, say, include multiplication, they are unable to4In a manner whose correctness can be verified by a type checker. add an interpretation that allows them to “pretty-print”5A possible application for pretty-printing might be for debugging and logging. an expression in addition to being able to calculate the value of an expression. The above library isn’t very extensible.
We will consider a few different approaches for structuring code and contrast the relative ease with which new terms can be added vs new interpretations. Finally, we’ll present one approach which allows us to do both simultaneously while also allowing the user to enforce static guarantees.6By way of a static type checker.
The Expression Problem
While Philip Wadler
coined the term “The expression problem” in
19987
The Expression Problem by Philip Wadler. it was observed at least as far
back as 1975 by John
C. Reynolds.8
User-defined Types and Procedural Data Structures as complementary
approaches to Data Abstraction
John C. Reynolds. IFIP Working Group 2.1 on Algol (1975). In
the words of Wadler:
The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).
[…]
One can think of cases as rows and functions as columns in a table. In a functional language, the rows are fixed (cases in a datatype declaration) but it is easy to add new columns (functions). In an object-oriented language, the columns are fixed (methods in a class declaration) but it is easy to add new rows (subclasses). We want to make it easy to add either rows or columns.
Let’s study the example from before a little more closely.
OOP inheritance approach
In the OOP approach we:
- denote terms via classes that inherit from a base class
- denote interpretations via instance methods
Let’s see what this looks like.
1: from abc import ABC, abstractmethod 2: 3: class AbelianInt(ABC): 4: @abstractmethod 5: def calc(self) -> int: pass 6: class Lit(AbelianInt): 7: def __init__(self, x: int): 8: self.x = x 9: def calc(self): 10: return self.x 11: class Neg(AbelianInt): 12: def __init__(self, x: AbelianInt): 13: self.x = x 14: def calc(self): 15: return -self.x.calc() 16: class Add(AbelianInt): 17: def __init__(self, x: AbelianInt, y: AbelianInt): 18: self.x = x 19: self.y = y 20: def calc(self): 21: return self.x.calc() + self.y.calc()
And it type-checks:
$> mypy ./lib.py Success: no issues found in 1 source file
✅ Use the library
1: from lib import Lit, Neg, Add 2: 3: prg = Add(Neg(Lit(1)), Lit(2)) 4: 5: def main() -> None: 6: verify(prg.calc(), 1) 7: 8: def verify(x, y, tag = None) -> None: 9: preamble = f" {tag}::" if tag else "" 10: if x == y: 11: print(f"SUCCESS:{preamble} {x} == {y}") 12: else: 13: print(f"FAIL:{preamble} {x} != {y}") 14: 15: if __name__ == "__main__": 16: main()
SUCCESS: 1 == 1
And it type-checks:
$> mypy ./use.py Success: no issues found in 1 source file
✅ Extend the library: add a term
Let’s now try to extend by adding a term to represent multiplication.
1: import lib 2: from lib import Lit, Neg, Add 3: 4: class Mul(lib.AbelianInt): 5: def __init__(self, x: lib.AbelianInt, y: lib.AbelianInt): 6: self.x = x 7: self.y = y 8: def calc(self): 9: return self.x.calc() * self.y.calc() 10: 11: def main() -> None: 12: import use 13: prg1 = Mul(Lit(1), use.prg) 14: verify(prg1.calc(), 1, "prg1") 15: prg2 = Add(Lit(1), prg1) 16: verify(prg2.calc(), 2, "prg2") 17: 18: def verify(x, y, tag = None) -> None: 19: preamble = f" {tag}::" if tag else "" 20: if x == y: 21: print(f"SUCCESS:{preamble} {x} == {y}") 22: else: 23: print(f"FAIL:{preamble} {x} != {y}") 24: 25: if __name__ == "__main__": 26: main()
SUCCESS: prg1:: 1 == 1 SUCCESS: prg2:: 2 == 2
And it type-checks:
$> mypy ./extend_terms.py Success: no issues found in 1 source file
❌ Extend the library: add an interpretation
In statically typed OOP languages, this isn’t generally possible. However, Python allows us to “monkey patch” our classes after they have been defined. Let’s see what happens when we try to do that.
1: from lib import Lit, Neg, Add 2: 3: def lit_pprint(self): 4: return f"{self.x}" 5: Lit.pprint = lit_pprint 6: def neg_pprint(self): 7: return f"(-{self.x.pprint()})" 8: Neg.pprint = neg_pprint 9: def add_pprint(self): 10: return f"({self.x.pprint()} + {self.y.pprint()})" 11: Add.pprint = add_pprint 12: 13: def main() -> None: 14: import use 15: return print(use.prg.pprint()) 16: 17: if __name__ == "__main__": 18: main()
We are able to successfully evaluate our code:
((-1) + 2)
But our code no longer type-checks:
$> mypy ./extend_interpretations.py extend_interpretations.py:5: error: "Type[Lit]" has no attribute "pprint" [attr-defined] extend_interpretations.py:8: error: "Type[Neg]" has no attribute "pprint" [attr-defined] extend_interpretations.py:11: error: "Type[Add]" has no attribute "pprint" [attr-defined] extend_interpretations.py:15: error: "Add" has no attribute "pprint" [attr-defined] Found 4 errors in 1 file (checked 1 source file)
Some non-solutions
OOP composition
What if we were to no longer require that the classes that represent terms inherit from a base class, and instead were to use composition?
1: from typing_extensions import Protocol 2: 3: class IntCalcInterface(Protocol): 4: def calc(self) -> int: ... 5: 6: class CalcInt: 7: @classmethod 8: def calc(cls, x: int) -> int: 9: return x 10: 11: class Lit: 12: calculator = CalcInt 13: def __init__(self, x: int): 14: self.x = x 15: def calc(self): 16: return self.calculator.calc(self.x) 17: class Neg: 18: calculator = CalcInt 19: def __init__(self, x: IntCalcInterface): 20: self.x = x 21: def calc(self): 22: return self.calculator.calc(-self.x.calc()) 23: class Add: 24: calculator = CalcInt 25: def __init__(self, x: IntCalcInterface, y: IntCalcInterface): 26: self.x = x 27: self.y = y 28: def calc(self): 29: return self.calculator.calc(self.x.calc() + self.y.calc())
We are able to define the library in a type-safe manner:
$> mypy ./lib.py Success: no issues found in 1 source file
✅ Use the library
The usage is the same as in the case for OOP inheritance
.
1: from lib import Lit, Neg, Add 2: 3: prg = Add(Neg(Lit(1)), Lit(2)) 4: 5: def main() -> None: 6: verify(prg.calc(), 1) 7: 8: def verify(x, y, tag = None) -> None: 9: preamble = f" {tag}::" if tag else "" 10: if x == y: 11: print(f"SUCCESS:{preamble} {x} == {y}") 12: else: 13: print(f"FAIL:{preamble} {x} != {y}") 14: 15: if __name__ == "__main__": 16: main()
We are able to use it as expected:
SUCCESS: 1 == 1
And our usage type-checks:
$> mypy ./use.py Success: no issues found in 1 source file
✅ Extend the library: add a term
We are also able to add a term representing multiplication as before.
1: import lib 2: from lib import Lit, Neg, Add 3: 4: class Mul: 5: calculator = lib.CalcInt 6: def __init__(self, x: lib.IntCalcInterface, y: lib.IntCalcInterface): 7: self.x = x 8: self.y = y 9: def calc(self): 10: return self.calculator.calc(self.x.calc() * self.y.calc()) 11: 12: def main() -> None: 13: import use 14: prg1 = Mul(Lit(1), use.prg) 15: verify(prg1.calc(), 1, "prg1") 16: prg2 = Add(Lit(1), prg1) 17: verify(prg2.calc(), 2, "prg2") 18: 19: def verify(x, y, tag = None) -> None: 20: preamble = f" {tag}::" if tag else "" 21: if x == y: 22: print(f"SUCCESS:{preamble} {x} == {y}") 23: else: 24: print(f"FAIL:{preamble} {x} != {y}") 25: 26: if __name__ == "__main__": 27: main()
SUCCESS: prg1:: 1 == 1 SUCCESS: prg2:: 2 == 2
And it type-checks:
$> mypy ./extend_terms.py Success: no issues found in 1 source file
❌ Extend the library: add an interpretation
Let’s see what happens when we try to add an interpretation corresponding to
“pretty printing”.9Because of
the lack of
ABCMeta.register
support in mypy
we use
Protocols.
1: from typing_extensions import Protocol 2: from lib import Lit, Neg, Add 3: 4: class PPrintInterface(Protocol): 5: def pprint(self) -> str: ... 6: 7: def lit_pprint(self) -> str: 8: return f"{self.x}" 9: Lit.pprint = lit_pprint 10: 11: def neg_pprint(self) -> str: 12: return f"(-{self.x.pprint()})" 13: Neg.pprint = neg_pprint 14: 15: def add_pprint(self) -> str: 16: return f"({self.x.pprint()} + {self.y.pprint()})" 17: Add.pprint = add_pprint 18: 19: def main() -> None: 20: import use 21: prg : PPrintInterface 22: prg = use.prg 23: return print(prg.pprint()) 24: 25: if __name__ == "__main__": 26: main()
We are able to successfully evaluate our code as before:
((-1) + 2)
However, the code still doesn’t type-check. Composition doesn’t help us here.
$> mypy ./extend_interpretations.py extend_interpretations.py:9: error: "Type[Lit]" has no attribute "pprint" [attr-defined] extend_interpretations.py:13: error: "Type[Neg]" has no attribute "pprint" [attr-defined] extend_interpretations.py:17: error: "Type[Add]" has no attribute "pprint" [attr-defined] extend_interpretations.py:22: error: Incompatible types in assignment (expression has type "Add", variable has type "PPrintInterface") [assignment] Found 4 errors in 1 file (checked 1 source file)
FP approach10Aka, the “initial” approach.
In the FP “initial” approach we:
- denote terms by values of an algebraic data type
- denote interpretations by functions
Let’s see what this looks like.
1: from __future__ import annotations 2: from typing import assert_never 3: from dataclasses import dataclass 4: 5: @dataclass 6: class Lit: 7: x: int 8: @dataclass 9: class Add: 10: x: AbelianInt 11: y: AbelianInt 12: @dataclass 13: class Neg: 14: x: AbelianInt 15: 16: AbelianInt = Lit | Add | Neg 17: 18: def calc(prog: AbelianInt) -> int: 19: match prog: 20: case Lit(x): 21: return x 22: case Add(x, y): 23: return calc(x) + calc(y) 24: case Neg(x): 25: return -calc(x) 26: case _ as unreachable: 27: assert_never(unreachable)
And it type-checks:
$> mypy ./lib.py Success: no issues found in 1 source file
✅ Use the library
1: from lib import Lit, Neg, Add, calc 2: 3: prg = Add(Neg(Lit(1)), Lit(2)) 4: 5: def main() -> None: 6: verify(calc(prg), 1) 7: 8: def verify(x, y, tag = None) -> None: 9: preamble = f" {tag}::" if tag else "" 10: if x == y: 11: print(f"SUCCESS:{preamble} {x} == {y}") 12: else: 13: print(f"FAIL:{preamble} {x} != {y}") 14: 15: if __name__ == "__main__": 16: main()
We are able to calculate results as expected:
SUCCESS: 1 == 1
And our usage type-checks:
$> mypy ./use.py Success: no issues found in 1 source file
❌ Extend the library: add a term
Let’s see what happens when we try to add a term representing multiplication.
1: from __future__ import annotations 2: from dataclasses import dataclass 3: import lib 4: from lib import Lit, Neg, Add 5: 6: @dataclass 7: class Mul: 8: x: RingInt 9: y: RingInt 10: 11: RingInt = Mul | lib.AbelianInt 12: 13: def calc(prog: RingInt) -> int: 14: match prog: 15: case Mul(x, y): 16: return calc(x) * calc(y) 17: case _: 18: return lib.calc(prog)
The code type-checks:
$> mypy ./extend_terms.py Success: no issues found in 1 source file
Let’s try to mix and match the newly added term with others.
1: import use 2: from lib import Lit, Add 3: from extend_terms import Mul, calc 4: 5: prg1 = Mul(Lit(1), use.prg) 6: prg2 = Add(Lit(1), prg1) 7: 8: def main() -> None: 9: verify(calc(prg1), 1, "prg1") # works 10: try: 11: verify(calc(prg2), 2, "prg2") # fails 12: except Exception as e: 13: print(f"FAIL: prg2:: Exception {e}") 14: 15: def verify(x, y, tag = None) -> None: 16: preamble = f" {tag}::" if tag else "" 17: if x == y: 18: print(f"SUCCESS:{preamble} {x} == {y}") 19: else: 20: print(f"FAIL:{preamble} {x} != {y}") 21: 22: if __name__ == "__main__": 23: main()
We only have a limited freedom in being able to inter-operate with previously defined expressions:
SUCCESS: prg1:: 1 == 1 FAIL: prg2:: Exception Expected code to be unreachable, but got: Mul(x=Lit(x=1), y=Add(x=Neg(x=Lit(x=1)), y=Lit(x=2)))
On the bright side, the runtime failure is also caught during type-checking.
$> mypy ./extend_terms_use.py extend_terms_use.py:6: error: Argument 2 to "Add" has incompatible type "Mul"; expected "Union[Lit, Add, Neg]" [arg-type] Found 1 error in 1 file (checked 1 source file)
✅ Extend the library: add an interpretation
We are, however, able to add an interpretation corresponding to “pretty printing” with ease.
1: from __future__ import annotations 2: from typing import assert_never 3: 4: import lib 5: 6: def pprint(prog: lib.AbelianInt) -> str: 7: match prog: 8: case lib.Lit(x): 9: return f"{x}" 10: case lib.Add(x, y): 11: return f"({pprint(x)} + {pprint(y)})" 12: case lib.Neg(x): 13: return f"(-{pprint(x)})" 14: case _ as unreachable: 15: assert_never(unreachable)
Our definition type-checks:
$> mypy ./extend_interpretations.py Success: no issues found in 1 source file
Let’s try and use this interpretation to “pretty print” a previously defined expression.
1: import use 2: from extend_interpretations import pprint 3: 4: def main() -> None: 5: return print(pprint(use.prg)) 6: 7: if __name__ == "__main__": 8: main()
The result is as expected:
((-1) + 2)
And our usage type-checks as well:
$> mypy ./extend_interpretations_use.py Success: no issues found in 1 source file
A solution
As we observed, neither the OOP approach nor the FP “initial” approach allow for
sufficient extensibility while still conforming to statically-verifiable
guarantees. Below we consider one approach (the so-called “tagless-final”
approach)11Finally tagless, partially evaluated: Tagless staged interpreters
for simpler typed languages
Carette, J., Kiselyov, O., Shan, C.c. Journal of Functional Programming
2009., 12Equivalently, the
Object
algebras approach. that does prove to be sufficient for our needs.
Tagless-final / Object algebras approach
In the Tagless-final approach we:
- denote terms by functions of a typeclass
- denote interpretations by typeclass instances
Let’s see what this looks like in Python.
1: from __future__ import annotations 2: from typing import TypeVar 3: from typing_extensions import Protocol 4: 5: R = TypeVar('R') 6: 7: class AbelianInt(Protocol[R]): 8: @classmethod 9: def lit(cls, x: int) -> R: ... 10: @classmethod 11: def add(cls, x: R, y: R) -> R: ... 12: @classmethod 13: def neg(cls, x: R) -> R: ... 14: 15: class AbelianIntCalc(AbelianInt[int]): 16: @classmethod 17: def lit(cls, x: int): 18: return x 19: @classmethod 20: def add(cls, x: int, y: int): 21: return (x + y) 22: @classmethod 23: def neg(cls, x: int): 24: return (-x) 25: 26: def calc(x: int) -> int: 27: return x
The definition type-checks:
Success: no issues found in 1 source file
Above, functions within AbelianInt
represent the terms and AbelianIntCalc
represents the “calc” interpretation. The function calc
itself can be
obviated13In languages where typeclass instances are automatically selected
(e.g., Haskell) functions such as calc
can help the programmer guide the
compiler to select the correct instance. since it’s just the identity.
✅ Use the library
Let’s try and use a library written in the tagless-final manner to write and calculate some expressions.
1: import lib 2: from lib import AbelianIntCalc, calc 3: 4: def prg(x: lib.AbelianInt[lib.R]) -> lib.R: 5: return x.add(x.neg(x.lit(1)), x.lit(2)) 6: 7: def main() -> None: 8: verify(calc(prg(AbelianIntCalc)), 1) 9: 10: def verify(x, y, tag = None) -> None: 11: preamble = f" {tag}::" if tag else "" 12: if x == y: 13: print(f"SUCCESS:{preamble} {x} == {y}") 14: else: 15: print(f"FAIL:{preamble} {x} != {y}") 16: 17: if __name__ == "__main__": 18: main()
It produces the expected result:
SUCCESS: 1 == 1
And our usage type-checks:
$> mypy ./use.py Success: no issues found in 1 source file
✅ Extend the library: add a term
Let’s extend by adding a term to represent multiplication.
1: from typing_extensions import Protocol 2: from lib import AbelianInt, R, AbelianIntCalc 3: 4: class RingInt(AbelianInt[R], Protocol[R]): 5: @classmethod 6: def mul(cls, x: R, y: R) -> R: ... 7: 8: class RingIntCalc(RingInt[int], AbelianIntCalc): 9: @classmethod 10: def mul(cls, x: int, y: int): 11: return (x * y)
Our definition type-checks:
$> mypy ./extend_terms.py Success: no issues found in 1 source file
Let’s mix and match the newly added term with others.
1: import use 2: from lib import R, calc 3: from extend_terms import RingInt, RingIntCalc 4: 5: def prg1(x: RingInt[R]) -> R: 6: return x.mul(x.lit(1), use.prg(x)) 7: def prg2(x: RingInt[R]) -> R: 8: return x.add(x.lit(1), prg1(x)) 9: 10: def main() -> None: 11: verify(calc(prg1(RingIntCalc)), 1, "prg1") 12: verify(calc(prg2(RingIntCalc)), 2, "prg2") 13: 14: def verify(x, y, tag = None) -> None: 15: preamble = f" {tag}::" if tag else "" 16: if x == y: 17: print(f"SUCCESS:{preamble} {x} == {y}") 18: else: 19: print(f"FAIL:{preamble} {x} != {y}") 20: 21: if __name__ == "__main__": 22: main()
We are able to evaluate both expressions successfully:
SUCCESS: prg1:: 1 == 1 SUCCESS: prg2:: 2 == 2
And it also successfully type-checks:
$> mypy ./extend_terms_use.py Success: no issues found in 1 source file
✅ Extend the library: add an interpretation
Let’s add an interpretation corresponding to “pretty printing”.
1: import lib 2: 3: class AbelianIntPrint(lib.AbelianInt[str]): 4: @classmethod 5: def lit(cls, x: int): 6: return f"{x}" 7: @classmethod 8: def add(cls, x: str, y: str): 9: return f"({x} + {y})" 10: @classmethod 11: def neg(cls, x: str): 12: return f"(-{x})" 13: 14: def pprint(x: str) -> str: 15: return x
Our definition type-checks:
$> mypy ./extend_interpretations.py Success: no issues found in 1 source file
Let’s use this interpretation to “pretty print” a previously defined expression.
1: from extend_interpretations import AbelianIntPrint, pprint 2: import use 3: 4: def main() -> None: 5: return print(pprint(use.prg(AbelianIntPrint))) 6: 7: if __name__ == "__main__": 8: main()
It evaluates successfully:
((-1) + 2)
And our use also type-checks:
$> mypy ./extend_interpretations_use.py Success: no issues found in 1 source file
Conclusion
So how should we structure our code if we want to make our code easy for our users to extend? The steps are simple:
- Allow the user to describe the computation without necessarily evaluating it.14I.e., ensure that the term and interpretation distinction is retained.
- Map the terms (i.e., the description of the computation) 1-1 to typeclass functions.15Equivalently, static methods of an interface or protocol class in OOP.
- Map the interpretation(s) 1-1 to typeclass instances.16Equivalently, protocol implementations in OOP.
Following these three simple steps we blur the distinction between library authors and library users and make it easier for a library user to become a library contributor.17 Step 4: Profit.
That’s it. That’s the idea.
Comments
Comments can be left on twitter, mastodon, as well as below, so have at it.
New post!
— The Weary Travelers blog (@wearyTravlrsBlg) June 3, 2023
Idea: What is the Expression problem?https://t.co/jVLp0rGVeQ
Reply here if you have comments.
Footnotes:
I.e., with code reuse and not by reimplementation.
It’s not the only approach that does; we are not considering an exhaustive set of approaches.
I.e., the behaviour that the terms are intended to denote.
In a manner whose correctness can be verified by a type checker.
A possible application for pretty-printing might be for debugging and logging.
By way of a static type checker.
User-defined Types and Procedural Data Structures as complementary
approaches to Data Abstraction
John C. Reynolds. IFIP Working Group 2.1 on Algol (1975).
Because of
the lack of
ABCMeta.register
support in mypy
we use
Protocols.
Aka, the “initial” approach.
Finally tagless, partially evaluated: Tagless staged interpreters
for simpler typed languages
Carette, J., Kiselyov, O., Shan, C.c. Journal of Functional Programming
2009.
Equivalently, the Object algebras approach.
In languages where typeclass instances are automatically selected
(e.g., Haskell) functions such as calc
can help the programmer guide the
compiler to select the correct instance.
I.e., ensure that the term and interpretation distinction is retained.
Equivalently, static methods of an interface or protocol class in OOP.
Equivalently, protocol implementations in OOP.