How to do Python development without tears

Many machine learning projects are written in Python. Many of these projects are written in a way that makes it difficult for individuals to recreate a known working development environment. This is a pity, because it slows down scientific progress; peers find it difficult to confirm results and collaborators struggle to extend ideas in novel ways. Following some simple guidelines can dramatically help the adoption of your project and ideas. Your results will be verified and extended by others, your projects will gain adoption, and your set of potential collaborators will, in turn, grow.

Where we have previously focused on writing extensible software, here we focus on the ease of “getting started” and being able to quickly create a known-to-be-working development environment.¹ I.e., is there a fairly easy way to capture direct and transitive dependencies for a Python project that allows one to (re-)create a development environment with little effort?
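To make “direct and transitive dependencies” concrete, here is a minimal sketch that walks the requirements declared by packages already installed in the current environment. It is not how pdm or pdm-conda resolve dependencies; the helper names (`requirement_name`, `transitive_closure`, `installed_requirements`) are hypothetical and chosen only for illustration, and the requirement parsing is deliberately simplistic.

```python
from importlib import metadata
import re


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Normalize as in PEP 503: lowercase, runs of '-', '_', '.' become '-'.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def transitive_closure(direct, lookup):
    """Return every name reachable from `direct` via `lookup(name)`."""
    seen = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(lookup(name))
    return seen


def installed_requirements(name):
    """Names declared by the installed distribution `name` (empty if absent)."""
    try:
        requires = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []
    names = []
    for requirement in requires:
        # Assumption: requirements gated on an optional extra are skipped.
        marker = requirement.partition(";")[2]
        if "extra" in marker:
            continue
        names.append(requirement_name(requirement))
    return names
```

Calling `transitive_closure(["some-package"], installed_requirements)` returns the set of names reachable from `some-package` in whatever environment the interpreter is running in. The result describes that one environment only, which is exactly why a tool-managed lock file is preferable to ad hoc inspection.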

By the end of the post you will understand how to effectively leverage tools such as pdm and pdm-conda to make your life² easier.

Let’s dig in!

Comments

Comments can be left on Twitter, on Mastodon, or below, so have at it.


Footnotes:

¹ Notably, we will not be focusing on exact reproducibility. Reproducibility is larger in scope, harder, and deserves its own blog post.

² And those of your collaborators and users.

³ Not all of which use the same Python version.

⁴ Each with their own set of dependencies.

⁵ Alice relies on the successful evaluation of the test suites in her projects to confirm that for her. I.e., what she ultimately cares about is an environment that successfully runs code, which doesn’t necessarily require creating a known-to-be-working environment. However, the latter can often be a cost-effective way of accomplishing the former.

⁶ In addition to the test suite.

⁷ Or, perhaps, reviewer.

⁸ After all, if he’s unable to even do that, is there much point in him trying to go further?

⁹ With its support for a per-interpreter GIL.

¹⁰ Such as Bob.

¹¹ Such as Charlie.

¹² I.e., Alice has taken the time to note this in the development instructions of her project.

¹³ In a telling turn of events, some time between when the first draft of this post was written and when it was published, updated versions of pdm and pdm-conda were released that were mutually incompatible.

¹⁴ For the benefit of her future self and others.

¹⁵ And other utilities.

¹⁶ I.e., “right” here means invoking the version installed in the conda environment with access to the installed dependencies.

¹⁷ Even if Alice were using Rye and had the associated shim installed, directly invoking python would not have worked as intended. This is because she’s not using the project-local .venv, but is instead using an external conda environment. As such, the Rye shim wouldn’t have automatically been able to correctly resolve things.

¹⁸ Including a sub-directory of it, and only when so.

¹⁹ Note, in particular, the absence of the .pdm-python file.

²⁰ Alice, graciously, has noted the instructions in the README.

²¹ This is the default behaviour of pdm install.

²² I.e., he would like to quickly verify that the test suite in Alice’s project successfully runs.

²³ In other words, Charlie is not interested in evaluating the benchmarks contained in the Jupyter notebooks.

²⁴ As opposed to Bob.

²⁵ For instance, if Alice has forgotten to pin some random seed.

²⁶ Either directly, or indirectly via the pdm-conda plugin.

²⁷ I.e., for each package, the specific version installed by Alice.

²⁸ I.e., both direct and transitive.

²⁹ I.e., one could have two different projects, each using a different version of Python.

³⁰ Though there seems to be conda-lock, which claims to do that.

³¹ Note that a “lock” file isn’t sufficient. What happens when a previously available version stops being available? Thankfully, there are projects such as Software Heritage.

³² Via pdm-conda.

³³ Or Nix, if so inclined.

³⁴ Reproducibility and/or repeatability is an important topic on which I have a few thoughts. However, what I have to say is too large to fit in the margin and must wait for a future blog post.

³⁵ Whether or not said effort is justified depends on the specific needs of the project.

³⁶ These gaps may or may not prove to be pertinent for the code to fulfill the developer’s intent.

³⁷ Or a host of other possible issues that can affect reproducibility.

The Weary Travelers

How to do Python development without tears

The scene

Alice’s perspective

Bob’s perspective

Charlie’s perspective

What did we learn?

Why `pdm` and not …

Conda

Poetry

Rye

Guix³³

Limitations

Comments

Footnotes:
