Nix Substitution: the Way Forward

Abstract

Nix and Guix have the unfortunate reputation of requiring a lot of bandwidth to distribute software. This reputation is, sadly, well grounded. It seems like explicitly pointing to your dependencies comes with an overhead in terms of download size. Or does it?

Currently, both Guix and Nix can substitute pre-built store paths from a binary cache using the Nix Archive (NAR) format. After conducting several benchmarks, we conclude that a finer substitution granularity would improve substitution performance. We can increase that granularity either by substituting files individually or by substituting chunks of those files.

If you are already familiar with the Nix and Guix substitution mechanism, feel free to skip the next two sections and jump directly to “Rethinking the Substitution Granularity”.

You Said Substitution?

We usually don’t want to rebuild the whole world locally. When we want to install Firefox on a laptop, we’d rather not build it from source on the machine itself but instead download a pre-built version from a cache. For this purpose, Nix provides a store path substitution mechanism.

Instead of building a derivation locally, we substitute the local build with a pre-built artifact stored in a binary cache. The store paths are transferred between the binary cache and the local machine using the NAR archive format, which can be seen as a serialized (and usually compressed) store path. In other words, there is one NAR archive per top-level store path we’d like to substitute.
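
As a rough illustration, producing such an archive boils down to serializing the store path and compressing the result (the store path below is a placeholder, not a real hash):

» nix-store --dump /nix/store/<input-hash>-firefox-<version> | xz > firefox.nar.xz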

A NAR is always identified using a hash derived from its content. NARs are content-addressed.

In contrast to that, both the Nix/Guix derivations and the store paths they generate are input-addressed [1]: their hashes are directly derived from their dependencies’ hashes.

You now see the issue: on one hand, we have input-addressed derivations and store paths, on the other hand, content-addressed NARs.
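
To make this concrete, take the libidn2 path from the NARinfo example below: its name carries the input-addressed hash (r5sjd57x…), computed from the derivation’s inputs, while its NAR hash is computed from the built contents and only becomes known after the build:

» nix-store --query --hash /nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2
sha256:1npw0jz1cw4k9x25f2vsdhsa5cf9568j46bpz2768a1izqj5n9lf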

Starting from an input-addressed derivation hash, how are we supposed to figure out the name of the content-addressed NAR we want to download? In other words, how can we figure out the output hash of a derivation without building it first?

Introducing the NARinfo file. Let’s take an example:

» curl https://cache.nixos.org/r5sjd57x0r07bwgipryaxqdkx1gglhiy.narinfo

StorePath: /nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2
URL: nar/010zh5j819qyz6zai6h2hn3d7i406g00m2g4g62bq7m8g91553yb.nar.xz
Compression: xz
FileHash: sha256:010zh5j819qyz6zai6h2hn3d7i406g00m2g4g62bq7m8g91553yb
FileSize: 63248
NarHash: sha256:1npw0jz1cw4k9x25f2vsdhsa5cf9568j46bpz2768a1izqj5n9lf
NarSize: 260816
References: 0z7sqj4pilbqyp45ix5b0mdgn9xlb024-libunistring-0.9.10 r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2
Deriver: 0n91syjwrhmng41f8d23ad0sl4a6ic4g-libidn2-2.3.2.drv
Sig: cache.nixos.org-1:G9PxMT0/nd9ELwL3BBeqWtb2ohMiqw4T4FUwFlAU9M0E45mKg77BzL9gJQ3wH4oYtsa61MV5uzNLPFvK8ZbyCA==

To substitute the path /nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2 from the cache.nixos.org binary cache, we’ll first have to download its associated NARinfo file. That NARinfo file is addressed using the same input-addressed hash as its associated store path: r5sjd57x0r07bwgipryaxqdkx1gglhiy. The NARinfo will contain some metadata needed to perform the substitution, but also, and most importantly, the URL from which we can download the content-addressed NAR.

To sum up, here are the steps the Nix/Guix client performs for a substitution (a manual sketch follows the list):

  1. Compute the input-addressed store path of the derivation.
  2. Fetch the associated .narinfo from the binary cache it wants to substitute from.
  3. Download the content-addressed NARs referenced by the .narinfo.
  4. Unpack the downloaded NARs to the local Nix store.
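
For the curious, here is a rough manual rendition of these four steps for the libidn2 example above, taking the input-addressed store path from step 1 as a given. This is only a sketch: the real client also substitutes the References recursively, verifies the NarHash and signature, and registers the path in its database (and you’d need write access to /nix/store).

» out=/nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2      # 1. input-addressed store path
» hash=$(basename "$out" | cut -c 1-32)
» curl -s "https://cache.nixos.org/$hash.narinfo" > libidn2.narinfo  # 2. fetch the NARinfo
» url=$(awk '/^URL:/ { print $2 }' libidn2.narinfo)
» curl -s "https://cache.nixos.org/$url" | xz -d > libidn2.nar       # 3. fetch the NAR
» nix-store --restore "$out" < libidn2.nar                           # 4. unpack it into the store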

Substitution Hell

Now that we’ve established how the Nix substitution mechanism works, let’s run a thought experiment: what happens if we bump the version of a derivation living deep in the Nixpkgs/Guix dependency tree? Something like curl.

The curl store path being input-addressed, its name will change for sure. Likewise, every derivation depending on curl is also input-addressed, so their names will change as well. Overall, the curl store path change will propagate throughout the whole Nixpkgs/Guix dependency graph.

You might say:

Hey! That’s not a big deal.

The NARs are content-addressed; this change of input hash shouldn’t propagate there.

The binary cache might have to rebuild the whole dependency graph; however, outside of curl, the contents of the build outputs should remain the same. So, in the end, most of the NARs should remain the same, and clients shouldn’t have to re-download all the store paths depending on curl.

We won’t have to re-download massive derivations like Firefox and Chromium for a simple curl bump.

Right?

And you’d be right, except for one thing: in the Nix/Guix model, everything is explicit. Every time you refer to the curl binary, you have to explicitly refer to its store path /nix/store/${inputHash}-curl/bin/curl. Its input-addressed store path! This sneaky reference manages to propagate the input-addressed instability to the content-addressed NARs.
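
You can actually see these references: they are plain store path strings embedded in the output files, and Nix records them in the References field we saw in the libidn2 NARinfo earlier:

» nix-store --query --references /nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2
/nix/store/0z7sqj4pilbqyp45ix5b0mdgn9xlb024-libunistring-0.9.10
/nix/store/r5sjd57x0r07bwgipryaxqdkx1gglhiy-libidn2-2.3.2

Bump libunistring and its hash changes; the string embedded in libidn2’s files changes with it, and so does libidn2’s NAR.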

Coming back to our example: both Chromium and Firefox depend on curl. You’ll have to re-download both of these massive store paths for every curl bump. Surely you won’t mind downloading a couple of gigabytes for a sub-kB string change? Do you? :P

One has to think: could we improve this situation?

Rethinking the Substitution Granularity

Content addressing the substitution archives is definitely a good idea. It brings some hash stability to this unstable input-addressed scheme. However, as we saw earlier, because of the store path references, the input-addressed hash instability ends up propagating through the substitution mechanism.

To improve the substitution stability, we could think about two solutions:

  1. Strip the store references from the NAR and re-inject them on the fly when decompressing the NAR into the actual Nix/Guix store.
  2. Increase the substitution granularity.

I’m going to ignore the first solution in this post and focus on the second one: increasing the substitution granularity.

The Nix and Guix communities came up with two solutions based on this finer-granularity substitution idea: nix-casync and a file-based substitution discussed on the Guix mailing list. [2]

Living in the countryside, my only Internet connection is a weak LTE link. I have 3 NixOS systems and 1 Guix system at home. Any mass rebuild takes me at least 2 hours to substitute on a good day.

Needless to say, I have a personal urge to improve that situation!

Before blindly stepping into what looks like a multi-year-long sustained implementation effort, I wanted to gather some hard data showing how much we could gain from increasing the substitution granularity. It’s benchmarking time.

Benchmarking Fine Granularity Substitution Mechanisms

For various scenarios, we’re going to build a Nix closure for 2 Nixpkgs commits. Then, we’ll simulate an update from the first commit to the second one and see how much data we’ll need to download from the binary cache.
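
In essence, each scenario boils down to something like the following sketch, where machine.nix is a stand-in for the actual benchmark configuration (the real code lives in the repository linked below). We build the same closure against both Nixpkgs commits, express both closures in each substitution format, and count how much of the first closure’s data can be re-used for the second one:

» nix-build machine.nix -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/<commit-A>.tar.gz
» nix-build machine.nix -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/<commit-B>.tar.gz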

For each scenario, we’re going to evaluate the following four substitution techniques:

  1. NAR substitution: This is the substitution model currently used by the NixOS and Guix projects; it’ll act as our baseline. It consists of archiving and compressing a full store path. In this benchmark, we’ll identify each NAR by a filename derived from the sha256 sum of its content.
  2. Casync substitution: This is an experimental substitution mechanism implemented by the nix-casync project. Starting from a NAR, we decompress it to recover the stream serializing all the files of the store path, chunk that stream into smaller bits, and compress the chunks again. In this benchmark, we’ll identify each casync chunk by its filename, which is already derived from the sha256 sum of its content.
  3. File-based substitution: This is the substitution method the Guix project brainstormed around. Each file of a store path gets substituted separately. In this benchmark, we’ll identify these files using the sha256 sum of their content.
  4. XZ-compressed file-based substitution: similar to the file-based substitution, except each file is individually compressed using xz (profile 6 extreme). A sketch of this technique follows the list.
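
As an illustration, here is a minimal sketch of how such an xz-compressed, file-granular cache could be populated (hypothetical layout and paths, not the actual benchmark code):

find /nix/store/<input-hash>-libidn2-2.3.2 -type f | while read -r file; do
  hash=$(sha256sum "$file" | cut -d ' ' -f 1)   # content-address each file individually
  xz -6e --stdout "$file" > "cache/$hash.xz"    # xz profile 6 extreme, as in the benchmark
done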

If you’re interested in the nitty-gritty details, the source code and the raw data behind these benchmarks live in this git repository. You can also have a look at the Jupyter notebook I used to crunch the benchmarks data.

Note: we’re using NixOS/Nixpkgs for all these benchmarks. However, given that Guix currently uses the same substitution mechanism, you can safely assume the same conclusions hold for it as well.

Mass Rebuild

Let’s kill the suspense right away and have a look at the most interesting scenario first. We’re going to simulate a NixOS closure update after the staging-next 2021-12-03 merge containing, among other things, a curl update. As we already established in the “Substitution Hell” section, a curl bump will propagate through the whole NixOS dependency graph. That’s the worst-case scenario in terms of substitution: the mass rebuild.

We’re going to use a NixOS machine description depending on several diverse language stacks (C, Python, BEAM) and simulate a closure update after this mass rebuild.

Let’s see how the four substitution techniques perform:

Mass Rebuild Diagram

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 386.06            | 372.99               | 13.0702           | 0
Casync          | 608.652           | 192.195              | 416.457           | 48.4719
File            | 1652.19           | 705.665              | 973.348           | -89.1915
Compressed File | 476.523           | 229.489              | 247.034           | 38.4732

First, the good news: increasing the substitution granularity definitely works! The casync-based substitution almost halves the amount of data we need to download for this mass rebuild.

YAY!

The xz-compressed file-based substitution sits in the same ballpark, saving us ~40% worth of data to download.

We can already see two other things from this experiment:

  • Without proper compression, the file-based substitution performs about half as well as the NAR substitution.
  • Looking at the closure size of the various substitution methods, we can see that the finer the substitution granularity gets, the worse the compression algorithms perform. In other words, the massive gain in terms of re-used data/downloaded data ratio (~75% for casync) does not directly translate into download savings. We probably want to spend some time investigating the state of the art in compression algorithms specialized in small files.

Given these results, I’ll omit the uncompressed file-based substitution data from the next benchmarks: it’s consistently worse than its compressed counterpart and hurts the readability of the diagrams.

Channel Jump

In this scenario, using the same NixOS machine closure as in the Mass Rebuild scenario, we’ll simulate a jump from the NixOS-21.11 stable channel to NixOS-unstable.

Channel Jump Diagram

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 387.458           | 375.167              | 12.2905           | 0
Casync          | 610.886           | 310.541              | 300.345           | 17.2261
Compressed File | 478.12            | 307.479              | 170.642           | 18.0423

I did not expect much in the way of savings here, but surprisingly enough, we shave off about 18% of the downloaded data using the compressed file-based substitution or casync.

This scenario does not make total sense from a pragmatic point of view as-is: you rarely jump a NixOS machine closure from a stable to an unstable NixOS channel. It does, however, make sense in the context of a dev machine: you often have to jump between different projects whose dev nix-shell closures point to different Nixpkgs channels.

The good and unexpected news is: even in that scenario, we can expect some substantial gains in terms of the amount of data we’ll need to download!

Single Derivation Bump

Now, one may ask: can we expect any savings for a single derivation substitution?

Ludovic Courtès recently posted a summary of his findings on that matter in the Guix mailing list for various binaries containing various amounts of binary data and text.

Let’s replicate this experiment with our benchmark. We’re going to simulate a version bump for Firefox, Gimp, Emacs, and OpenMPI. To do so, we’ll build each derivation before and after the version bump, and from there see how much data we’ll be able to re-use and how much data we’ll need to download.

Firefox

Firefox bump results bar chart diagram. See the table below for a screen reader accessible alternative

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 219.352           | 56.4315              | 162.92            | 0
Casync          | 342.925           | 55.871               | 287.054           | 0.993342
Compressed File | 260.424           | 55.8295              | 204.594           | 1.06682

Gimp

Gimp bump results bar chart diagram. See the table below for a screen reader accessible alternative

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 247.506           | 20.2708              | 227.236           | 0
Casync          | 363.372           | 13.0572              | 350.315           | 35.586
Compressed File | 300.3             | 10.1874              | 290.113           | 49.7434

Emacs

Emacs bump results bar chart diagram. See the table below for a screen reader accessible alternative

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 110.331           | 39.7443              | 70.5862           | 0
Casync          | 177.734           | 44.7264              | 133.007           | -12.5355
Compressed File | 136.6             | 40.7356              | 95.8641           | -2.49415

Open MPI

Open MPI bump results bar chart diagram. See the table below for a screen reader accessible alternative

Name            | Closure Size (MB) | Downloaded Size (MB) | Re-used Size (MB) | DL Savings Compared to NAR (%)
NAR             | 139.54            | 3.33596              | 136.204           | 0
Casync          | 214.708           | 4.38082              | 210.328           | -31.3209
Compressed File | 167.518           | 3.26096              | 164.257           | 2.24831

Once again, looking at the closure sizes, we can see that we lose some compression efficiency. We have to pay back that loss with an even greater re-use gain to end up better than, or even on par with, the NAR-based substitution.

So, does an increase in substitution granularity help when substituting a single derivation?

The answer is: it depends.

While it’s clear that at the NixOS closure level the gains are almost always considerable, for a single derivation, the gains will greatly depend on the content of the derivation. In the case of a derivation mostly containing text files, such as Gimp, we can expect a lot of savings. In the case of a derivation mostly consisting of binary data (Firefox, Emacs), the gains are likely to be marginal.

In these single-derivation scenarios, casync performs consistently worse than the compressed-file approach. This can probably be explained by two things:

  1. Loss of the file boundaries: the chunks are generated from a serialized stream containing the whole store path content. It’d probably be a good idea to instead chunk the files individually for our use case.
  2. Shy compression: the casync closure size is almost twice (!!) as big as the NAR one. While the small size of the chunks can explain this compression efficiency loss to some extent, nix-casync is also using lzma to compress its chunks. Moving to a more aggressive xz compression could be a good idea here.

Long Road Ahead

So, what’s the take-away here?

Well, to me, the main one is that the current Nix/Guix substitution bloat story is not hopeless. We can expect some serious improvements by increasing the substitution granularity, be it through a file-based substitution mechanism or a chunk-based one.

It would probably be worth investing some resources into pushing these substitution mechanisms to a production-ready state. We should also seriously consider using them as the default substitution mechanism once they’re ready. And potentially give up on the NAR-based approach.

Sadly, increasing the substitution granularity won’t come for free in terms of complexity. It’ll affect a lot of things, possibly including rethinking the binary cache signature mechanism. We’ll likely need quite some time before we can confidently replace the NAR-based substitution mechanism. It’s definitely not a weekend project.


Special thanks to Profpatsch for his spot-on review.


  1. Full disclosure: while it’s okay to omit them in the context of this article, there are two notable exceptions to this input-addressed nature:

    • Fixed Output Derivations (FODs): they are, as their name implies, already output-addressed. We know their content hash at build time. We use them to deterministically download stable artifacts from the Internet, e.g. bootstrap seeds, source code, binaries…
    • A content-addressed store mode is available through an experimental feature. See RFC#62 for more information.
     ↩︎
  2. Bias Warning: the main author of nix-casync is an online friend. ↩︎