Nsncd, Anniversary Updates

Last year, flokli and I worked towards re-using TwoSigma’s Nsncd as the main NixOS Nscd daemon.

What is that even about? Well, Nscd is a glibc daemon that was originally meant to cache host and user resolution requests. It’s mostly obsolete by now, but on NixOS and Guix, we abuse this daemon to get a stable ABI to load the host NSS modules from a potentially different glibc version. If you have no idea what I’m talking about here and want to read more about that context, you should probably read flokli’s original release blog post.

It’s now been a year since we released the Nsncd host lookups. Since then, Nsncd has been used by default on NixOS in place of Nscd. The migration has been mostly bugless. Emphasis on the mostly!

While this switch has been beneficial for most of us, getting rid of the numerous “too-much-caching” bugs, some unexpected side effects appeared, such as breaking the hostname --fqdn command.

Recently, flokli and I ended up under the same Catalonian roof for a Numtide-organized programming retreat. Kudos to Zimbatm, who handled most of the thankless logistics to make this happen <3

After almost a full year of procrastination, it turns out that a 10-hour session of focused work was all it took to almost fix these long-lingering issues.

Wait, why did I write almost here? Ah yes. After these 10 initial hours, I kept uncovering a long series of subtle yet real bugs over the next 4 days. Turns out that the third full rewrite of the fix was the charm! [1]

What was the root cause of these faulty host resolutions in the end?

Turns out that, even though the gethostbyname and gethostbyaddr libc functions have long been deprecated in favor of getaddrinfo, some pretty critical pieces of software are still using them. You guessed it: the hostname command is one of them.

We originally decided to implement these two functions through getaddrinfo. This was a mistake: the legacy operations do not behave exactly like the new ones.

To fix this bug, we wrote some FFI bindings for these two legacy functions. Then, we replaced our dodgy gethostbyname and gethostbyaddr implementation in Nsncd with one that uses these bindings.
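To give an idea of what such a binding looks like, here is a minimal sketch. It is not the code that landed in Nsncd: the `Hostent` mirror struct, the `HostEntry` type and the `lookup_by_name` helper are names made up for this illustration, and the struct layout assumes the usual `struct hostent` from `netdb.h` on Linux.

```rust
use std::ffi::{c_char, c_int, CStr, CString};

/// Mirror of the C `struct hostent` from netdb.h.
/// Assumption: the usual Linux (glibc/musl) field order.
#[repr(C)]
struct Hostent {
    h_name: *mut c_char,
    h_aliases: *mut *mut c_char,
    h_addrtype: c_int,
    h_length: c_int,
    h_addr_list: *mut *mut c_char,
}

unsafe extern "C" {
    // Legacy libc lookup. It returns a pointer to static storage,
    // so it is not thread-safe and the result must be copied right away.
    fn gethostbyname(name: *const c_char) -> *mut Hostent;
}

/// Owned copy of the interesting parts of a `hostent`.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct HostEntry {
    pub name: String,
    pub addr_type: i32,
    pub addresses: Vec<Vec<u8>>,
}

/// Calls the legacy `gethostbyname` directly instead of emulating it
/// through `getaddrinfo`. Returns `None` on any failure.
pub fn lookup_by_name(host: &str) -> Option<HostEntry> {
    // Fails if `host` contains an interior NUL byte.
    let c_host = CString::new(host).ok()?;

    // SAFETY: `c_host` is a valid NUL-terminated string that outlives the
    // call, and everything is copied out of libc's static buffer before
    // returning. Concurrent callers would race on that buffer.
    unsafe {
        let ent = gethostbyname(c_host.as_ptr());
        if ent.is_null() {
            return None;
        }
        let ent = &*ent;
        if ent.h_name.is_null() || ent.h_length <= 0 {
            return None;
        }

        let name = CStr::from_ptr(ent.h_name).to_string_lossy().into_owned();
        let len = ent.h_length as usize;

        // `h_addr_list` is a NULL-terminated array of raw addresses,
        // each `h_length` bytes long.
        let mut addresses = Vec::new();
        let mut cursor = ent.h_addr_list;
        while !cursor.is_null() && !(*cursor).is_null() {
            let bytes = std::slice::from_raw_parts(*cursor as *const u8, len);
            addresses.push(bytes.to_vec());
            cursor = cursor.add(1);
        }

        Some(HostEntry {
            name,
            addr_type: ent.h_addrtype,
            addresses,
        })
    }
}
```

Calling `lookup_by_name("127.0.0.1")` is a handy smoke test: a numeric IPv4 literal is converted by libc without any real lookup. Because the plain function hands back static storage, a daemon serving several clients at once would more likely want the reentrant `_r` variants; the sketch sticks to the simple one to keep the binding readable.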

In the meantime, flokli wrote sockburp, a debug tool for Unix sockets. This has been a game changer for detecting wire format compatibility issues. You can read more about this tool in this blog post.

We have not managed to upstream all our changes to the TwoSigma repo yet. The NixOS Nsncd is still hosted in our nix-community fork. We have not given up on upstreaming it though, and hope to do that in the next few months.

That’s pretty much all. Until next time, happy and serene hacking, folks :)

  1. Ah, you’re here for the gory details? Here you go: read the history of these two PRs. My treat!