Last year, flokli and I worked towards re-using TwoSigma’s Nsncd as the main NixOS Nscd daemon.
What is that even about? Well, Nscd is a Glibc daemon that was originally meant to cache the host/user resolution requests. It’s mostly obsolete by now, but on NixOS and Guix, we abuse this daemon to get a stable ABI to load the host NSS modules from a potentially different glibc version. If you have no idea what I’m talking about here and want to read more about that context, you should probably read flokli’s original release blog post.
It’s now been a year since we released the Nsncd host lookups. Since then, Nsncd has been used by default on NixOS in place of Nscd. The migration has been mostly bugless. Emphasis on the mostly!
While this switch has been beneficial for most of us, getting rid of the numerous “too-much-caching” bugs, some unexpected side effects appeared, such as breaking the hostname --fqdn command.
Recently, flokli and I ended up under the same Catalonian roof for a Numtide-organized programming retreat. Kudos to Zimbatm who handled most of the thankless logistics to make this happen <3
After almost a full year of procrastination, turns out that a 10-hour session of focused work was all it took to almost fix these long-lingering issues.
Wait, why did I write almost here? Ah yes. After these 10 initial hours, I kept uncovering a long series of subtle yet real bugs over the next 4 days. Turns out that the third full rewrite of the fix was the charm!
What was the root cause of these faulty host resolutions in the end?
Turns out that, even though the gethostbyname and gethostbyaddr libc functions have long been deprecated in favor of getaddrinfo, some pretty critical pieces of software are still using them. You guessed it: the hostname command is one of them.
We originally decided to implement these two functions through getaddrinfo. This was a mistake: the legacy operations do not behave exactly like the new ones.
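To make "do not behave exactly like" a bit more concrete, here is a small Python sketch comparing the shape of the two kinds of answers. It is only an illustration of general API differences, not the specific discrepancies we hit in Nsncd. The helper names legacy_view and modern_view are made up for this example; the socket functions they call are from Python's standard library, whose gethostbyname_ex returns a legacy-style (hostname, aliases, addresses) triple.

```python
import socket


def legacy_view(name):
    """Resolve `name` through the legacy-style interface.

    Returns (canonical_name, aliases, ipv4_addresses). The answer always
    carries a canonical name and an alias list, and is IPv4 only.
    """
    canonical, aliases, addresses = socket.gethostbyname_ex(name)
    return canonical, aliases, addresses


def modern_view(name, want_canonname=False):
    """Resolve `name` through getaddrinfo, restricted to IPv4/TCP.

    Returns (canonname, addresses). There is no alias list at all, and the
    canonical name is only filled in (on the first entry) when the caller
    explicitly asks for it with AI_CANONNAME.
    """
    flags = socket.AI_CANONNAME if want_canonname else 0
    infos = socket.getaddrinfo(
        name,
        None,
        family=socket.AF_INET,
        type=socket.SOCK_STREAM,
        flags=flags,
    )
    canonname = infos[0][3]
    addresses = [sockaddr[0] for (_fam, _type, _proto, _canon, sockaddr) in infos]
    return canonname, addresses
```

Calling both helpers on the same host name shows the mismatch: anything rebuilt on top of getaddrinfo has to come up with the aliases, and with the canonical name unless it was requested, from somewhere else.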
To fix this bug, we wrote some FFI bindings for these two legacy functions. Then, we replaced our dodgy gethostbyname and gethostbyaddr implementations in Nsncd with ones built on these bindings.
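If you have never written this kind of binding, the rough idea looks like the sketch below. It is not the Nsncd code (Nsncd itself is written in Rust); it is a minimal Python ctypes illustration, assuming a Linux system where the libc symbols are reachable through the running process. HostEnt and gethostbyname_raw are names invented for this example; the field layout follows struct hostent from netdb.h.

```python
import ctypes
import socket


class HostEnt(ctypes.Structure):
    """Mirror of the C `struct hostent` declared in <netdb.h>."""

    _fields_ = [
        ("h_name", ctypes.c_char_p),                      # canonical name
        ("h_aliases", ctypes.POINTER(ctypes.c_char_p)),   # NULL-terminated
        ("h_addrtype", ctypes.c_int),                     # AF_INET or AF_INET6
        ("h_length", ctypes.c_int),                       # bytes per address
        ("h_addr_list", ctypes.POINTER(ctypes.POINTER(ctypes.c_ubyte))),
    ]


# Assumption: Linux, where dlopen(NULL) exposes the already-loaded libc.
_libc = ctypes.CDLL(None)
_libc.gethostbyname.argtypes = [ctypes.c_char_p]
_libc.gethostbyname.restype = ctypes.POINTER(HostEnt)


def gethostbyname_raw(name):
    """Call libc's gethostbyname and copy the result into Python objects.

    The C function returns a pointer to static storage, so everything is
    copied out right away, before any other resolver call can overwrite it.
    """
    ptr = _libc.gethostbyname(name.encode())
    if not ptr:
        raise OSError(f"gethostbyname({name!r}) failed")
    ent = ptr.contents

    aliases = []
    i = 0
    while ent.h_aliases and ent.h_aliases[i] is not None:
        aliases.append(ent.h_aliases[i].decode())
        i += 1

    addresses = []
    i = 0
    while ent.h_addr_list and ent.h_addr_list[i]:
        raw = bytes(ent.h_addr_list[i][: ent.h_length])
        addresses.append(socket.inet_ntop(ent.h_addrtype, raw))
        i += 1

    return ent.h_name.decode(), aliases, addresses
```

The point of going through the real libc function is that the answer comes back exactly as a legacy client expects it, aliases and canonical name included, instead of being reconstructed from a getaddrinfo result.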
In the meantime, flokli wrote sockburp, a debug tool for Unix sockets. This has been a game changer to detect the wire format compatibility issues. You can read more about this tool in this blog post.
We have not managed to upstream all our changes to the TwoSigma repo yet, so the NixOS Nsncd is still built from our nix-community fork. We have not given up on upstreaming it though, and hope to do that in the next few months.
That’s pretty much all. Until next time, happy and serene hacking folks :)