In this article, we’re about to dive into the uncharted waters of OpenGL on Linux. After briefly explaining how OpenGL calls are routed from your application to the GPU, we’ll look at the NixOS special case. We’ll then explore how we can run OpenGL programs built by Nix on a foreign distribution, such as Ubuntu or Fedora. Finally, we’ll introduce NixGLHost, a new approach to solving this problem.
OpenGL/CUDA on Linux?
Everybody, including your favorite pet, plays video games. Your favorite go-to desktop application relies on GPU acceleration to render to your screen, and AI is taking over the world. Welcome to 2023. Over the last 15 years, GPUs transitioned from being niche, gamer-specific hardware to being a globally-used parallel computing platform.
There are a lot of GPU-related APIs in the wild: OpenGL, OpenCL, Vulkan, DirectX, Metal, etc. In this article, we’ll focus on a very small subset of them, OpenGL and CUDA. Even more precisely, OpenGL and CUDA on Linux. How do these two APIs work?
Linux being Linux, there’s no simple answer to that. In most setups, and for OpenGL alone, there’s no big unified API. There are actually two competing ones, each acting as an interface between the actual OpenGL program and the desktop environment.
- GLX: the legacy one, a Xorg-specific API.
- EGL: the new kid in town, used by Wayland and Android, among others.
To be clear: these are API specifications, not implementations. For these two APIs we have multiple implementations. Some are proprietary, written directly by the hardware vendors. Others are open source, sometimes implemented by hardware vendors, sometimes by courageous volunteers.
There are currently two major OpenGL implementations:
- Mesa: the main open-source implementation. It’s actually more of a meta-driver: it includes drivers for a wide range of hardware (Intel integrated GPUs, AMD discrete GPUs, some ARM Mali GPUs, and a lot more).
- NVIDIA: a proprietary implementation, maintained by NVIDIA itself. It’s the optimal way to drive an NVIDIA graphics card on Linux.
These drivers are usually split into two parts. A thin kernel-space part is in charge of the low-level communication with the GPU, while a thicker user-space part translates the OpenGL instructions into something the kernel-space part can understand. Conceptually speaking, the kernel-space part acts as a proxy to the actual hardware, nothing more.
These driver stacks, on top of providing the low-level code in charge of talking to the GPU, supply implementations for various interfaces. Among those are GLX and EGL. To sum things up, you have a GLX and an EGL implementation for each and every driver, e.g. libGLX_nvidia.so, libGLX_mesa.so, etc.
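You can see these per-vendor libraries sitting side by side on a typical system. As a rough illustration (the exact directory is distro-specific, this is the Debian/Ubuntu multiarch path, and the listing assumes both Mesa and the NVIDIA driver are installed):

```
$ ls /usr/lib/x86_64-linux-gnu/ | grep -E '^libGLX_|^libEGL_'
libEGL_mesa.so.0
libEGL_nvidia.so.0
libGLX_mesa.so.0
libGLX_nvidia.so.0
```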
Hey, you’re lying! There is a unified API!
I already used OpenGL, and I know for a fact that my program is linked against libGL.so, not one of these weird-ass APIs!
You’re absolutely right! But what if I told you that libGL.so is actually a lie?
I omitted an important part of this OpenGL stack on purpose: libglvnd. Originally developed by NVIDIA before being maintained by the freedesktop community, this library aims to be the thing your program is going to link against. It’s not doing anything by itself; it just dispatches the OpenGL calls to the most appropriate implementation. Through various heuristics, libglvnd figures out through which interface the OpenGL instructions should go. Roughly speaking, if the target surface is rendered through Wayland, it’ll use the EGL implementation; if it’s rendered through Xorg, the GLX implementation. It also uses some more heuristics and config files to figure out which driver (Mesa, NVIDIA, etc.) should be used.
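Those config files are small vendor manifests that libglvnd scans to discover the vendor libraries it can dispatch to. As an illustration, here’s what the NVIDIA EGL vendor file typically looks like (the exact path and file name vary across distributions, something like /usr/share/glvnd/egl_vendor.d/10_nvidia.json):

```json
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
```

On the GLX side, you can also force the vendor selection by hand with the __GLX_VENDOR_LIBRARY_NAME environment variable (e.g. __GLX_VENDOR_LIBRARY_NAME=nvidia), which comes in handy when debugging which implementation glvnd ends up picking.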
What about CUDA? Well, for this one, we don’t really have to bother with interoperability; NVIDIA is controlling the whole chain, from the user-space SDK to the silicon, after all. The overall call graph is much simpler, although harder to integrate. Your CUDA program is linked against the libcuda.so shared library. This library directly communicates with the thin kernel-space driver in charge of sending the CUDA operations to the actual GPU.
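To make this concrete, here’s a minimal sketch of a program that talks to the GPU through the CUDA driver API alone. Note that it links directly against libcuda.so, no dispatch layer involved (the build command is illustrative; header and library paths depend on your CUDA installation):

```c
/* Sketch: enumerate the GPUs visible through the CUDA driver API.
   Build with something like: gcc probe.c -o probe -lcuda */
#include <stdio.h>
#include <cuda.h>

int main(void) {
    int count = 0;

    /* cuInit opens the door to the kernel-space NVIDIA driver. */
    if (cuInit(0) != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed: is the NVIDIA driver loaded?\n");
        return 1;
    }
    cuDeviceGetCount(&count);
    printf("CUDA devices visible: %d\n", count);

    for (int i = 0; i < count; i++) {
        CUdevice dev;
        char name[128];
        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, sizeof(name), dev);
        printf("  device %d: %s\n", i, name);
    }
    return 0;
}
```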
From a bird’s-eye view, the overall OpenGL graphics stack looks like this:
How Does Nix Deal With That?
Now that we roughly see how the high-level components fit together in the Linux space, we know there’s a layer dynamically dispatching the graphics calls to the most relevant driver depending on the current context. It works reasonably well. Well enough to power most of the modern-day GPU-fueled AI revolution. Sadly, the cost of dynamic dispatching is a loss of determinism.
NixOS
If your heart rate bumped slightly when reading the previous paragraph, chances are you already drank the NixOS kool-aid and are anticipating a fair share of pain integrating this with NixOS. And for good reason! How can we expect a statically-defined Nix closure to work on each and every end-user GPU? We surely can’t include all the existing drivers in the closure; it’d be massive!
Don’t worry too much though, we don’t have to get that hardcore. Spoiler alert: I’m about to uncover a dirty hack. You know, the dirty hack rendering this very web page (if you’re reading this on NixOS).
NixOS has a magical /run/opengl-driver directory. In this directory, you can find all the libs you’ll need to drive the GPU attached to your machine: OpenGL, Vulkan rendering, OpenCL computing, VA-API video acceleration. You name it!
This directory — we’ll call it a “link farm” from now on, because, well, that’s what it really is! — is defined at the system level via the hardware.opengl NixOS module. Overall, it means the GPU driver is defined through the NixOS system closure and exposed at runtime through the /run/opengl-driver link farm. The OpenGL program derivations have no knowledge of the actual OpenGL implementation: they simply link to libGL.so, AKA libglvnd, the generic dispatch layer.
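For reference, populating the link farm boils down to enabling that module in your system configuration, roughly like this (a minimal sketch; the extraPackages list depends on your hardware and is just an example here):

```nix
# configuration.nix (sketch)
{ pkgs, ... }:

{
  hardware.opengl = {
    enable = true;
    # Extra user-space driver bits, e.g. VA-API or OpenCL support.
    extraPackages = with pkgs; [ intel-media-driver ];
  };
}
```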
Okay, great, we have the drivers in a weird /run/opengl-driver location containing a link farm pointing to the actual GPU driver. Now, how can my Nix-built OpenGL program find these libraries?
Well, it’s Nixpkgs! Obviously, we’re injecting them through a wrapper. The helper we’re talking about is called addOpenGLRunpath: a setup hook that appends /run/opengl-driver/lib to the binary’s ELF runpath. At runtime, libglvnd finds the driver libraries through that runpath entry, loads them, and dispatches the OpenGL calls to them.
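In a package definition, the hook is used roughly like this (a sketch; the package name and binary path are placeholders):

```nix
# Sketch of a derivation relying on the addOpenGLRunpath hook.
{ stdenv, addOpenGLRunpath }:

stdenv.mkDerivation {
  pname = "my-gl-program";   # placeholder
  version = "0.1";
  # src = ...;

  nativeBuildInputs = [ addOpenGLRunpath ];

  postFixup = ''
    # Append /run/opengl-driver/lib to the binary's runpath.
    addOpenGLRunpath $out/bin/my-gl-program
  '';
}
```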
On a foreign distribution, where /run/opengl-driver doesn’t exist, the dynamic loader simply ignores the dangling runpath entry. The hook essentially acts as a no-op on a non-NixOS distribution.
In a nutshell:
- NixOS stores the host system GPU drivers in a well-known directory.
- Nixpkgs injects this well-known directory into the OpenGL programs’ library search path through a build-time hook.
What About Running Nix on a Foreign Distro?
Now, how would you run a Nix OpenGL program on a foreign Linux distribution such as Ubuntu?
All of a sudden, there is no more /run/opengl-driver. This is where it gets hairy!
The go-to solution these days is NixGL. It’s a wrapper you apply on top of an OpenGL derivation. With a few clever heuristics, it figures out at build time which driver your host system expects. It then builds the relevant driver derivation (Mesa, NVIDIA, Bumblebee) and generates a wrapper injecting the driver’s dynamic libraries into the OpenGL process’s load path.
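In practice, using it looks roughly like this (based on the NixGL README at the time of writing; the repository location and exact invocation may differ depending on how you install it):

```
# One-shot, letting NixGL auto-detect the host driver:
$ nix run --impure github:nix-community/nixGL -- glxgears

# Or, once the nixGL wrapper is installed in your profile:
$ nixGL glxgears
```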
Remember what I told you during the introduction: the GPU driver is split between a thick user-space library and a thin kernel-space module. What we have here is only the user-space side of the story. The user-space part of the driver is distributed through Nix, the kernel-space one by the host distribution. One might fear hitting some protocol-level incompatibilities between the user-space driver and the kernel-space one. And for good reason! It could definitely happen. In practice though, it’s good enough: the NixGL heuristics tend to be fairly clever at figuring out which user-space library to use.
There’s definitely a tradeoff. The GPU driver ends up living in the Nix closure. All of a sudden, your Nix closure is tailored towards a specific GPU and isn’t generic anymore. This approach requires you to precisely know the target hardware specification at Nix build time. If you want to deploy an application on N computers, you’ll conceptually have to nix build the application N times. You’ll need a detailed inventory of the hardware you want to deploy on, or have the build happen on the end-user machine, which is not always possible.
Kludgy! Mikey Likey!
Could we have an alternate approach to solve this Nix-OpenGL-on-a-foreign-distro problem? An alternative not requiring us to include the OpenGL driver as part of the deployed closure, but rather have this managed by the host distribution? Kind of like what we have on NixOS?
I think it’s fair to assume the host distribution is smart enough to properly set up the graphics stack. What if we glued the host user-space GPU environment to the Nix program instead of using a Nixpkgs-provided one? This thought was the starting point of NixGLHost, a new way of running your OpenGL Nix programs on a foreign distro.
This approach is pretty Nix-heretic. We’re injecting some host DSOs[^1] into the Nix closure. Nix is specifically designed to limit this kind of interaction with the host system! That being said, we’re being heretic but cautious: we only inject the entry points for EGL/GLX/CUDA. We patch the entry points’ ELF runpaths to point to a place containing their own dependencies. This acts as a minimal sandboxing setup.
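The idea looks roughly like this (an illustrative sketch, not the actual NixGLHost code; the paths are hypothetical):

```
# Copy the host's GLX/EGL/CUDA entry points into a private directory...
$ cp /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 ~/.cache/nixglhost/lib/

# ...then pin their dependency lookup to that directory, so they keep
# resolving against the host driver stack instead of the Nix closure.
$ patchelf --set-rpath ~/.cache/nixglhost/lib ~/.cache/nixglhost/lib/libGLX_nvidia.so.0
```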
I won’t dive too much into the implementation details here; you can check out the internals.md file if you’re interested in the nitty-gritty details.
Overall, this experimental new wrapper re-uses the drivers provisioned by your host distribution. The kernel-space module and the user-space library are both provisioned by the host system, so we’re sure they won’t get out of sync. The graphics driver stays out of the Nix closure; the closure stays truly generic and will work on any GPU.
This experiment turned out to work great for the use cases we tested it against so far. I managed to run DaVinci Resolve and a couple of video games built by Nix on an Ubuntu machine. A customer is actually already using the prototype in production[^2].
What’s Next for NixGLHost?
The next obvious step is to extend the supported OpenGL implementations. At the moment, we only support the NVIDIA proprietary driver. It’d be great to support Mesa as well. Technically, it’s not a hard thing to do. Sadly, the Python prototype codebase is quite messy and difficult to extend as it is.
The wrapper is also pretty slow to run, ~200 ms on a hot cache. Most of this time is spent spinning up the CPython interpreter. Starting a whole interpreter for a very short-lived program turned out to be very costly; who could have guessed! Should we embrace the meme and RIIR the project? I personally think it might be a good idea and have started working on that.
We should also think about improving the sandboxing. At the moment, we’re injecting the host DSOs’ symbols directly into the program’s global symbol table. If, by misfortune, the Nix program ends up using the same library as the GPU driver but with a slightly incompatible version, all hell breaks loose! There are some ways around that: we could leverage the dlmopen function to load the driver DSOs into another namespace, and we could potentially leverage libcapsule to perform such a “namespaced” dynamic library loading.
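For the curious, the dlmopen route would look roughly like this (a minimal sketch; the library and symbol names are just examples):

```c
/* Sketch: load a driver DSO into a separate link-map namespace so that
   its dependencies can't clash with the libraries already loaded by the
   program. Build with something like: gcc demo.c -o demo -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* LM_ID_NEWLM creates a fresh namespace for this DSO and its deps. */
    void *handle = dlmopen(LM_ID_NEWLM, "libGLX_nvidia.so.0",
                           RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return 1;
    }

    /* Symbols then have to be looked up explicitly through the handle;
       they never land in the program's global symbol table. */
    void *sym = dlsym(handle, "glXGetProcAddress");
    printf("glXGetProcAddress resolved at: %p\n", sym);

    dlclose(handle);
    return 0;
}
```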
Let’s see how it goes, I’ll probably post a follow-up here when we reach the next major milestone :)
Greets:
- flokli for enduring my terrible hacks and brainstorming this problem space with me <3
- Numtide and Otto Motors for trusting me on this and funding most of the NixGLHost work <3
- Guibou for writing NixGL <3
- Alex and tazjin for proof-reading this article <3
[^1]: Dynamic Shared Object, a shared library.
[^2]: This is an alpha release; I’m not saying you should use it in production as well. Use this with caution, it may eat your kittens!