eBPF: Yes, it’s Turing Complete!
We show that eBPF is Turing complete, which means it can be used for any computable problem
Over the last few years, we’ve heard from a few eBPF skeptics who have reservations about whether it’s capable of handling the complex processing necessary for application-level networking. Some folks say that it’s not capable of parsing L7 protocols or terminating TLS connections. That was true in the past, for example, back when the eBPF Verifier had a “complexity limit” of just four thousand instructions. But today, as we’ll show in this post, eBPF is capable of (almost) anything!
Here at Isovalent, we’ve been involved in the nuts and bolts of eBPF right since the very beginning. Our engineers don’t only use it as a platform for building sophisticated tools like Cilium and Tetragon – many of them are also involved in implementing eBPF within the kernel, as well as creating libraries and tools that help engineers build with and use this technology. This expertise allows us to push the boundaries of what can be done using eBPF. At KubeCon Paris, we demonstrated that eBPF is Turing complete, which means it can be used to solve any computable problem!
Turing Completeness and Conway’s Game of Life
Alan Turing famously conceived the idea of a Turing machine, which can solve any computable problem, given enough time and memory. If a programming language is Turing complete, it is capable of solving the same set of problems as a Turing machine. We can think of this as being able to solve any arbitrarily complex problem.
Conway’s Game of Life has been proven to be Turing complete. This means that if a programming language can implement Game of Life, it too is Turing complete and can be used to solve problems of arbitrary complexity, given sufficient time and memory.
In Conway’s Game of Life, each cell in a grid can be in one of two states: alive or dead. The game runs for an indefinite number of steps, and in each step the state of each cell is recalculated according to some simple rules:
- If a live cell has fewer than two live neighbors, it dies (of loneliness!)
- If a live cell has more than three live neighbors, it dies (it is overcrowded)
- If a live cell has two or three live neighbors, it stays alive
- If a dead cell has exactly three live neighbors, it becomes alive (reproduction)
There is a nice simulation at playgameoflife.com, and this slide deck from Melissa Gymrek at MIT explains how Conway’s Game of Life can be used to solve computational problems.
If we can implement Game of Life in eBPF, it tells us that eBPF is Turing complete.
The eBPF Verifier and the Complexity Limit
As you may know, eBPF programs go through a verification process to ensure that they are safe to run. The Verifier runs in the kernel, and analyzes all the possible paths through a program when it is loaded.
Since verification itself runs within the kernel, it has to complete within a reasonable amount of time, to avoid having a kernel task running forever and consuming resources. It’s also important that verification can complete quickly so that eBPF can meet users’ expectations of changing behavior dynamically. For example, consider the Cilium use case where the CNI could be loading dozens of BPF programs to handle changes to the workloads running in a Kubernetes deployment. If loading and verifying these programs were slow, it would impact the entire system.
To ensure that the verifier doesn’t hog the CPU for too long, eBPF has a “complexity limit” – the number of instructions that the verifier can analyze before it “gives up”. Since kernel version 5.4 the complexity limit has been one million instructions. If the program can’t be verified before the complexity limit is reached, it will be rejected.
You could very reasonably consider that this means that eBPF can only solve problems that take 1 million or fewer instructions. To put this in context, the game Doom is implemented in fewer than 100k instructions. Envoy can’t fit in a single eBPF program because it consists of around 15 million instructions, and the Clang compiler is around 26 million instructions. Plus, these programs can all have an arbitrary amount of looping. So it’s still true that programs of this complexity can’t be implemented in a single eBPF program. We need the ability to split complexity across multiple programs, and we need the ability to loop.
Looping in eBPF
Back in kernel version 4.19, eBPF programs could not loop, because the Verifier checked that any instruction could only be run once. This made the job of the Verifier more straightforward, but it made it hard to write eBPF programs to do anything complex. At this stage the complexity limit was just 4096 instructions, and that’s not very many, especially when none of them can be repeated. Developers used “unroll” compiler directives that convert source code loops into a series of inline instructions executed in series, which made it less tedious for humans to write (and read) eBPF programs, but as you can imagine, it could quickly use up the instruction limit.
As mentioned above, by kernel version 5.4 the complexity limit was increased to one million instructions, and the Verifier was also improved to handle loops. There are a couple of conditions: firstly the loop has to be bounded, so that the Verifier knows there is a maximum number of times the loop will be executed; and secondly, instructions count towards the “complexity” of the program each time around the loop. If you loop 1,000 times around a loop of 1,000 instructions, you will hit the one million instruction complexity limit.
That’s actually a simplification – the Verifier had some optimizations which allowed it to skip verifying every single iteration of a loop if it was able to determine that the loop state is the same between iterations, and it was otherwise safe to do so (for example, the looped instructions don’t write to memory). This created a bit of an art to writing eBPF loops, where they could be crafted in a way that the Verifier could recognize as safe. It’s not always practical or possible, but it means that a developer with deep internal knowledge of the Verifier can often coax considerably more complexity out of an eBPF program than someone taking a more naive approach.
By version 6.1, the kernel supports a helper function called bpf_loop(). This allows the loop instructions to be implemented as a separate eBPF program, which is verified independently of the caller. The contents of the loop can now have a complexity count of one million instructions, making the eBPF programmer’s job significantly easier.
Splitting Complexity Across Multiple eBPF Programs
The bpf_loop()
helper is one way to split complexity across multiple eBPF programs. Another approach available to developers since version 5.4 is to use tail calls. This is like making a function call, except that the calling function’s state is removed from the stack, and replaced by the called function, so that execution doesn’t return to the calling function when the called one completes. Complex problems can be split across multiple (sub)programs that tail call each other, with state being stored in eBPF maps as necessary.
There is a “depth” limit where up to 31 tail calls can be made. This brings the complexity limit up to 31 million instructions, so perhaps it’s possible to implement Envoy or Clang with this approach! Intuitively, it seems very likely that this is more than enough to implement the straightforward rules of Game of Life. But it still doesn’t allow for arbitrary complexity, and loops still have to be bounded. We need something else to allow us to implement a (theoretically) infinite number of rounds of the Game of Life.
Before we discuss how an unbounded loop can be achieved in eBPF, let’s consider why there is a limit on the number of tail calls that can be chained together. The answer is similar to the reason why the Verifier has a complexity limit: the eBPF program(s) must release the CPU within a reasonable amount of time so that it can get on with doing other things. If an eBPF program, or a set of eBPF tail calls, never released the CPU (or didn’t release it for a long time) it would have noticeable effects on the rest of the system, like stopping networking from behaving properly, or making apps appear to lock up or run slowly.
User space applications can appear to run indefinitely because they are scheduled by the kernel; an application “thread” can use the CPU for a certain amount of time before the thread is paused, allowing the CPU to be used for something else. The thread can then be restarted again later, picking up from where it left off.
In eBPF, we can achieve something similar by using timers to schedule the next eBPF program.
Timers in eBPF
Since version 5.15, the kernel has supported BPF timers. This allows an eBPF program to set a timer, after which another BPF subprogram will run.
In the eBPF implementation of Game of Life, we want to recalculate the grid on a regular basis. At the end of each round of recalculation, we start a timer that pops after two seconds to trigger the next round. This gives us a loop that will run indefinitely. It’s also possible to set BPF timers with a duration of zero, which tells the OS to make the callback as soon as possible after other system-critical actions have been performed.
This excerpt from the Gate of Life code shows the game()
function that runs once, setting the timer callback to do_game()
, and starting the timer to run for two seconds. Before do_game()
returns, it restarts the timer, creating a loop that will run indefinitely.
Conway’s Game of Life in eBPF
You’ll find our implementation of the Game of Life on GitHub at github.com/isovalent/game-of-life. The user space loader code is written in Go using the cilium/ebpf library.
The other part of the “game” that has to be written in user space today is the part that displays the grid to the user. eBPF code running in the kernel does all the calculations to populate the grid and store them in an eBPF map. It also sends a copy of the map’s contents to user space using a ringbuf, and the Go application displays the grid in human-readable form. This is necessary because there are no BPF helper functions for displaying output on the screen.
We also need to attach our eBPF implementation to some arbitrary event to trigger the first round of the game. We chose to attach to a network egress event, looking out for packets on port 0x71FE.
What Does This Mean?
Implementing Game of Life entirely in eBPF means that it is Turing complete, so any computable problem that can be solved in any programming language, can also be solved in eBPF. Those complex problems that people thought were beyond the capabilities of eBPF, like parsing application-level protocols, or terminating TLS connections, are now shown to be possible.
That doesn’t necessarily mean that eBPF is the most appropriate approach for solving every problem! For example, in Cilium we’ll continue to use Envoy to handle a lot of application-layer networking capabilities, because it’s a field-hardened, proven implementation that meets our users’ needs. And more broadly, just because eBPF could be used, doesn’t mean it’s the most appropriate tool.
Because Isovalent engineers have been so involved in the development of eBPF and its Verifier, we’ve been able to push the boundaries of what’s possible in eBPF through projects like Cilium, Tetragon, and the Isovalent enterprise platform. Get in touch to learn how eBPF can be employed to solve the networking and security problems you’re tackling today!
Liz Rice is Chief Open Source Officer with eBPF pioneers Isovalent, creators of the Cilium project, which provides cloud native networking, observability and security. She is a member of the Open UK Board, and of the CNCF’s Governing Board. She was chair of the CNCF’s Technical Oversight Committee 2019-2022, and co-chaired the KubeCon / CloudNativeCon 2018 events in Copenhagen, Shanghai, and Seattle.
Her background and core competence is systems software engineering, but she loves the whole process of building products.
John is currently leading the Security and Observability team at Isovalent, where he created Tetragon, a runtime security and observability tool. When not working on Tetragon, he maintains various bits and pieces of the Linux kernel, contributing primarily to the BPF subsystem and networking stacks. In previous work, he contributed to the Cilium CNI, networking domain-specific languages, and wrote a handful of network drivers for various hardware and operating systems.