eBPF Documentary: eBPF’s Creation Story – Unlocking The Kernel
This is a story of friends creating a technology that is often barely visible but can be found in most devices today. It has changed the tech industry significantly and will continue to do so in the coming years. I’ve been fortunate to be among that group of friends while working in Red Hat’s kernel team to help create something truly remarkable.
The recently released eBPF documentary produced by Speakeasy Productions does a fantastic job of capturing the creation of eBPF from its inception. I took this opportunity to write down some of what I remember while sharing pictures from the early days of eBPF.
From Linux and SDN to eBPF
Before diving into the creation of eBPF, it helps to understand the context it grew out of. 2011 was an exciting time for many of us working on the Linux kernel. We had been building feature upon feature for years while Linux went through a period of fantastic growth. Linux had clearly won as the standard server operating system.
Most of us later involved in the creation of eBPF were working on the Linux kernel, specifically on network virtualization and software-defined networking. Network virtualization was a logical challenge that arose with the success of infrastructure virtualization. Instead of running only physical servers, the datacenter was suddenly virtualized by running large fleets of virtual machines. This meant that the physical servers hosting the virtual machines had to become capable of routing network packets in the datacenter instead of just running applications.
In the early days, network virtualization was primarily just L2 bridging performed in software. It became apparent quickly that L2 bridging wouldn’t be the right solution in the long term, so we all started to evolve the software networking stack. Several of us did so by working on Open vSwitch. We all knew that the network itself had to become incredibly programmable to cope with the future growth of network infrastructure complexity and the fast-rising concept of cloud computing.
The need for innovation, combined with Linux finding its way into use cases around the globe, from servers and smartphones to even Mars-landing helicopters, created an entirely new challenge. On one hand, users wanted Linux to become boring and stable. On the other hand, Linux had to evolve quickly to cope with the fast-rising requirements of virtualized datacenters. The push to become stable and boring was a natural enemy of the desired innovation.
This set the scene for what would evolve into eBPF. Let’s dive in.
PLUMgrid was one of several startups working on SDN solutions at the time. PLUMgrid specifically wanted to build a set of network functions running fully distributed on servers to provide network functionality. One of PLUMgrid’s leading engineers was Alexei Starovoitov. As part of troubleshooting a networking incident, Alexei realized that even the compiler cannot be fully trusted. Instead, the operating system has to verify the application and ensure it is safe to run. Alexei decided to invent a new instruction set based on x86 assembly and get the kernel to verify the instruction set for safety. This was the starting point of what would later become eBPF; it was not called eBPF yet, though.
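As a rough illustration of what "verify before running" means, here is a toy sketch in Python. It is not the kernel's verifier, which is far more involved. The four-instruction format, the register count, and every name in it are made up for this example. It only shows the principle: walk the program before it runs and reject anything that cannot be shown to be safe, such as reads of uninitialized registers, jumps that could loop forever, or jumps outside the program.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction format, invented for this sketch:
#   ("mov_imm", dst, value)  -> registers[dst] = value
#   ("add", dst, src)        -> registers[dst] += registers[src]
#   ("jmp", offset, None)    -> skip `offset` instructions forward
#   ("exit", None, None)     -> return registers[0]
Insn = Tuple[str, Optional[int], Optional[int]]
NUM_REGS = 4  # assumed register count for the toy machine


def _is_reg(value: object) -> bool:
    """True if `value` names one of the toy machine's registers."""
    return isinstance(value, int) and 0 <= value < NUM_REGS


def verify(prog: List[Insn]) -> Tuple[bool, str]:
    """Walk the program once and reject it unless it is provably safe."""
    initialized = set()  # registers written so far on the path taken
    pc = 0
    while pc < len(prog):
        op, a, b = prog[pc]
        if op == "mov_imm":
            if not _is_reg(a):
                return False, f"insn {pc}: invalid register"
            initialized.add(a)
        elif op == "add":
            if not (_is_reg(a) and _is_reg(b)):
                return False, f"insn {pc}: invalid register"
            if a not in initialized or b not in initialized:
                return False, f"insn {pc}: read of uninitialized register"
        elif op == "jmp":
            # Only forward jumps are allowed, so the walk always terminates.
            if not isinstance(a, int) or a < 0:
                return False, f"insn {pc}: backward jump could loop forever"
            if pc + 1 + a >= len(prog):
                return False, f"insn {pc}: jump out of bounds"
            pc += 1 + a
            continue
        elif op == "exit":
            if 0 not in initialized:
                return False, f"insn {pc}: return value register not set"
            return True, "ok"
        else:
            return False, f"insn {pc}: unknown opcode {op!r}"
        pc += 1
    return False, "program can run off the end without exit"


good = [("mov_imm", 0, 1), ("mov_imm", 1, 2), ("add", 0, 1), ("exit", None, None)]
looping = [("mov_imm", 0, 0), ("jmp", -1, None), ("exit", None, None)]

print(verify(good))     # (True, 'ok')
print(verify(looping))  # (False, 'insn 1: backward jump could loop forever')
```

The point of the design is that the check happens once, at load time, so a rejected program never gets to execute a single instruction.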
To make this early idea useful, it would eventually need to get merged into the Linux kernel. Alexei decided to fly up to Portland to meet Chris Wright and get advice on how this could potentially be merged into the Linux kernel. Chris, today the CTO of Red Hat, was the lead architect for SDN at Red Hat and immediately understood the value of what Alexei had been thinking about.
Chris pointed Alexei to an existing kernel subsystem called BPF and suggested integrating the new work into it to make it easier for the community to understand and accept.
The Initial Patch Submission
Alexei decided to modify the codebase and adjust the instruction set to make it look more like BPF and eventually posted the first patch set to the kernel mailing lists. Nothing. Nobody looked at it. One person asked a silly question, but no reaction otherwise.
But the patchset did not go unnoticed. I worked at Red Hat at the time, and I remember Daniel Borkmann coming into my office and saying, “Hey Thomas, I saw this patchset. This looks amazing. Can I work on this?” Daniel had been working on the BPF and networking subsystem of Linux for a while. When I looked at the patchset myself, I was blown away immediately. It was super interesting, and a million light bulbs went off in my head.
We immediately understood the proposal’s value but also understood that its initial scope was too extensive and intrusive to get merged. We decided to visit Alexei in the PLUMgrid office to discuss how we could convince the kernel networking community, including David S. Miller, the networking subsystem maintainer, to get this patchset merged.
The kernel community had just merged the Open vSwitch project into the mainline kernel, so this proposal directly competed with a new subsystem that had just been merged. It also overlapped with nftables, which was supposed to be the successor of iptables. Naturally, we had to justify that eBPF would add enough additional value to offset the cost of maintaining yet another potentially large kernel subsystem.
The Team Grows
Justifying eBPF’s value to the networking team of the Linux kernel, which was primarily Daniel’s task going forward, was not the only effort under way. Alexei also explored an additional use case: kernel tracing.
In 2014, Brendan Gregg was leading an effort at Netflix to run Linux workloads in the cloud. Brendan has a tremendous background in Unix and systems engineering due to his time at Sun Microsystems, with an incredible understanding of system tracers and how to use them. Brendan was evaluating the more than ten system tracers available on Linux, but none of them met his requirements, so he decided that he had to fix and complete one of them.
One day, a Netflix co-worker asked Brendan: “Hey Brendan, did you see the e-mail about eBPF on the kernel mailing list?” One of Brendan’s co-workers suggested inviting Alexei to visit them at Netflix.
After a couple of hours in front of a whiteboard, they reached an agreement: if eBPF added support for kprobes (kernel probing hooks), Brendan would bring all of his experience in tracing from Unix and write the necessary frontends to make it usable. Those frontends would later become bcc and bpftrace.
Getting eBPF Merged
The team of eBPF advocates had grown, and the moment came when Daniel and Alexei submitted a new iteration of the patchset. It was reduced in scope and primarily focused on making the existing BPF subsystem run faster and better.
David S. Miller decided that if this enhanced something we already had, why not? Dave merged the patchset with a one-liner.
This was a monumental moment for all of us, a first step on a journey that would change the industry. I remember Daniel telling me that he was at the airport at the time, checked his e-mail on his phone just before departure, and saw the e-mail from Dave. We all celebrated even though we had no clue at the time how much this would mean. Over the next few years, Alexei and Daniel worked on the kernel side of eBPF to grow it piece by piece.
Superpowers Nobody Knew About
eBPF was in the kernel now, but barely anybody knew about it. But what exactly were those superpowers? In 2014, driving innovation through the Linux kernel had become challenging. The times of people compiling their own kernels were mostly over. Most users were consuming enterprise Linux distributions or just ran whatever kernel version came installed on their device. This meant that any change to the Linux kernel took years to make it into the hands of end-users.
eBPF changed this fundamentally: with the runtime present in the kernel, any idea could be turned into an eBPF program and loaded at runtime within days. This meant we could rebuild everything better. We had to decide what to rebuild first.
Early Use at Hyperscalers
Aside from bcc tools evolving to provide meaningful kernel tracing at Netflix, a significant first use of eBPF was L4 load balancing at Facebook. The team built a load balancer that was later open-sourced under the name Katran.
When Facebook engineers presented the work at NetDev 2017, the 10x performance gains compared to IPVS were eye-opening to many people, and it was a first “aha moment” for many on how powerful eBPF could be. The program spent only about 50 nanoseconds processing a packet, fast enough to act as a load balancer for all of Facebook’s datacenters.
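To put the 50-nanosecond figure in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes packets are processed one after another on a single CPU core with no other per-packet cost. That assumption is mine and is not part of the NetDev presentation, so treat the result as an order-of-magnitude estimate. The function name is made up for this example.

```python
NS_PER_SECOND = 1_000_000_000


def packets_per_second(ns_per_packet: float) -> float:
    """Upper bound on throughput if each packet costs `ns_per_packet` ns.

    Assumes strictly sequential processing on one core (an assumption
    made for this sketch, not a measured result).
    """
    return NS_PER_SECOND / ns_per_packet


print(f"{packets_per_second(50):,.0f} packets per second")
# prints "20,000,000 packets per second"
```

Under that assumption, 50 nanoseconds per packet works out to roughly 20 million packets per second per core.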
eBPF was starting to grow. Brendan, Alexei, Daniel, and I all gave talks at different conferences about eBPF and how it would change infrastructure software and tracing.
Even though eBPF had grown incredibly powerful, the eBPF runtime, essentially an x86 assembly-like instruction set, was a technology aimed at kernel developers. It required a company to have its own kernel team to leverage it properly. eBPF was incredibly powerful but complex to consume.
Cilium – Bringing eBPF to End-Users
It was time to think about how to make eBPF more accessible to end-users. Brendan Gregg, along with other contributors, had been working on bcc to make eBPF accessible for kernel tracing. How would we make eBPF approachable to end-users for networking and security use cases?
It was time to create Cilium. Daniel Borkmann, Andre Martins, Madhu Challa, and I sat down and wrote the first lines of Cilium source code. The goal was to write a completely new networking layer that was secure by default, met the scalability and performance requirements of containers, and was fully aligned with the fast-rising Kubernetes ecosystem, where platform and application teams would operate it without wanting to understand the underlying infrastructure layer.
Cilium became successful so quickly that I decided to join forces with Dan Wendlandt, raise money, and found a company around Cilium: Isovalent. We made a huge bet on eBPF and Kubernetes as the defining technologies for the cloud native era and kept pushing what is possible with eBPF in the following years.
Today, Cilium is best known as a CNI and provides secure and observable connectivity on the network and service mesh level for Kubernetes and beyond. We have donated it to the CNCF, and it graduated as a project in 2023. Google (GKE, Anthos), Amazon (EKS-A), and Microsoft (AKS) have all adopted Cilium to provide networking and security for Kubernetes.
Isovalent would not exist without eBPF, and eBPF would not be where it is today without Isovalent.
DockerCon was a big inflection point not only for Kubernetes but also for eBPF. Several eBPF talks were presented and recognized as top conference talks. Coincidentally, Brendan Gregg, Liz Rice (Chief Open Source Officer at Isovalent today), and I stood on the keynote stage together. It was only later that we realized how special that moment was.
The community growth of eBPF projects, including Cilium and bcc, skyrocketed after DockerCon. All major companies started to adopt eBPF, and many new eBPF-based projects started to appear in the following years.
Google Bets on Cilium & eBPF
A big milestone for Cilium and eBPF was when Google announced using Cilium as the networking layer for Google Kubernetes Engine (GKE). The creator of Kubernetes selecting eBPF as the best underlying technology to provide networking for their own managed Kubernetes platform was a strong statement regarding the maturity of eBPF and Cilium.
Today, we’re introducing GKE Dataplane V2, an opinionated dataplane that harnesses the power of eBPF and Cilium, an open source project that makes the Linux kernel Kubernetes-aware using eBPF. Now in beta, we’re also using Dataplane V2 to bring Kubernetes Network Policy logging to Google Kubernetes Engine (GKE). (source)
eBPF for Security
eBPF had entered tracing and networking successfully; the next stop was security. Today, every major security vendor uses eBPF in some form, but one of the earliest uses was BPF LSM, which set out to remove the security performance-overhead tax. BPF LSM defines attachment points that allow security modules (eBPF programs with type LSM) to provide their own implementation of the desired LSM hooks. It provides a security language that allows eBPF programs to detect and react to intrusion signals without requiring changes to kernel source code.
BPF LSM was a great step forward for security as it removed performance overhead that was assumed to be a necessary evil of improving security. It was designed as a low-level kernel interface, though, and required users to write eBPF programs directly. As a result, several new security projects appeared focusing on bringing eBPF to end-users in an approachable way, one of them being Tetragon.
Tetragon’s design goals were similar to those of BPF LSM: avoid paying the “security performance tax” and avoid being vulnerable to TOCTOU (time-of-check to time-of-use) attacks. In addition, it focuses specifically on being Kubernetes-native and solving runtime security use cases for containers and cloud native Kubernetes environments.
eBPF on Windows
eBPF’s evolution was limited to Linux until Dave Thaler decided to change this, starting the project to port eBPF to Windows and implementing an initial partial port of Cilium on Windows. Dave realized that Microsoft had different groups working on identical infrastructure projects for Windows and Linux. eBPF would eventually allow this work to be consolidated and provide a standardized language for writing operating system-level infrastructure code.
As part of this port, the work to standardize eBPF via the IETF will be crucial for the long-term success of eBPF in the industry across many operating systems.
To close this all out, I would like to leave you with quotes from the people who have been the most involved since the beginning. Overall, we have barely scratched the surface of what will become possible with eBPF.
I have used eBPF in meetings where in a meeting someone says: “I wish we were able to measure this”, and I said: “Hold on”, opened up my laptop, and said: “I’ve done it in production” before the meeting was over. “Here is your answer”. That’s the kind of super power we need. (Brendan Gregg)
Even though it has been ten years, it is just the beginning. What we saw in innovation going towards user space has now come back into the kernel. (Daniel Borkmann)
My passion is to innovate and enable others to innovate and eBPF fulfills this role in my life. If in the past the kernel could be programmed by 100 people in the world, now 100K people can program the kernel thanks to eBPF. (Alexei Starovoitov)
Today, eBPF is everywhere. We often don’t realize it. Every Android smartphone runs eBPF, new security products are based on eBPF, and it’s hard to see infrastructure software coming out today that is not using eBPF. I’m incredibly grateful to the eBPF and kernel community for the privilege of being part of the journey, from the early days of collaborating with PLUMgrid to founding Isovalent.