Observability is about more than just operations and application monitoring; it’s about inferring the state of the environment from the outputs of the overall system. True next-generation observability is the ability to ask questions of the system rather than just piling up monitoring data and attempting to correlate it.
The data most often linked to observability are metrics, logs, and traces. However, each of these data sources has its own collection method and may require several different products and agents. What if there were a way to gather telemetry data for observability in a way that is unobtrusive, secure, and consistent across systems, with less friction and minimal impact on performance and resource usage? Meet eBPF!
Collecting Observability Data with eBPF
eBPF is a powerful kernel technology that enables a whole new class of observability tools. eBPF programs can be attached to various kernel hooks to collect data about what’s happening. Getting this level of visibility without eBPF would normally require either loading additional kernel modules or modifying the kernel itself, and kernel changes can take years to reach production systems. Both approaches also add risk and performance overhead. eBPF can be used to collect a wide variety of data at multiple layers, including process IDs, timestamps, system calls, and resource usage. Before an eBPF program is loaded, a verification process checks that it has the appropriate privileges, cannot crash or otherwise harm the system, and always runs to completion.
In this blog, we’ll take a look at what is needed for observability and some of the ways eBPF can supercharge network observability, Kubernetes observability, security observability, and performance observability. We’ll then discuss how eBPF can help with tracing and close with a look at its potential disadvantages.
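To make the kind of data described above more concrete, here is a minimal user-space sketch of what consuming such events might look like. It assumes a hypothetical fixed-size record (timestamp, process ID, system call number) that an eBPF program could emit; the layout, the names EVENT_FORMAT, decode_events, and syscalls_per_pid, and the sample values are all illustrative assumptions, not the format of any particular tool.

```python
import struct
from collections import Counter

# Hypothetical fixed-size event layout an eBPF program might emit:
#   u64 timestamp_ns, u32 pid, u32 syscall_nr  (little-endian, 16 bytes)
EVENT_FORMAT = "<QII"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def decode_events(buf: bytes):
    """Yield (timestamp_ns, pid, syscall_nr) tuples from a packed buffer."""
    usable = len(buf) - len(buf) % EVENT_SIZE  # ignore a trailing partial record
    for offset in range(0, usable, EVENT_SIZE):
        yield struct.unpack_from(EVENT_FORMAT, buf, offset)


def syscalls_per_pid(buf: bytes) -> Counter:
    """Count events per (pid, syscall_nr) pair."""
    counts = Counter()
    for _timestamp_ns, pid, syscall_nr in decode_events(buf):
        counts[(pid, syscall_nr)] += 1
    return counts


# Made-up sample records, purely for illustration
sample = b"".join(
    struct.pack(EVENT_FORMAT, ts, pid, nr)
    for ts, pid, nr in [(1000, 42, 1), (2000, 42, 1), (3000, 7, 0)]
)
print(syscalls_per_pid(sample))
```

In a real deployment the buffer would come from the kernel rather than from struct.pack, and most users would rely on an existing tool to do this decoding for them.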
Using eBPF for Network Observability
Gathering network insights without significant overhead is a huge benefit of using eBPF. A good example is how Cilium uses eBPF for deep visibility into network flows without needing per-application agents or sidecars, and without using significant system resources.
Using eBPF to access L3, L4, and L7 network flow data lets you observe detailed analytics about traffic. You’ll see what network data is being allowed or denied and what network policies or configurations are causing connectivity issues.
Without eBPF, the example above would have required digging through iptables logs or deploying resource-heavy third-party agents and tools to track flow issues, and then trying to map those to policies and application processes.
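As a rough illustration of the "allowed or denied" question above, the sketch below summarizes which policies account for dropped traffic, given a list of flow records. The Flow class, its field names, the verdict strings, and the policy names are hypothetical and chosen for readability; they are not the schema of Cilium or any other tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    # All field names here are hypothetical, not a real tool's schema
    src: str
    dst: str
    dst_port: int
    protocol: str
    verdict: str  # "FORWARDED" or "DROPPED"
    policy: str   # name of the policy that decided the verdict ("" if none)


def dropped_by_policy(flows):
    """Return dropped-flow counts per (policy, dst, dst_port), largest first."""
    drops = Counter()
    for flow in flows:
        if flow.verdict == "DROPPED":
            drops[(flow.policy, flow.dst, flow.dst_port)] += 1
    return drops.most_common()


# Made-up flows, purely for illustration
flows = [
    Flow("frontend", "backend", 8080, "TCP", "FORWARDED", "allow-frontend"),
    Flow("batch", "backend", 8080, "TCP", "DROPPED", "default-deny"),
    Flow("batch", "backend", 8080, "TCP", "DROPPED", "default-deny"),
    Flow("batch", "db", 5432, "TCP", "DROPPED", "db-ingress"),
]
for (policy, dst, port), count in dropped_by_policy(flows):
    print(f"{count} drops to {dst}:{port} attributed to policy {policy!r}")
```

The point of the sketch is that once each flow carries its verdict and the responsible policy, connectivity questions become a simple aggregation instead of a log-correlation exercise.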
Using eBPF for Kubernetes Observability
Kubernetes offers a lot of operational capabilities, but it also comes with a tradeoff of extra complexity. Using eBPF for Kubernetes observability is advantageous because of the intrinsic access to telemetry without the need (or overhead) of pushing agents or sidecars throughout the cluster. eBPF programs have access to pod-level network metrics and can natively understand not just IPs and ports, but service identities and API calls, and then expose the data to operational dashboards like Grafana. The same observable flow data and identity awareness lets users go one step further with dynamic network policies. Being able to gain a real-time understanding of what is happening with application-layer behaviors and then using the data for troubleshooting and active policy management wasn’t as easy before eBPF, due in large part to limitations of built-in tools like iptables.
eBPF can also enhance Kubernetes observability by exposing resource utilization, again without added overhead. This data gives a deeper understanding of system resources and can guide other tools and processes such as placement, sizing, and general application and infrastructure optimization. Before eBPF, exposing resource usage at this level of granularity in Kubernetes was not simple, and sometimes not possible.
Using eBPF for Security Observability
eBPF gives users intrinsic, deep visibility into communication flows across the entire system. The use-cases for security observability with eBPF are numerous, and include extremely granular process visibility and end-to-end process and flow observation. Even in simple microservices applications running in Kubernetes, understanding the communication patterns in ephemeral, stateless environments typically requires deep agent-based access to the virtual and physical network. Add in L4-L7 traffic with multiple protocols, and you’ve got an ideal candidate for eBPF. eBPF also provides a common method to instrument TLS traffic across layers as it traverses pods, nodes, and cloud platforms.
Projects like Tetragon and Hubble use eBPF to gather insights such as sudden changes in behavior, microbursts of activity to a particular process that may indicate exploitation or attack, service-to-service communication patterns, and more. They also let users view network transactions and see which pods, processes, and system calls are involved. eBPF gives users access to 5-tuple information for a deep understanding of UDP and TCP transactions in real time, alongside historical data.
Security and networking observability go hand in hand. Users can harness the capabilities of eBPF and API-driven infrastructure to create programmable controls and policies that can be observed, or even acted upon, in real time.
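For readers unfamiliar with the term, the 5-tuple mentioned above is the source address, destination address, source port, destination port, and protocol of a flow. The following sketch shows how those five values can be read from a raw IPv4 packet in user space. It is only an illustration of the concept: the function name five_tuple and the sample packet are assumptions, and eBPF-based tools extract this information in the kernel rather than with code like this.

```python
import socket
import struct


def five_tuple(packet: bytes):
    """Extract (src_ip, dst_ip, src_port, dst_port, protocol) from a raw IPv4 packet.

    Returns None if the packet is not IPv4 TCP/UDP or is too short.
    Simplification: fragmented packets are not handled.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = packet[9]             # 6 = TCP, 17 = UDP
    if ihl < 20 or proto not in (socket.IPPROTO_TCP, socket.IPPROTO_UDP):
        return None
    if len(packet) < ihl + 4:
        return None
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP headers both begin with 16-bit source and destination ports
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return (src_ip, dst_ip, src_port, dst_port, proto)


# Made-up UDP packet from 10.0.0.1:53000 to 10.0.0.2:53 (checksums left as zero)
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 28, 0, 0, 64, socket.IPPROTO_UDP, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
)
udp_header = struct.pack("!HHHH", 53000, 53, 8, 0)
print(five_tuple(ip_header + udp_header))
```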
Using eBPF for Performance Observability
Resource performance observability is less discussed, but as applications become more diverse and move towards microservices and containerized cloud environments, the enhanced observability features provided by eBPF become more significant. Many performance optimization use-cases are still in early development, but progress so far is promising and shows how popular the use case is becoming.
Examples of enhanced performance observability capabilities can include the ability to map pod-level network throughput, end-to-end network throughput and latency, per-process CPU and memory utilization, and more. eBPF will become a core tool for performance observability as we continue to look for ways to optimize both system and application performance at the narrowest point (process/container/pod) as well as by mapping end-to-end total system performance (node/cluster/cross-cluster/cross-cloud).
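Latency data of the kind mentioned above is often summarized as a histogram with power-of-two buckets, which keeps the summary compact even for very large numbers of samples. The sketch below shows that bucketing in plain Python. The function names and the sample values are illustrative assumptions, and real eBPF tools typically perform this aggregation inside the kernel rather than in user space.

```python
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Return the bucket index k such that 2**k <= value_us < 2**(k + 1).

    Values of 0 are counted in bucket 0 together with 1.
    """
    return max(value_us, 1).bit_length() - 1


def latency_histogram(samples_us):
    """Group latency samples (in microseconds) into power-of-two buckets."""
    return Counter(log2_bucket(value) for value in samples_us)


def render(hist) -> str:
    """Format the histogram as one text line per non-empty bucket."""
    lines = []
    for k in sorted(hist):
        low, high = 2 ** k, 2 ** (k + 1) - 1
        lines.append(f"{low:>6} -> {high:<6} us | {'#' * hist[k]}")
    return "\n".join(lines)


# Made-up latency samples in microseconds
samples = [3, 90, 100, 120, 1000]
print(render(latency_histogram(samples)))
```

A summary like this makes outliers visible at a glance, which an average alone would hide.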
Using eBPF for Tracing
eBPF provides a unified tracing interface for both user space and the kernel that does not require additional instrumentation of code. First, because the tracing happens in the kernel, eBPF tracing will continue even in cases where agent-based or sidecar tracing fails. Second, eBPF does not stop a running process to observe its state, which greatly helps maintain application runtime performance. eBPF tracing occurs in the kernel in real time, resulting in a smaller performance penalty and reduced risk to the running processes and applications.
The third advantage of using eBPF for tracing is that eBPF can trace everything in the system rather than being limited to specific layers or processes. No code needs to be injected into your application in order to use eBPF to observe the application processes.
Are there any disadvantages to eBPF?
Now that we’ve covered the ways eBPF can supercharge observability, we should point out the disadvantages of using eBPF for observability. First, a common misconception is that developers need to understand eBPF to take advantage of its capabilities. In fact, eBPF is built into the kernel, and a growing number of projects and applications leverage it without requiring users to write any eBPF code themselves.
With that in mind, a few things may be considered disadvantages to using eBPF for observability. eBPF is included with every recent Linux kernel, but features have been added along the way, so what is available depends on the kernel version in use, which users should confirm. Windows support is limited at the time of writing, but interest is growing and it will likely see significant improvements in the coming months. Another potential disadvantage is the limited access that sandboxed eBPF programs have. While sandboxing is also an advantage of using eBPF for observability, it can be a drawback for certain metrics, and a small subset of applications may still require application-aware agents.
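One way to approach the kernel version question is to compare the running kernel against the release in which a feature first appeared. The sketch below does this for a handful of features. The version numbers are approximate mainline releases listed here as an assumption for illustration; distribution kernels frequently backport features, so a check like this is only a rough guide and not a substitute for probing the feature directly.

```python
import re

# Approximate mainline kernel versions where some eBPF features first appeared.
# Assumption for illustration: distribution kernels may backport features.
MIN_KERNEL = {
    "bpf() syscall": (3, 18),
    "kprobe attachment": (4, 1),
    "XDP": (4, 8),
    "BPF LSM": (5, 7),
    "BPF ring buffer": (5, 8),
}


def parse_release(release: str):
    """Turn a release string like '5.15.0-91-generic' into (5, 15)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supported_features(release: str):
    """Map each feature name to True if the release meets its minimum version."""
    version = parse_release(release)
    return {name: version >= minimum for name, minimum in MIN_KERNEL.items()}


# Sample release string; on a Linux host, platform.release() returns the real one
for name, ok in supported_features("5.4.0-150-generic").items():
    print(f"{name:<20} {'yes' if ok else 'no'}")
```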
Conclusion
eBPF is an impressive tool to use for observability that enables deeper insights when compared to more traditional observability solutions. A secure, non-intrusive way to gather telemetry data from the entire system has not been available in the past without many products, application-level agents, and very complex operations. One doesn’t need to deeply understand eBPF programming in order to gain the advantages of utilizing eBPF, either. eBPF is growing as a standard underpinning for observability, feeding data into tools such as Grafana, as well as technologies in Kubernetes and other systems, like those of many top cloud companies. eBPF is not the end goal, but the tool and method which enables users to reach the end goal of deep, intrinsic data access to create low-overhead observability for diverse application environments.
Through an incredibly active developer community, eBPF has an ever-growing set of use-cases and continuously evolving capabilities. We at Isovalent are very proud of this growth and continuous feedback, and Isovalent is home to many of the creators and core contributors to the eBPF project. It’s an exciting time to be part of the journey. If you’re interested in seeing how eBPF can enhance your observability, schedule an eBPF observability demo with one of our eBPF experts or run one of our hands-on labs on eBPF!
Chris Lentricchia is a Senior Product Marketing Manager at Isovalent, the company behind Cilium. He resides in northern New England and enjoys the outdoors and playing with his dog, Buddy.