What is Kube-Proxy and why move from iptables to eBPF?
Picture kube-proxy as the conductor and iptables as its orchestra, together overseeing a complex symphony of Kubernetes networking. As the orchestra grows and the music becomes more intricate, however, iptables starts to miss beats. In this post we untangle the roles of kube-proxy and iptables in Kubernetes networking, and discuss why the Linux kernel community views eBPF as the modern way to move beyond the technical debt of iptables.
To set the stage, we begin with what kube-proxy is and how it uses iptables for packet forwarding, load balancing, and service abstraction in Kubernetes networking. Next, we evaluate benchmarks of iptables versus eBPF, comparing latency and CPU overhead as services grow in number and complexity. Then we show how to bring eBPF into your deployment using Cilium, the industry standard for Kubernetes networking. Finally, we conclude with the notable operational benefits of migrating to eBPF.
Here for a specific answer? Skip to the section below:
- What is kube-proxy and how does it use iptables?
- Why is eBPF the standard for Kubernetes networking?
- How to replace iptables and bring eBPF into Kubernetes?
- What are the operational and business benefits of eBPF?
- Conclusion and Sandboxed Labs
What is kube-proxy, and How Does it Use iptables?
The Linux kernel community has long looked to bring eBPF into the heart of networking use cases. Over the last decade, eBPF has gone from an interesting technology with potential to a widely adopted one, delivering tangible improvements across Kubernetes networking, observability, and security. iptables is a legacy technology and a prime candidate for modernization with eBPF, with the Linux community (and even Rusty Russell) moving away from iptables and toward eBPF.
Let’s start by understanding what kube-proxy is and how it uses iptables within Kubernetes networking.
Kube-proxy is the Kubernetes networking component that runs on each node in a cluster and is responsible for facilitating communication between services and pods. Its primary role is maintaining the network rules that map services to pods, allowing communication to and from the cluster. Kube-proxy acts as an L3/L4 network proxy and load balancer, using either iptables or IPVS (IP Virtual Server), with iptables as the out-of-the-box default.
Looking at the diagram above, kube-proxy is installed on each node and ensures network traffic flows from services to the appropriate pods. When a new service or endpoint is created, kube-proxy programs rules that hide the underlying pod IPs behind the service’s virtual IP address, so other services can reach it through that single address.
As the name suggests, kube-proxy acts as a network proxy. It manages the network rules needed to route traffic, including Network Address Translation (NAT) when required, such as rewriting a service’s virtual IP to the IP of a selected pod, so that packets reach their intended destinations.
For services backed by multiple pods for high availability or scalability, kube-proxy also acts as a load balancer, distributing incoming connections across those pods to balance workloads and use resources efficiently.
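In iptables mode, kube-proxy implements this spreading with a per-service chain of rules that use the iptables `statistic` match in random mode. For n endpoints, the rule for endpoint i (counting from 0) matches with probability 1/(n-i) among the packets that reach it, and the last rule matches unconditionally, which works out to an equal 1/n share per pod. The Bash sketch below only prints those probabilities and an iptables-style rule skeleton; the chain names `KUBE-SVC-DEMO` and `KUBE-SEP-<i>` are hypothetical placeholders (real kube-proxy chain names end in a hash), and the ten-decimal formatting is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: per-endpoint probabilities used to spread connections uniformly
# across N pods with iptables' statistic match (random mode).
# Chain names are hypothetical placeholders, not real kube-proxy output.

# probabilities N: print the probability for rule i (0-based) of the first
# N-1 rules; the final rule carries no probability and always matches.
probabilities() {
  local n=$1 i
  for ((i = 0; i < n - 1; i++)); do
    awk -v n="$n" -v i="$i" 'BEGIN { printf "%.10f\n", 1 / (n - i) }'
  done
}

# print_rules N: emit an iptables-style rule skeleton for a service with N endpoints.
print_rules() {
  local n=$1 i=0 p
  while read -r p; do
    echo "-A KUBE-SVC-DEMO -m statistic --mode random --probability $p -j KUBE-SEP-$i"
    i=$((i + 1))
  done < <(probabilities "$n")
  # Fallthrough rule: whatever was not picked earlier goes to the last endpoint.
  echo "-A KUBE-SVC-DEMO -j KUBE-SEP-$((n - 1))"
}

print_rules 3
```

For three endpoints, the first rule fires a third of the time and the second fires half of the remaining time, so each pod receives one third of new connections. The nat table only sees the first packet of a connection; connection tracking applies the same translation to the rest of it.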
So what is iptables? Built over two decades ago, iptables is a packet filtering and firewall tool for the Linux kernel. It inspects, modifies, and filters network packets according to a set of user-defined rules that match on IP addresses, protocols, and ports (TCP/UDP).
Consider iptables as a set of kernel modules to filter or mangle packets. These modules attach their callbacks to netfilter hooks to execute a set of rules. In the early days of less stringent performance demands, iptables worked dutifully as the default network security and traffic management tool with its hundreds (or thousands) of rules inspecting and redirecting packets.
Rules in iptables are organized into tables and chains. A chain is an ordered list of rules that is evaluated sequentially as a packet traverses it. A table groups chains by purpose; iptables has five tables: filter, nat, mangle, raw, and security.
The filter table’s built-in INPUT, OUTPUT, and FORWARD chains correspond to netfilter hooks. For instance, to block incoming traffic on port 80, we append the following rule to the INPUT chain; it matches packets using the TCP protocol with destination port 80 and drops them:
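```shell
# Drop incoming TCP traffic destined for port 80 (requires root).
# -A appends to the end of the INPUT chain, so an earlier matching ACCEPT rule would take precedence.
iptables -A INPUT -p tcp --dport 80 -j DROP
```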
In smaller deployments, iptables handles this communication between services and pods well. As clusters grow in size and complexity, however, the limitations of kube-proxy’s iptables-based approach show up as latency and performance problems, because rule matching is a sequential O(n) operation. This design noticeably increases the overhead for kube-proxy to manage even a modest cluster with one hundred services, which already hints at why we might want to replace iptables with something like eBPF.
Performance in iptables relies on a sequential algorithm that walks the rules in a chain one by one, matching them against observed traffic. The time taken therefore grows linearly with each added rule, and the strain quickly becomes apparent as more services and endpoints are added. In the worst case, a packet is compared against every rule before a match is found, which adds latency and causes stability problems.
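A toy model of first-match evaluation makes the linear cost concrete. In iptables mode, kube-proxy’s `KUBE-SERVICES` chain in the nat table holds jump rules keyed on each service’s IP and port, so a packet to a service can be compared against many of them. In this Bash sketch, rules are plain `VIP:port` strings with made-up addresses, and matching happens in the script rather than in the kernel; it only illustrates how the number of comparisons grows with the rule count.

```shell
#!/usr/bin/env bash
# Toy model of iptables first-match evaluation: rules are checked in order
# until one matches, so lookup cost grows with the number of rules (O(n)).

# build_chain N: fill the global CHAIN array with N hypothetical "VIP:port" rules.
build_chain() {
  local n=$1 i
  CHAIN=()
  for ((i = 0; i < n; i++)); do
    CHAIN+=("10.96.$((i / 256)).$((i % 256)):80")
  done
}

# match_linear DEST: walk CHAIN in order and print how many rules were compared
# before DEST matched (or the full chain length if nothing matched).
match_linear() {
  local dest=$1 rule count=0
  for rule in "${CHAIN[@]}"; do
    count=$((count + 1))
    if [[ $rule == "$dest" ]]; then
      echo "$count"
      return 0
    fi
  done
  echo "$count"
  return 1
}

for n in 100 1000 10000; do
  build_chain "$n"
  last=${CHAIN[n-1]}
  echo "rules=$n comparisons_for_last_service=$(match_linear "$last")"
done
```

Each tenfold increase in rules produces a tenfold increase in comparisons for a service near the end of the chain, which is the O(n) behavior described above.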
Debugging iptables is frustrating: with hundreds to thousands of rules to inspect and update, troubleshooting larger systems in real time is nearly impossible. Even when a single rule needs to change, the entire iptables rule table is recompiled and reloaded. This time-consuming process hampers rapid response to network issues and leads to extended downtime, especially in fast-changing Kubernetes deployments (think multiple hours when dealing with thousands of unique rule chains!).
Designed for basic firewall purposes, iptables was never intended for the complexity of networking at Kubernetes scale. Its performance issues become harder to gloss over as network requirements move from those of the early 2000s into the current era of Kubernetes. The ephemeral nature of Kubernetes resources means IP addresses are created, dropped, and recycled within seconds. The community has tried to improve on iptables over the years, with ipset and, more recently, IPVS. But IPVS bandages over the issues of iptables rather than offering a true panacea.
IPVS, while effective for basic load balancing, offers limited customization, becomes complex at scale, and lacks advanced observability features. Its performance gains over iptables still fall short of eBPF’s, and it does not offer the flexibility that eBPF programs allow. As we explore in the next sections, the right cloud native solution must combine scalability, reliability, and programmability to handle Kubernetes’ dynamic identities and resources.
Like a Glove: Why is eBPF the Standard for Kubernetes Networking?
Enter eBPF, the technology Brendan Gregg has described as “superpowers” for the Linux kernel. For the uninitiated, eBPF is a versatile and efficient technology that has rapidly gained traction in Linux networking. It allows sandboxed programs to run directly within the Linux kernel, enabling a wide range of networking, observability, and security tasks with high efficiency and flexibility. The range of use cases was on full display at the recent eBPF Summit 2023.
Rusty Russell, the creator of iptables, has recognized eBPF as the long-awaited upgrade to iptables: “iptables perf used to be ‘mostly good enough’. Replacing it has taken so long because it requires a radically different approach; nice to see it finally happening!”
For load balancing, iptables was never designed for the scale of operations we see today with Kubernetes. Its sequential rule matching and rigid IP-based rules struggle with frequently changing IP addresses.
Instead, eBPF uses efficient hash tables, allowing for almost unlimited scale. In the graphic below, lower latency means better performance. Looking deeper at the benchmarks, iptables latency climbs as the number of microservices grows, while eBPF’s near-constant, O(1) average lookup time keeps its latency steady, showing its capability to replace kube-proxy and iptables.
Because lookups go through hash tables, eBPF’s overhead remains stable even as the number of services in a cluster increases substantially.
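For contrast, the sketch below stores a hypothetical `VIP:port` to backend mapping in a Bash associative array, which is backed by a hash table. It is only an analogy for an eBPF hash map; the key format and backend lists are assumptions for illustration, not the layout of any real Cilium map. The point is that a lookup hashes the key to find the entry, whether the table holds 100 services or 10,000, instead of walking a rule list in order.

```shell
#!/usr/bin/env bash
# Analogy for an eBPF hash map: a Bash associative array (a hash table) keyed
# by a hypothetical "VIP:port" string. Lookup hashes the key to find the entry
# instead of comparing against every rule in order.
declare -A SERVICES=()

# add_services N: insert N hypothetical services, each with two made-up backends.
add_services() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    SERVICES["10.96.$((i / 256)).$((i % 256)):80"]="10.0.$((i % 256)).10,10.0.$((i % 256)).11"
  done
}

# lookup DEST: print the backend list for DEST, or return 1 if it is not a service.
lookup() {
  local dest=$1
  if [[ -n ${SERVICES[$dest]+set} ]]; then
    echo "${SERVICES[$dest]}"
  else
    return 1
  fi
}

add_services 10000
echo "services stored: ${#SERVICES[@]}"
lookup "10.96.39.15:80"
```

Unlike a first-match chain, the cost of `lookup` does not depend on where a service sits in the table or on rule order.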
Next, let’s bring IPVS into the comparison. The benchmark below measures latency for a TCP connect/request/response (CRR) workload: open a TCP connection, send a payload, return the payload, and close the connection. eBPF significantly outperformed kube-proxy in iptables mode, with the gap becoming clear as the number of services scales. eBPF also outperformed kube-proxy in IPVS mode.
The benchmark above was performed on bare-metal machines on AWS, connected with 100 Gbit links and running Linux kernel 5.3. Watch Martynas Pumputis’s talk to learn more about the benchmark and dive deeper into the limitations of iptables.
Latency is only one part of the picture; next, let’s evaluate CPU cost (%CPU) to understand the impact of high-throughput traffic on resource utilization.
The tests below evaluated iptables (Cilium 1.9.6 with legacy host-routing and Calico 3.17.3), eBPF (Cilium 1.9.6 eBPF and Calico 3.17.3 eBPF), and node-to-node baselines. The benchmark used 32 streams to push high-throughput traffic at a whopping 100 Gbit/s, stressing the network’s ability to handle simultaneous connections and revealing potential bottlenecks around resource limits. Lower CPU % is better. eBPF is the clear winner: both the Cilium and Calico eBPF deployments use significantly less CPU than iptables-based legacy routing.
For more benchmarks comparing eBPF, node-to-node bare-metal baselines, and legacy routing with iptables, see the extensive testing of eBPF against iptables in the context of Cilium CNI performance.
How to Replace iptables and Bring eBPF into Kubernetes?
When it comes to replacing iptables with a modern solution, the answer has been to move beyond IPVS and bring eBPF into the networking layer. For the Linux kernel community, it was always clear that eBPF improved on iptables, and in this post we have shared benchmarks showing the different ways eBPF outperforms iptables and IPVS.
Today, the industry standard for bringing eBPF into Kubernetes (and, by extension, replacing kube-proxy) is Cilium. Cilium is a full replacement for kube-proxy, and much more. It is currently the only CNI in the CNCF that has officially graduated! Major cloud providers, including Azure, AWS, and Google Cloud, have migrated away from kube-proxy and use Cilium as their recommended CNI.
Over the past five years, Cilium has emerged as the accepted choice for introducing eBPF into Kubernetes environments, with teams consistently choosing Cilium to modernize beyond iptables. Cilium is easy to install and operate; the kube-proxy replacement guide shows how simply it runs as a CNI plugin without kube-proxy, replacing iptables with eBPF in the background.
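As a rough outline of what the kube-proxy replacement guide covers, the commands below bootstrap a kubeadm cluster without kube-proxy and install Cilium with Helm. `CILIUM_VERSION`, `API_SERVER_IP`, and `API_SERVER_PORT` are placeholders you must fill in. The value `kubeProxyReplacement=true` applies to recent Cilium releases (older releases used values such as `strict`), so treat this as a sketch and follow the guide for the version you run.

```shell
# Bootstrap the control plane without the kube-proxy add-on (kubeadm clusters)
kubeadm init --skip-phases=addon/kube-proxy

# Install Cilium with kube-proxy replacement enabled. Cilium is given the API
# server address directly, since no kube-proxy is present to handle the
# kubernetes service ClusterIP.
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version "${CILIUM_VERSION}" \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost="${API_SERVER_IP}" \
  --set k8sServicePort="${API_SERVER_PORT}"

# Confirm the agent reports kube-proxy replacement as enabled
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
```

On older releases the in-pod agent CLI is `cilium` rather than `cilium-dbg`. Where the NIC driver supports XDP, the Cilium documentation also describes `--set loadBalancer.acceleration=native` to run service load balancing at the XDP layer.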
The Cilium dataplane offers a comprehensive replacement for kube-proxy, making it easy to transition from iptables to eBPF programs. As shown on the right of the diagram, Cilium installs eBPF and XDP (eXpress Data Path) programs on each Kubernetes node, bypassing iptables and its overhead. This approach minimizes context switching and enables efficient packet processing, which results in lower latency and CPU overhead.
What are the Operational and Business Benefits of eBPF?
The transition to eBPF is not just a technical upgrade; it’s a strategic move. eBPF’s efficiency in large-scale operations such as load balancing is now well established: with its efficient hash tables, eBPF can scale almost limitlessly while reducing latency and overhead.
Over time, the motivation for migrating from iptables to eBPF through Cilium has expanded into the business and operational realm. From an operational perspective, the shift to eBPF offers tangible benefits: improved application performance, simplified network operations, and enhanced security translate into cost savings and better resource utilization, and reduced latency directly improves the user experience.
Additionally, migrating away from iptables reduces maintenance overhead, and together with eBPF’s streamlined troubleshooting this means minimal downtime and faster response times. The compute benefits and reduced latency have brought tangible value and a better application experience to platform teams.
Conclusion and Sandboxed Labs
The transition from iptables to eBPF, a journey spanning the last decade, marks a significant shift in how we handle packet forwarding, load balancing, and service abstraction within Kubernetes clusters. iptables, while reliable, was not designed to handle the dynamic and expansive networking needs of modern Kubernetes deployments.
Migrating from kube-proxy and iptables to eBPF represents an exciting leap forward in Kubernetes networking. It’s not just a technical upgrade but a strategic shift that aligns with the demands of modern, dynamic, and large-scale Kubernetes environments.
Cilium, the industry standard for Kubernetes networking, makes replacing iptables with eBPF easy. Cilium offers improved performance, scalability, and operational efficiency, making it the logical choice for modernizing Kubernetes infrastructure. The following labs will get you started on the learning journey from iptables to eBPF and beyond.
🚀 Try Cilium for yourself: explore the ‘Getting Started with Cilium’ lab.
🐝 Looking to get hands-on with eBPF? The ‘Learning eBPF Tutorial’ is the perfect sandbox.
💥 Curious about CNI migration? ‘Migrating to Cilium’ walks through executing the migration safely.