Nikolay Aleksandrov is a senior software engineer at Isovalent and a co-maintainer of the Linux bridge driver. He has experience with network operating systems development and is a long-term Linux kernel networking contributor.
Video: BBR Support for Pods
Tune in to our experts Nikolay Aleksandrov (speaker) and Daniel Borkmann comparing BBR-based congestion control to Linux's default CUBIC for Pods. BBR-based congestion control for Pods was added in Cilium 1.12 as a new feature of Cilium's Bandwidth Manager and for the first time enables Pods to use BBR in practice. Using a real-world adaptive video streaming use case, they compare two different network conditions - high-speed long-haul links with a large BDP and last-mile networks at the edge of the Internet - and discuss the results.
Hello, today I’ll present a demo of Cilium’s new BBR feature, which enables you to use the BBR congestion control algorithm. I’ll compare BBR to CUBIC, which is Linux’s current default congestion control algorithm, by showing a real-world adaptive video streaming use case consisting of a service which uses FFmpeg to produce HLS playlists and video chunks for two video profiles – HD and low – with different bandwidths. It also uses the NGINX web server to serve a web page with a video player, the playlists, and the video chunks. The service backend is in its own pod, and it’s interesting to note that it was only recently that Linux kernel advancements made it possible to use BBR from Kubernetes pods. Before that, none of the CNI plugins could do it.
I’ll show two cases with different network conditions: the first with 100 millisecond latency only, which means a large bandwidth-delay product, and the second with 100 millisecond latency, 5 megabit per second rate limiting, and 1% random packet drop. I have two Kubernetes clusters set up – one with CUBIC, the default congestion control algorithm, which is on the left side, and one with the BBR congestion control algorithm enabled, which is on the right side.
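To see why 100 milliseconds of latency means a large bandwidth-delay product, the BDP can be computed directly: it is the amount of data that must be in flight to keep the pipe full. A minimal sketch, treating the 100 ms figure as the round-trip time and using an assumed 1 Gbit/s link speed for the long-haul case alongside the 5 Mbit/s limit from the second test:

```python
def bdp_bytes(rtt_s: float, rate_bps: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return rtt_s * rate_bps / 8  # bits in flight, converted to bytes

# 100 ms RTT on an assumed 1 Gbit/s long-haul link -> ~12.5 MB in flight
print(bdp_bytes(0.1, 1e9))  # 12500000.0

# 100 ms RTT at the 5 Mbit/s edge-link limit -> ~62.5 KB in flight
print(bdp_bytes(0.1, 5e6))  # 62500.0
```

A loss-based algorithm like CUBIC must grow its congestion window to the full BDP before it saturates such a link, and every loss event shrinks it again, which is why large-BDP and lossy links are exactly the cases where BBR's model-based approach pays off.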
You can follow the TCP connection metrics above the respective browser windows for each congestion control algorithm. You can also see and compare the HTTP request latencies for each algorithm in the developer console in the browsers. The client setup uses a separate network connection to each cluster, and Linux’s netem qdisc to emulate the same network conditions. The test is being performed on a host with the Kubernetes clusters running on virtual machines. Now, on the left shell, I will show you the current configurations for each cluster, and we’ll start with the 100 millisecond latency case.
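The two network conditions described above can be emulated with the netem qdisc via `tc`. A sketch of what such a setup might look like on the client side - `eth0` is a placeholder for the actual interface facing each cluster:

```shell
# Case 1: 100 ms of added delay only (large bandwidth-delay product)
tc qdisc add dev eth0 root netem delay 100ms

# Case 2: 100 ms delay, 5 Mbit/s rate limit, 1% random packet loss
tc qdisc replace dev eth0 root netem delay 100ms rate 5mbit loss random 1%

# Remove the emulation again when done
tc qdisc del dev eth0 root
```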
This is the BBR config cluster, where you can see BBR is enabled. And this is the CUBIC cluster, where BBR is disabled and we use CUBIC. You can also see it in the connection windows above the browsers.
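For reference, BBR for Pods is a Bandwidth Manager option in Cilium 1.12. A sketch of enabling it via Helm and verifying the result - the release name and namespace are assumptions that may differ in your setup:

```shell
# Enable the Bandwidth Manager with BBR for Pods
helm upgrade cilium cilium/cilium --namespace kube-system \
    --set bandwidthManager.enabled=true \
    --set bandwidthManager.bbr=true

# Verify on a Cilium agent that BBR is active
kubectl -n kube-system exec ds/cilium -- cilium status | grep BandwidthManager
```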
So now, we’ll start with the left, with the 100 millisecond latency case. I will open both – this is CUBIC, and this is BBR. Let’s start. You can clearly see the latencies here and here – BBR is almost five times better than CUBIC. It’s utilizing the bandwidth very well, and latency is much lower. You can monitor the connection metrics up here.
Okay, now I will close these windows and we’ll switch to the second test, which is 100 millisecond latency, 5 megabit per second rate limiting, and 1% random packet drop. I’ll use new tabs for it. Okay, now I’ll clear the connections to make sure we’re using new TCP connections for it. And I will switch the configuration here. Okay, now I’ll start the new test. Again, here is CUBIC, and here will be BBR. You can notice that CUBIC is performing very badly and it cannot even hold the HD video profile. So it’s switching to the low profile with lower bandwidth, and it’s even barely holding on to the low profile. Meanwhile BBR is performing very well and can hold the HD profile and play the video with much better quality. Here you can see it’s HD, and here you can see it’s the low profile. It can also be seen from the playlists.
BBR utilizes the bandwidth much better. That’s all. Thank you.