Nico Vibert is a Senior Technical Marketing Engineer at Isovalent – the company behind the open-source cloud native solution Cilium. Nico has worked in many different roles – operations and support, design and architecture, technical pre-sales – at companies such as HashiCorp, VMware and Cisco. Nico’s focus is primarily on network, cloud and automation and he loves creating content and writing books. Nico regularly speaks at events, whether on a large scale such as VMworld, Cisco Live or at smaller forums such as VMware and AWS User Groups or virtual events such as HashiCorp HashiTalks. Outside of Isovalent, Nico’s passionate about intentional diversity & inclusion initiatives and is Chief DEI Officer at the Open Technology organization OpenUK.
Cilium Custom BGP Timers
[07:43] In this video, join Nico Vibert as he teaches you how to customize BGP timers using Cilium 1.14!
Welcome to this video on Cilium BGP timers. It’s a new feature introduced in Cilium 1.14, and it allows you to customize the timers in your BGP session between Cilium and your peers. We introduced BGP support a few versions ago, and it lets you advertise the Pod IP ranges or service IP ranges from Cilium-managed clusters to outside of the cluster using BGP. By default, we are using fairly high BGP timers.
In BGP, you have different timers, including the Keepalive timers, which are used to maintain the neighbor relationship. If you don’t receive a Keepalive message before the Hold timer expires, then the BGP session is determined to have failed. By default, we’re using a 30-second Keepalive interval, so every 30 seconds, we send a Keepalive. If you don’t hear any Keepalives from your peers after 180 seconds by default, then the session fails. These timers are quite high, and ideally, we might want to reduce them because it might take up to a minute and a half to realize that your neighbor has become unresponsive. By reducing the timers, you can detect an outage faster and converge faster after an event or an outage. We’re going to look at this in the demo.
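To make the relationship between the two timers concrete, here is a minimal Python sketch of the arithmetic. The 30-second and 180-second values are the defaults quoted above; the 3-second Keepalive is the value used later in the demo, and pairing it with a 9-second Hold time is an assumption made for illustration. The class and function names are hypothetical and are not part of Cilium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpTimers:
    """A pair of BGP session timers, in seconds."""
    keepalive_s: int
    hold_s: int

    def keepalives_per_hold(self) -> float:
        # How many Keepalive intervals fit into one Hold time window.
        return self.hold_s / self.keepalive_s

    def worst_case_detection_s(self) -> int:
        # A peer that goes silent right after sending a Keepalive is only
        # declared down once the full Hold time has elapsed.
        return self.hold_s


def negotiated_hold(local_hold_s: int, peer_hold_s: int) -> int:
    # Per RFC 4271, a session uses the smaller of the two Hold times
    # proposed in the OPEN messages.
    return min(local_hold_s, peer_hold_s)


default_timers = BgpTimers(keepalive_s=30, hold_s=180)  # values quoted in the video
fast_timers = BgpTimers(keepalive_s=3, hold_s=9)        # 9 s Hold time is an assumption

for label, timers in (("default", default_timers), ("fast", fast_timers)):
    print(
        f"{label}: {timers.keepalives_per_hold():.0f} keepalive intervals per hold window, "
        f"silent peer detected within {timers.worst_case_detection_s()} s"
    )
```

With these numbers, the time to notice a silent peer drops from the full default Hold time to a few seconds, which is the faster detection described above.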
So, let’s have a look at this environment. We’ve got Cilium configured, and we have one peer. I’m using the Cilium CLI to check the BGP session with Cilium, and I’ve got one device, let’s call it my top-of-rack switch typically. The session is established; it’s been up and running for 22 minutes, and we are receiving one network and advertising one network. We’re going to log on to our peer, and we’re going to run show BGP commands to show that we are receiving this network from Cilium, 10.0.0/24.
Now, let’s find a bit more information here. We can actually see the timers. The Keepalive interval is 30 seconds, as I mentioned, and every 30 seconds, we are sending a new Keepalive. The Hold time is 180 seconds. Again, we can reduce these timers, which will enable us to detect an outage faster and recover from the outage faster.
The way we define the BGP configuration in Cilium is using a BGP peering policy. Let’s have a look at the peering policy. It’s a very simple configuration where we essentially define on which node we’re going to apply BGP. We specify our own local AS number, whether we want to advertise the Pod CIDR to our peers, and then we have some settings like peer IP address and peer AS number. We are going to modify the BGP timers. So, we’re going to bring the Keepalive down from 30 seconds to 3 seconds, and set the other timers to 9 seconds and 12 seconds. But before we do that, let’s go and just capture the traffic to make sure that the Keepalives are sent every 30 seconds.
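As a rough illustration of what such a peering policy could contain, the Python sketch below builds a CiliumBGPPeeringPolicy manifest as a dictionary and prints it as JSON. The policy name, node label, AS numbers and peer address are hypothetical placeholders, not the values from the demo, and mapping 9 seconds to the Hold time and 12 seconds to the connect retry time is an assumption, since the video does not say which timer gets which value. The helper function is hypothetical too.

```python
import json


def peering_policy(name, node_labels, local_asn, peer_address, peer_asn,
                   keepalive_s, hold_s, connect_retry_s):
    """Build a CiliumBGPPeeringPolicy manifest with custom timers."""
    # Sanity check of this sketch: Keepalives must be more frequent than the Hold time.
    if keepalive_s >= hold_s:
        raise ValueError("keepalive must be shorter than the hold time")
    return {
        "apiVersion": "cilium.io/v2alpha1",
        "kind": "CiliumBGPPeeringPolicy",
        "metadata": {"name": name},
        "spec": {
            # Which nodes the policy applies to.
            "nodeSelector": {"matchLabels": node_labels},
            "virtualRouters": [{
                "localASN": local_asn,
                "exportPodCIDR": True,  # advertise the Pod CIDR to the peer
                "neighbors": [{
                    "peerAddress": peer_address,  # CIDR notation
                    "peerASN": peer_asn,
                    "keepAliveTimeSeconds": keepalive_s,
                    "holdTimeSeconds": hold_s,
                    "connectRetryTimeSeconds": connect_retry_s,
                }],
            }],
        },
    }


# All values below are placeholders chosen for illustration.
policy = peering_policy(
    name="tor",
    node_labels={"kubernetes.io/hostname": "node-1"},
    local_asn=65001,
    peer_address="172.0.0.1/32",
    peer_asn=65000,
    keepalive_s=3,
    hold_s=9,            # assumption: 9 s is the Hold time
    connect_retry_s=12,  # assumption: 12 s is the connect retry time
)
print(json.dumps(policy, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped into `kubectl apply -f -` on a test cluster.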
So, let’s go on our Cilium agent, and we’re going to run tcpdump to capture these packets. We’re going to capture the BGP traffic, which is port 179, from the Cilium node. So, we’re going to capture this and show that traffic is being sent; the Keepalives are being sent every 30 seconds.
Now, what we’ll do is we’ll save these policies, apply the peering policy, and we’ll see that the Keepalives are being sent every 3 seconds instead.
So let’s stop this packet capture and let’s use termshark to visualize our keepalives. So you can see I’ve got a keepalive message sent at 0 seconds and you can see that the 4th packet was sent 30 seconds later.
So the keepalive messages were being sent every 30 seconds.
When we change the timers, the session reinitializes itself and you can see the session becomes active.
And very soon after, the session is re-established and you can see that the keepalives have gone down to 3 seconds.
But again let’s just verify using a packet capture.
We’re going to, this time, save it into a file called fast-timers. And again because we’ve reduced the timers, this should just take a few seconds for us to see multiple keepalives.
Here we go.
As you can see, if you look at our keepalive messages, they’re sent every 3 seconds, which again is a way for us to understand that our peers are still healthy and let our peers know we are still healthy.
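The check done by eye in termshark can also be sketched in a few lines of Python: take the timestamps of the keepalive packets and look at the gaps between them. The timestamps below are made-up sample values, not data from the capture in the video, and the function names and the half-second tolerance (to absorb small timing variations) are assumptions of this sketch.

```python
from typing import List


def intervals(timestamps: List[float]) -> List[float]:
    """Gaps, in seconds, between consecutive packet timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def sent_every(timestamps: List[float], expected_s: float,
               tolerance_s: float = 0.5) -> bool:
    """True if every gap between keepalives is within tolerance of expected_s."""
    gaps = intervals(timestamps)
    return bool(gaps) and all(abs(gap - expected_s) <= tolerance_s for gap in gaps)


# Hypothetical keepalive timestamps (seconds since the start of each capture).
slow_capture = [0.0, 30.02, 59.97, 90.01]
fast_capture = [0.0, 3.01, 5.98, 9.0]

print("slow capture, 30 s keepalives:", sent_every(slow_capture, 30.0))
print("fast capture, 3 s keepalives:", sent_every(fast_capture, 3.0))
```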
And that’s it for this quick demo! It’s just a new feature. We’re going to look at more BGP features in the next few demos. Thanks very much for watching.