Lighthouses emit a powerful, regular light that warns incoming ships of potential obstacles and dangers. If a vessel stops seeing the signal, it knows immediately that something might be obstructing the passage. The faster the ship reacts to the missing signal, the more likely it is to find an alternate, safe route.
The same principle applies to networking. We tend to rely on routing protocol heartbeats to act as the signals that indicate whether our destination is accessible. If several of them go missing in a row, the passage might be blocked. But these heartbeats are, by default, sent infrequently, and it can take minutes for a device to realize that its path is degraded.
We need a more effective networking beacon to ensure packets are not swept away and are redirected to a safe course.
With Isovalent Enterprise for Cilium 1.16, you can now enable Bidirectional Forwarding Detection (BFD) and detect link or neighbor loss faster, forcing traffic to take an alternate path and greatly reducing downtime.
Introducing BFD
According to its initial RFC, BFD aims at the following:
The goal of Bidirectional Forwarding Detection (BFD) is to provide low-overhead, short-duration detection of failures in the path between adjacent forwarding engines, including the interfaces, data link(s), and, to the extent possible, the forwarding engines themselves.
RFC5880
In other words, BFD aims at detecting network failures in order to find an alternative path for packets to take. BFD is commonly used with dynamic routing protocols such as OSPF and BGP and is ideal for scenarios where you have multiple paths: by quickly identifying a lossy link or unresponsive peer, the routing table can be updated and traffic can switch over to a healthy path instead.
BFD works by establishing a peering session between two endpoints. It uses a keepalive system: each endpoint sends heartbeats (BFD Echo packets) at a regular frequency (the BFD Interval) to verify that the path is still active. If one endpoint stops receiving these messages within a predefined time (the Detection Time, typically the interval multiplied by a detection multiplier), it considers the path down and informs the routing protocol so that routing convergence can occur. For example, with a 300 ms interval and a multiplier of 3, a failed path is detected after roughly 900 ms.
The BFD heartbeats work similarly to BGP KEEPALIVE messages. So why would we need BFD?
Firstly, the BGP timers are, by default, long (30 seconds for the keepalive interval and 90 seconds for the hold time). We previously explored in a Cilium BGP blog post how we can reduce these timers, but even with the minimum values, you might still experience a ~10 second outage.
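For reference, in Cilium’s BGPv2 API those timers live on the peer configuration. Below is a minimal sketch using the open-source cilium.io/v2alpha1 resources, with illustrative values rather than a recommendation:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: fast-timers
spec:
  timers:
    keepAliveTimeSeconds: 3   # send a KEEPALIVE every 3 seconds instead of every 30
    holdTimeSeconds: 9        # declare the peer dead after 9 seconds instead of 90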
BFD’s lightweight design means it can detect outages much faster – with sub-second failure detection – without putting a strain on resources.
Let’s take a look at some of the benefits achievable with BFD in a demo environment.
Demo Environment
For this lab, we will once again use the excellent containerized networking platform containerlab, with the following:
A Kubernetes cluster with 2 nodes running Isovalent Enterprise for Cilium 1.16
A Cisco CSR 1000V router acting as the external BGP peer
We have configured an external BGP (eBGP) session between a Cilium-managed Kubernetes cluster in Autonomous System (AS) 65001 and a Cisco CSR 1000V device in AS 65005. Cilium’s BGP daemon is advertising the Pod CIDR network 10.244.0.0/24 to the CSR, to make our pods accessible to the rest of the network.
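For context, a peering of this shape is typically declared with Cilium’s BGPv2 resources. The sketch below uses the open-source cilium.io/v2alpha1 API with illustrative names; it is not the lab’s exact manifest, and the advertisement still has to be selected from a peer configuration (not shown here) via its labels:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: bgp-cluster
spec:
  bgpInstances:
  - name: instance-65001
    localASN: 65001            # the cluster's AS
    peers:
    - name: csr
      peerASN: 65005           # the CSR's AS
      peerAddress: 172.5.0.1   # the CSR's address facing the cluster
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: pod-cidr
  labels:
    advertise: pod-cidr        # matched by the peer configuration's advertisement selector
spec:
  advertisements:
  - advertisementType: PodCIDR # advertise the node Pod CIDRs (10.244.0.0/24 in this lab)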
csr#show ip bgp summary
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 18, main routing table version 18
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 8/7 prefixes, 9/8 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d22h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001       5       6       18    0    0 00:01:08        1

csr#show ip route bgp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, m - OMP
       n - NAT, Ni - NAT inside, No - NAT outside, Nd - NAT DIA
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       H - NHRP, G - NHRP registered, g - NHRP registration summary
       o - ODR, P - periodic downloaded static route, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR
       & - replicated local route overrides by connected

Gateway of last resort is not set

      10.0.0.0/24 is subnetted, 1 subnets
B        10.244.0.0 [20/0] via 172.5.0.2, 00:01:15
Let’s now compare the observed behaviour when a link becomes degraded, first without and then with BFD.
Without BFD
Using their default settings, the CSR and Cilium negotiate a 30-second BGP keepalive interval and a 90-second hold time – meaning they exchange keepalives every 30 seconds and if they don’t receive any over a period of 90 seconds, they will conclude that their peering session has failed.
csr#show ip bgp neighbors
BGP neighbor is 172.5.0.2,  remote AS 65001, external link
  BGP version 4, remote router ID 172.18.0.2
  BGP state = Established, up for 00:29:39
  Last read 00:00:09, last write 00:00:22, hold time is 90, keepalive interval is 30 seconds
This means it could take well over a minute for a BGP peer to detect that the connection to its neighbor has failed:
root@nico-ubuntu:~# date
Fri Nov 1 12:35:11 UTC 2024
root@nico-ubuntu:~# containerlab tools netem set -n clab-bgp-bfd-csr -i eth1 --loss 100
+-----------+-------+--------+-------------+-------------+
| Interface | Delay | Jitter | Packet Loss | Rate (kbit) |
+-----------+-------+--------+-------------+-------------+
| eth1      | 0s    | 0s     | 100.00%     | 0           |
+-----------+-------+--------+-------------+-------------+
As soon as we start dropping all packets on the CSR’s interface, BGP keepalives start going missing but, because of BGP’s hold time, it takes about 2 minutes for the session to come down and fall back into the Active state.
csr#show clock
*12:35:18.995 UTC Fri Nov 1 2024
csr#show ip bgp summary
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 20, main routing table version 20
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 9/8 prefixes, 10/9 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      12      14       20    0    0 00:04:36        1

csr#show ip bgp summary
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 20, main routing table version 20
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 9/8 prefixes, 10/9 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      13      18       20    0    0 00:06:29        1

csr#show ip bgp summary
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 21, main routing table version 21
1 networks peaked at 13:10:59 Oct 30 2024 UTC (1d23h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001       0       0        1    0    0 00:00:11 Active

csr#show clock
*12:37:30.880 UTC Fri Nov 1 2024
Two minutes of downtime is obviously not acceptable for anyone running mission-critical applications. It’s possible to reduce the BGP timers (as described in the Cilium docs), but setting them too aggressively low can cause unwanted instability.
Let’s see what we can achieve with BFD instead.
With BFD
Let’s deploy BFD on our remote Cisco CSR end. We need to enable it on the interface and in the BGP configuration.
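On IOS-XE, this typically looks like the sketch below: BFD timers on the interface facing the cluster, plus fall-over bfd on the BGP neighbor (the interface name and timer values are illustrative and not necessarily this lab’s exact settings):

interface GigabitEthernet2
 ! send/expect BFD control packets every 300 ms, declare the peer down after 3 missed packets
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 65005
 ! tear the BGP session down as soon as the BFD session to this neighbor fails
 neighbor 172.5.0.2 fall-over bfd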
On the Cilium side, BFD first requires a BFD profile to be created. The profile specifies how often the BFD heartbeats are sent and received (note that the timers are negotiated between the peers when the BFD session is established). Notice how BFD support is available for both IPv4 and IPv6.
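The profile itself is a Kubernetes custom resource. The sketch below is hypothetical: the IsovalentBFDProfile kind, the isovalent.com/v1alpha1 API group, and the field names are assumptions based on the enterprise documentation and may differ in your release, so treat it as an illustration of the shape rather than a copy-paste manifest:

apiVersion: isovalent.com/v1alpha1
kind: IsovalentBFDProfile
metadata:
  name: bfd-main
spec:
  receiveIntervalMilliseconds: 300    # how often we are willing to receive BFD control packets
  transmitIntervalMilliseconds: 300   # how often we want to send them (negotiated with the peer)
  detectMultiplier: 3                 # declare the peer down after 3 consecutive missed packets

The profile is then associated with the BGP peers (see the enterprise documentation for the exact wiring), giving each peer – IPv4 and IPv6 – its own BFD session.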
After deploying the configuration, the BGP peering between the Cisco CSR and Cilium works as expected, with the CSR once again receiving the Pod CIDR:
csr#show ip bgp ipv4 unicast summary
BGP router identifier 172.5.0.1, local AS number 65005
BGP table version is 22, main routing table version 22
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 696 total bytes of memory
BGP activity 10/9 prefixes, 11/10 paths, scan interval 60 secs
1 networks peaked at 13:10:59 Oct 30 2024 UTC (2d00h ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.5.0.2       4        65001      10      11       22    0    0 00:03:35        1

csr#
csr#show bgp ipv4 unicast
BGP table version is 22, local router ID is 172.5.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.244.0.0/24    172.5.0.2                              0 65001 i
On the CSR, we can see the established BFD sessions (there’s one for IPv4 and one for IPv6):
csr#show bfd neighbors
IPv4 Sessions
NeighAddr LD/RD RH/RS State Int
172.5.0.2 4097/1614084818 Up Up Gi2
IPv6 Sessions
NeighAddr LD/RD RH/RS State Int
B:5::2 1/1948284173 Up Up Gi2
On the Cilium side, we can validate that BFD is also successfully configured by logging onto the Cilium agent:
root@nico-ubuntu:~# kubectl exec -it -n kube-system cilium-r64cb -- cilium-dbg bfd peers
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), clean-cilium-state (init), install-cni-binaries (init)
# PeerAddress   Interface   Discriminator   RemDiscriminator   State   LastStateChange   Multi   RemMulti   RxInt   RemRxInt   TxInt   RemTxInt   EchoRxInt   RemEchoRxInt   EchoTxInt   Diagnostic      RemDiagnostic
b:2::1          net1        3209926287      437407563          Up      48h56m17s         3       3          1s      300ms      310ms   300ms      0s          50ms           60ms        No Diagnostic   No Diagnostic
172.2.0.1       net1        2083603199      2072335566         Up      48h56m10s         3       3          1s      300ms      310ms   300ms      0s          50ms           60ms        No Diagnostic   No Diagnostic
b:5::1          net4        1948284173      1                  Up      3m19s             3       3          340ms   340ms      340ms   340ms      90ms        0s             90ms        No Diagnostic   No Diagnostic
172.5.0.1       net4        1614084818      4097               Up      1m37s             3       3          1s      1s         340ms   1s         90ms        340ms          90ms        No Diagnostic   No Diagnostic
When capturing the traffic with tcpdump, we can see our BFD packets being exchanged – you can see in the BFD control packets some of the timer settings specified during the negotiation.
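Single-hop BFD control packets use UDP port 3784 and echo packets UDP port 3785, so a capture along these lines (net4 being the node interface facing the CSR in this lab) will show them, with tcpdump decoding the advertised intervals:

# capture and decode BFD control (UDP/3784) and echo (UDP/3785) packets on the CSR-facing link
tcpdump -ni net4 -vv 'udp port 3784 or udp port 3785'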
Let’s now introduce some packet loss again. BFD quickly notices the missing echo packets and immediately brings down the BGP session:
root@nico-ubuntu:~# cilium bgp peers
Node                         Local AS   Peer AS   Peer Address   Session State   Uptime      Family         Received   Advertised
clab-bgp-bfd-control-plane   65001      65002     172.2.0.1      established     49h17m9s    ipv4/unicast   1          1
                             65001      65005     172.5.0.1      established     17s         ipv4/unicast   0          1
root@nico-ubuntu:~# date
Fri Nov 1 14:28:39 UTC 2024
root@nico-ubuntu:~# containerlab tools netem set -n clab-bgp-bfd-csr -i eth1 --loss 100
+-----------+-------+--------+-------------+-------------+
| Interface | Delay | Jitter | Packet Loss | Rate (kbit) |
+-----------+-------+--------+-------------+-------------+
| eth1      | 0s    | 0s     | 100.00%     | 0           |
+-----------+-------+--------+-------------+-------------+
root@nico-ubuntu:~#
root@nico-ubuntu:~# cilium bgp peers
Node                         Local AS   Peer AS   Peer Address   Session State   Uptime      Family         Received   Advertised
clab-bgp-bfd-control-plane   65001      65002     172.2.0.1      established     49h17m41s   ipv4/unicast   1          1
                             65001      65005     172.5.0.1      idle            0s          ipv4/unicast   0          1
root@nico-ubuntu:~# date
Fri Nov 1 14:28:59 UTC 2024
With BFD detecting the defective link within seconds and tearing the BGP session down, traffic can be re-routed via an alternative path and downtime is greatly reduced.
Final Thoughts
Minimizing interruption to live traffic remains a priority for all infrastructure and platform engineers. BFD support in Isovalent Enterprise for Cilium provides another method to make platforms more robust and resilient in the event of a failure.
If you’d like to learn more, don’t hesitate to contact us to request a demo with our talented team of solution architects:
Request a Demo – Schedule a demo session with an Isovalent Solution Architect.