Author
About the speaker: Nico Vibert

Nico Vibert is a Senior Technical Marketing Engineer at Isovalent, the company behind the open-source cloud native solution Cilium. Nico has worked in many different roles (operations and support, design and architecture, technical pre-sales) at companies such as HashiCorp, VMware and Cisco. Nico’s focus is primarily on networking, cloud and automation, and he loves creating content and writing books. Nico regularly speaks at events, from large conferences such as VMworld and Cisco Live to smaller forums such as VMware and AWS User Groups, as well as virtual events such as HashiCorp HashiTalks. Outside of Isovalent, Nico is passionate about intentional diversity & inclusion initiatives and is Chief DEI Officer at the Open Technology organization OpenUK. You can find out more about him on his blog.

Video: Getting Started with Cilium Monitoring with Grafana

In this video, Nico Vibert introduces how to monitor key Cilium and Hubble metrics using Prometheus and Grafana.

Transcript

Welcome to a Cilium Flash episode on Cilium monitoring using Grafana and Prometheus. Grafana and Prometheus are very popular platforms for capturing metrics and visualizing data in cloud-native environments. In this short video, we’ll be using Grafana and Prometheus to monitor Cilium. And why is that important? Well, Cilium provides network connectivity across multi-cloud environments, and it sits in the datapath. It’s pretty critical, so you need to be able to measure and observe the health of your Cilium environment and also the network flows of your Cilium-managed Kubernetes pods. Let’s get started by first deploying a Kubernetes cluster using kind.

All of this can be found in the official Cilium docs. If you look at the kind cluster configuration, it’s very simple: you’ve got one control-plane node and three worker nodes, and we’ve disabled the default CNI so that we can install Cilium instead.
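The kind configuration described above can be sketched as a small Python helper that renders the YAML. This is a minimal sketch assuming the kind.x-k8s.io/v1alpha4 config API; the function name kind_config is hypothetical, so compare its output with the kind-config.yaml in the Cilium docs before using it.

```python
# Hypothetical helper (not from the Cilium docs): renders a kind cluster
# config with one control-plane node, N worker nodes, and the default CNI
# disabled so that Cilium can be installed as the CNI afterwards.


def kind_config(workers: int = 3) -> str:
    """Return a kind cluster configuration as YAML text."""
    lines = [
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
    ]
    # One list entry per worker node.
    lines += ["- role: worker"] * workers
    lines += [
        "networking:",
        # Skip kind's built-in CNI; Cilium will provide networking instead.
        "  disableDefaultCNI: true",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to kind-config.yaml and running kind create cluster --config=kind-config.yaml should give the four-node cluster used in the video; with disableDefaultCNI set, kind does not install its default CNI (kindnet).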

Let’s go ahead and deploy the cluster. It takes a couple of minutes, so we’ll speed it up for the sake of the demo. Our cluster is up and running. Let’s check it’s working as expected.
The nodes are currently NotReady because we haven’t deployed a CNI yet, so that’s perfectly normal.
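To check node readiness from a script rather than by eye, the sketch below reads the JSON printed by kubectl get nodes -o json and lists the nodes whose Ready condition is not True. The helper not_ready_nodes and the node names used with it are illustrative, not part of the Cilium docs.

```python
import json


def not_ready_nodes(node_list_json: str) -> list[str]:
    """Return the names of nodes whose "Ready" condition is not "True".

    Expects the output of `kubectl get nodes -o json` (a List of Node objects).
    """
    data = json.loads(node_list_json)
    pending = []
    for node in data.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        # A node is Ready only if it reports type=Ready with status "True".
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(node["metadata"]["name"])
    return pending
```

Before Cilium is installed, every node should appear in this list; once the CNI is running, it should return an empty list.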

Now we’re ready to install Cilium. Again, we’re following the official documentation; you’ll see the links at the bottom of the screen. We’re installing Cilium version 1.12 in the kube-system namespace, and we’re enabling Hubble as well as all the metrics and data collection for Cilium and Hubble. Let’s make sure it’s running correctly. It’s still being deployed, so let’s run cilium status --wait and wait for Cilium to come up. And we’re good to go: Cilium is up and running.
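The install step can be sketched as a helm command line built in Python, so it can be passed to subprocess.run without shell quoting. The --set keys follow the Prometheus and Grafana guide in the Cilium documentation as we understand it, and the sketch assumes the Cilium Helm repository was added under the name cilium (helm repo add cilium https://helm.cilium.io/). The helper name is hypothetical; verify the values against the docs for the exact 1.12.x release you deploy.

```python
def cilium_helm_install_args(version: str) -> list[str]:
    """Build a `helm install` command for Cilium with metrics enabled.

    Hypothetical helper; the --set keys mirror the Cilium docs' monitoring
    guide and should be checked against the docs for your Cilium version.
    """
    hubble_metrics = "{dns,drop,tcp,flow,port-distribution,icmp,http}"
    return [
        "helm", "install", "cilium", "cilium/cilium",
        "--version", version,
        "--namespace", "kube-system",
        # Expose Cilium agent and operator metrics for Prometheus to scrape.
        "--set", "prometheus.enabled=true",
        "--set", "operator.prometheus.enabled=true",
        # Enable Hubble and choose which Hubble metric families to export.
        "--set", "hubble.enabled=true",
        "--set", f"hubble.metrics.enabled={hubble_metrics}",
    ]
```

For example, subprocess.run(cilium_helm_install_args("1.12.0"), check=True), followed by cilium status --wait. Passing a list rather than a shell string means the {dns,...} value is never subject to shell brace expansion.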

Now we’re ready to visualize the metrics using Prometheus and Grafana. We’ve deployed our Prometheus and Grafana environment; again, you can find this manifest in the official Cilium documentation. It takes a few minutes to deploy. It’s creating the services, and we’ll be able to access the Grafana dashboard as soon as the pods are up and running. It just takes a couple of minutes. Prometheus is ready. Grafana is not quite ready yet…

Okay, now we just create a port forward and we’ll be able to access Grafana. But before we access Grafana, we’re going to run a Cilium connectivity test to generate some data and traffic with Cilium. The connectivity test deploys workloads that exercise different connectivity paths and scenarios: between pods on the same node and on different nodes, with network policies, with DNS queries, going out to the internet, etc. It takes a few minutes to execute, and by the end you’ll see that all the tests are successful, so we can go and see the results in Grafana.
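The port forward and the connectivity test can be scripted as below. This sketch assumes the monitoring example manifest created a grafana Service listening on port 3000 in the cilium-monitoring namespace, as in the Cilium docs at the time of writing; the helper names are hypothetical.

```python
import subprocess

# Assumption: the Cilium monitoring example created a "grafana" Service
# listening on port 3000 in the "cilium-monitoring" namespace.
GRAFANA_PORT_FORWARD = [
    "kubectl", "-n", "cilium-monitoring",
    "port-forward", "service/grafana", "3000:3000",
]
CONNECTIVITY_TEST = ["cilium", "connectivity", "test"]


def start_grafana_port_forward() -> subprocess.Popen:
    """Start the port forward in the background (Grafana on localhost:3000)."""
    # kubectl port-forward blocks until it is killed, hence Popen, not run.
    return subprocess.Popen(GRAFANA_PORT_FORWARD)


def run_connectivity_test() -> bool:
    """Run the Cilium connectivity test; True if the CLI exits with status 0."""
    return subprocess.run(CONNECTIVITY_TEST).returncode == 0
```

Start the port forward first, run the test, then open http://localhost:3000; call terminate() on the returned process once you are done with Grafana.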

Now we’ve got our Grafana dashboard. As you can see, one of the data sources is Prometheus, a time-series database. And we automatically get some dashboards: Cilium Metrics, Cilium Operator, and Hubble. Hubble is the platform that collects and aggregates network flow data from all our pods, and we can see all this flow information on the dashboard. We can see the flow distribution: whether traffic is TCP or UDP, and whether it’s DNS or HTTP. Using Grafana, you can also zoom in on a specific moment in time for forensic purposes.
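Because the dashboards are backed by Prometheus, the same protocol breakdown can be pulled from the Prometheus HTTP API (/api/v1/query). The sketch below assumes Prometheus is reachable on localhost:9090 (for example through a port forward) and that the port-distribution metric is exported as hubble_port_distribution_total with a protocol label; check the metric and label names in your own Prometheus before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumptions: Prometheus listens on localhost:9090 (e.g. via a port forward),
# and Hubble exports hubble_port_distribution_total with a "protocol" label.
PROMETHEUS_URL = "http://localhost:9090"
TRAFFIC_BY_PROTOCOL = "sum by (protocol) (rate(hubble_port_distribution_total[5m]))"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def vector_by_label(response_body: str, label: str) -> dict[str, float]:
    """Map each label value to its sample value in an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    values = {}
    for series in payload["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _, value = series["value"]
        values[series["metric"].get(label, "")] = float(value)
    return values
```

In use, the body returned by urllib.request.urlopen(instant_query_url(PROMETHEUS_URL, TRAFFIC_BY_PROTOCOL)) would be decoded and passed to vector_by_label(body, "protocol") to get a per-protocol rate.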

We also have the Cilium Metrics dashboard, where you can find data such as CPU and memory usage per node. You can look at all the nodes or pick a specific one, and look at information about memory and map usage, as well as API calls, which is a critical metric because the Cilium agent is event-driven. There are many more things you can explore. Again, this was just an introduction to getting started with Grafana and Prometheus. Look out for more observability videos to be published in the near future. Thank you.
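CPU usage panels are commonly built by applying PromQL’s rate() to a cumulative CPU-seconds counter; we assume, without having checked the dashboard definition, that the Cilium Metrics dashboard works this way. The arithmetic is sketched below as a simplified two-sample version with a hypothetical helper name: it handles counter resets but, unlike rate(), does not extrapolate to the edges of the query window.

```python
def counter_rate(earlier: tuple[float, float], later: tuple[float, float]) -> float:
    """Per-second increase of a cumulative counter between two samples.

    Each sample is (unix_timestamp_seconds, counter_value). A decrease is
    treated as a counter reset (e.g. an agent restart), so the later value is
    taken as the increase since the reset. Simplified: no extrapolation.
    """
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)
```

For example, a counter that grows from 10 to 13 CPU-seconds over 60 seconds gives 0.05, that is, about 5% of one CPU core.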