Alvin Toffler once famously said, “You’ve got to think about big things while you’re doing small things so that all the small things go in the right direction.” Zero Trust Security is a security model that assumes that all network traffic is potentially dangerous and should not be trusted by default, even if it originates from within the network perimeter. Instead, all traffic must be authenticated, authorized, and encrypted to ensure that only authorized users and devices can access sensitive data and applications.
Over the course of this blog, you will see how Cilium implements a range of security features to enforce Zero Trust principles, including:
- Network segmentation
- Identity-based access control
- Application-layer security
- Microsegmentation
Zero Trust Journey: Preparation
Before setting off, it helps to understand:
- The principles and components of the Zero Trust security model
- The role of identity, authentication, and authorization in Zero Trust
- The key characteristics of a Zero Trust architecture
To help us understand how Zero Trust works, let's map it to a more familiar example: a family heading out on vacation to another country.
| Travel Plan | Zero Trust in Action |
|---|---|
| You need your passport, because you must identify yourself to airport staff. | Your passport is proof of your identity; it is a means of authentication. It also has an expiration date, which ensures your identity is re-checked periodically. |
| Your flight ticket is a personal document that airport security and airline staff check before you are allowed into the airport and onto the plane. | The ticket authorizes you to access the airport and board the airplane. It is checked alongside your passport to verify your identity. |
| As you board the aircraft, you cannot enter the cockpit, because it is a restricted area. You also need to authenticate with your passport and present a token (ticket) to authorize yourself. | Certain areas of the airport or aircraft are restricted. Similarly, on your Kubernetes cluster you use Network Policies to secure access to certain services. |
| You are assigned a seat, class, and zone so that you don't end up contending with another passenger for the same spot. | In Kubernetes, Pods run in a specific namespace. Namespaces separate running applications or tenants, and traffic between namespaces should be denied by default. |
| You depart from a specific gate and board the aircraft at a specific time. | Egress traffic from a Kubernetes cluster should only leave from a specific, configurable source instead of from all Pods. This can be achieved using Egress Gateway. |
| Your luggage belongs to you and is therefore tagged with your name and destination. This ensures it can be checked and collected by the right owner: it is loaded onto the right aircraft, then unloaded and delivered to the right baggage belt at the destination airport. | Everything deployed in Kubernetes has metadata. Deployments and Pods carry labels, and Services and Network Policies use label selectors to identify which Pods traffic should be routed to, which Network Policies apply, and what traffic from source to destination is allowed. |
| Your ticket gives you access to certain areas of the airport. For example, you may have access to a lounge but not to crew-only areas. Even with a boarding pass, you cannot access every area. | Access to services is secured and observed based on the identity of the source. Traffic to certain services must be denied by default and allowed only for specific identities. |
| You are not allowed to bring certain items in your carry-on luggage. Its contents are checked thoroughly, and restricted items are removed when found. Some items, such as liquids, are only allowed under certain conditions, such as a maximum volume. | Network Policies restrict access from specific sources to destinations. Our Tetragon runtime security solution can also inspect specific processes or file access using Tracing Policies, and can prevent unauthorized processes from starting or block access to restricted files. |
| The aircraft is an environment with specific rules. You must fasten your seatbelt during takeoff and landing, put your phone in flight mode, and not take pictures of the crew or passengers. | In a Zero Trust environment, all connections and running workloads are observed and monitored. Cilium's observability features, such as Hubble, can monitor all traffic and alert you on specific conditions. Tetragon can monitor files and processes, for example, and can prevent certain processes from starting and protected files from being overwritten. |
| Once you arrive at your destination, your identity is checked again before you are allowed through immigration and customs. | Cilium's Gateway API support can route incoming HTTP requests based on hostname, header, or path and forward them to specific services. |
Assuming everything went right, your luggage was loaded onto the right plane, your identity was verified, and you were allowed to enter the country, you can finally leave the airport and enjoy a great vacation. Woohoo!!!
Identifying the attack vector
At its core, zero trust moves authorization from “verify once at the perimeter” to “verify everywhere, every time.”
- In a zero-trust model, nothing is trusted.
- No application, database, service, or infrastructure is trusted by default.
- Everything and everyone must be audited and verified.
- When one piece of infrastructure or an application wants to speak with another, it communicates through a gateway (such as a server or network device) that authenticates the request against the access policy, grants the least privilege required, and routes packets accordingly.
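The gateway described in the last point acts as a policy enforcement point. As a minimal sketch (plain Python, not Cilium code), deny-by-default with least privilege looks like this. The identities, services, and actions in the hypothetical POLICY set are invented for illustration, and authentication is reduced to a boolean that a real system would derive from something like mTLS or a signed token.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Request:
    source: str       # authenticated identity of the caller
    destination: str  # service the caller wants to reach
    action: str       # operation requested, e.g. "read" or "write"


# Hypothetical access policy: only explicit grants exist, nothing is trusted implicitly.
POLICY: Set[Tuple[str, str, str]] = {
    ("frontend", "backend", "read"),
    ("backend", "database", "read"),
    ("backend", "database", "write"),
}


def authorize(req: Request, authenticated: bool) -> bool:
    """Deny by default: allow only authenticated callers holding an exact grant."""
    if not authenticated:
        return False
    return (req.source, req.destination, req.action) in POLICY


if __name__ == "__main__":
    print(authorize(Request("frontend", "backend", "read"), authenticated=True))   # True
    print(authorize(Request("frontend", "database", "read"), authenticated=True))  # False
```

Least privilege shows up in the granularity of the grants: the frontend may read from the backend, but it has no path to write there or to touch the database.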
How Traditional Security Models Fall Short
Limitations of traditional perimeter-based security models
With data centers and corporate networks now dynamic, hybrid, and spread across multiple locations and cloud environments, no single static perimeter control can secure all of them. Building a static perimeter for each data center or dynamic application environment does not scale operationally. Maintaining a consistent security policy across vendors and environments such as VMware, AWS, and Azure is even more challenging. Finally, short-lived workloads that move across environments make it impossible to maintain a consistent security posture with the classic perimeter approach.
Standard perimeters are complex, increase risk, are no longer compatible with today’s business models, and are limited by:
- Insecure communication by default
- Lack of an identity and access management mechanism that can tackle newer cloud-native identities like labels, tags, and namespaces.
- Firewall policy that operates at OSI L3-L4 but not L7, and is therefore unable to inspect application payloads or make metadata-driven decisions
- The lack of a built-in certificate management mechanism needed to enforce mTLS between pods
- iptables, which was designed as a firewall for Linux-based operating systems, not for the dynamism and churn of container environments at scale. iptables also cannot filter individual application protocol requests such as HTTP GET, POST, or DELETE, because it operates only at Layers 3 and 4.
Mapping Cloud-Native Security Gaps to Zero Trust Principles
Apply Zero Trust to both data plane and control plane security:
This is critical. If you do not apply Zero Trust principles to both the data plane and the control plane, you leave exploitable weak links in the chain of trust. Securing the control plane ensures attackers cannot make policy and logic changes that enable lateral, traversal, or other secondary attacks. Data plane security guards against brute-force and lower-level attacks that overwhelm or perforate perimeter security (e.g., malformed queries or requests and DDoS). If you already have a Zero Trust mentality, you have likely added many Zero Trust rules at the data plane. At the control plane, however, Zero Trust is less likely to have been applied because (big surprise) platform engineers prefer the path of least resistance to get the job done and hold expansive privileges to perform their tasks.
Access to every resource should be authenticated and authorized based on dynamic policy:
Service identity and end-user credentials are dynamically authenticated and authorized before any access is allowed. The dynamic context of the access request should be part of the access decision. When access is granted, it should be granted with the least privilege required.
Access to resources should be bounded in space:
The perimeter of trust around resources should be as small as possible, ideally zero. Access should be mediated by a policy enforcement point (PEP) in front of every resource that can retrieve and enforce access decisions. This should apply to all inbound, outbound, and service-to-service access, covering both north-south and east-west traffic.
Access to resources should be bounded in time:
Authentication and authorization are bound to a short-lived session after which they must be re-established. This ensures that access decisions are made frequently and with the most recent context available.
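To make the "dynamic" and "bounded in time" principles concrete, here is a hypothetical sketch of a decision function that re-evaluates every request. The 300-second SESSION_TTL_SECONDS and the device_compliant context signal are assumptions chosen for illustration, not values from Cilium or any particular product.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

SESSION_TTL_SECONDS = 300.0  # hypothetical session lifetime before re-authentication


@dataclass
class Session:
    identity: str
    issued_at: float = field(default_factory=time.time)

    def is_valid(self, now: Optional[float] = None) -> bool:
        """Trust a session only for SESSION_TTL_SECONDS after it was issued."""
        now = time.time() if now is None else now
        return 0 <= now - self.issued_at < SESSION_TTL_SECONDS


def decide(session: Session, device_compliant: bool, now: Optional[float] = None) -> str:
    """Evaluate every request with current context instead of trusting a past check."""
    if not session.is_valid(now):
        return "reauthenticate"  # bounded in time: the session has expired
    if not device_compliant:
        return "deny"            # dynamic context changed since the session began
    return "allow"


if __name__ == "__main__":
    s = Session("frontend", issued_at=1000.0)
    print(decide(s, device_compliant=True, now=1100.0))  # allow
    print(decide(s, device_compliant=True, now=1300.0))  # reauthenticate
```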
Access to resources should be observable:
As much information as possible should be collected and used to improve security posture. This allows the integrity and security posture of all assets to be continuously monitored and policy enforcement continuously assured. Also, insights gained from observing should be fed back to improve policy.
Automate certificate handling:
Managing certificates can be complex and can raise operational costs for DevOps, security, and SRE teams. If your Kubernetes architecture has dozens of microservices, every service can only be effectively authenticated and secured if certificate issuance and rotation are automated.
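Automated rotation usually comes down to a rule like "re-issue a certificate before too little of its lifetime remains." The sketch below assumes a hypothetical RENEW_FRACTION of one third and a simple in-memory map of service certificates; a real deployment would delegate this to its certificate tooling rather than hand-rolled code.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

RENEW_FRACTION = 1 / 3  # hypothetical: renew once less than a third of the lifetime remains


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """True when the remaining validity drops below RENEW_FRACTION of the total lifetime."""
    lifetime = not_after - not_before
    remaining = not_after - now
    return remaining < lifetime * RENEW_FRACTION


def due_for_renewal(certs: Dict[str, Tuple[datetime, datetime]], now: datetime) -> List[str]:
    """Return the services whose certificates should be re-issued now, sorted by name."""
    return sorted(name for name, (nb, na) in certs.items() if needs_renewal(nb, na, now))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    certs = {
        "frontend": (start, start + timedelta(hours=24)),
        "backend": (start + timedelta(hours=12), start + timedelta(hours=36)),
    }
    print(due_for_renewal(certs, start + timedelta(hours=20)))  # ['frontend']
```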
Gateways to Manage Access to Services:
A gateway provides a single endpoint or URL for a service or application and internally maps requests to a group of microservices. For example, an API gateway lets you apply rate limiting to control how quickly an API or service can be consumed.
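Rate limiting at a gateway is commonly implemented with a token bucket. This is a generic sketch of that algorithm, not the implementation of any specific API gateway; rate and capacity are illustrative parameters, and now can be passed explicitly so the behaviour is deterministic.

```python
import time
from typing import Optional


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, allows bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to the time elapsed since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2, now=0.0)
    print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
    print(bucket.allow(now=1.0))                      # True
```

A gateway would typically keep one bucket per client identity, so a single noisy consumer cannot exhaust the service for everyone else.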
Implementing Zero Trust with Cilium
The key features and benefits of Cilium for container networking and security:
- Identity-aware service to service security and observability
- Advanced network policies with native HTTP and DNS protocol support
- Transparent Encryption – Efficient datapath encryption using in-kernel IPsec or WireGuard
- Enforcement of TLS via Network Policy: operators can restrict the allowed TLS SNIs in their network to provide a more secure environment.
- Powerful security observability and real-time runtime enforcement platform with Tetragon.
- Cluster-wide network policies can provide security guardrails while having more specific network policies to secure the application.
- Operating System access control on different levels such as system calls, TCP/IP, file access or integrity, and namespaces.
Identity-aware service to service security and observability
Modern distributed applications rely on technologies such as containers to deploy new versions quickly and scale out on demand. This results in a large number of containers starting in a short period of time. Typical container firewalls secure workloads by filtering on source and destination IP addresses and ports, but IP addresses in Kubernetes are ephemeral. Traditional firewalls are also not cloud-native aware and mostly cannot be programmed dynamically as applications scale out or new versions are deployed. Constantly updating firewall rules to keep up with these changes becomes impossible at scale.
Cilium overcomes this challenge by providing identity-aware security and observability:
- Each unique set of labels (metadata) is assigned a unique identity. All containers that share the same set of labels share the same identity.
- The identity is then associated with all network packets sent by the containers, allowing validation of the identity at the receiving node.
- The identity is also used for observing all traffic between services and providing metrics of the performance between these services.
- Security identity management is performed using a key-value store (or, in Cilium's default mode, Kubernetes custom resources).
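The label-to-identity mapping described in the list above can be sketched in a few lines. This hypothetical in-memory IdentityAllocator only mirrors the idea (the same label set always yields the same identity, and verdicts use identities rather than IPs); Cilium's real allocator, numbering, and storage backend differ, and the starting value of 256 is purely illustrative.

```python
from typing import Dict, FrozenSet, Set, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Hypothetical in-memory stand-in for Cilium's identity store."""

    def __init__(self, first_id: int = 256) -> None:  # illustrative starting value
        self._next = first_id
        self._by_labels: Dict[LabelSet, int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        """Same label set -> same identity, regardless of pod name or IP address."""
        key: LabelSet = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = self._next
            self._next += 1
        return self._by_labels[key]


def verdict(src_identity: int, dst_identity: int, allowed: Set[Tuple[int, int]]) -> str:
    """The receiving side checks the identity carried with the packet, not its source IP."""
    return "FORWARDED" if (src_identity, dst_identity) in allowed else "DROPPED"


if __name__ == "__main__":
    alloc = IdentityAllocator()
    fe = alloc.identity_for({"app": "frontend", "ns": "economy"})
    be = alloc.identity_for({"app": "backend", "ns": "economy"})
    print(fe, be)                      # 256 257
    print(verdict(fe, be, {(fe, be)}))  # FORWARDED
```

Because a new replica with the same labels reuses the existing identity, scaling out does not require any policy update.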
How Cilium uses eBPF
By leveraging eBPF, Cilium can enforce security rules based on service, pod, or container identity rather than on IP addresses, as traditional systems do. Decoupling security from IP addressing makes security policies scalable in a dynamic container environment, provides stronger security isolation, and adds the following functionality to the Kubernetes cluster.
Comprehensive Security at Layers 3, 4, and 7
Cilium offers networking policies that operate at layers 3, 4, and 7 of the OSI networking model. This ability to apply policies at multiple layers affords more flexibility in how you manage ingress and egress traffic within your Kubernetes cluster.
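As an illustration of a Layer 7 rule, the sketch below builds a CiliumNetworkPolicy as a Python dict (kubectl also accepts the JSON it prints) and approximates how the HTTP part is evaluated. The economy namespace, the frontend/backend labels, port 80, and the /public/.* path are assumptions. Cilium treats method and path as regular expressions, which http_allowed only approximates with re.fullmatch.

```python
import json
import re

# Hypothetical L7 policy: frontend pods may only send GET /public/... to backend pods on port 80.
policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "backend-l7", "namespace": "economy"},
    "spec": {
        "endpointSelector": {"matchLabels": {"app": "backend"}},
        "ingress": [{
            "fromEndpoints": [{"matchLabels": {"app": "frontend"}}],
            "toPorts": [{
                "ports": [{"port": "80", "protocol": "TCP"}],
                "rules": {"http": [{"method": "GET", "path": "/public/.*"}]},
            }],
        }],
    },
}


def http_allowed(method: str, path: str) -> bool:
    """Approximate the HTTP rule check: a request must match some rule's method and path."""
    rules = policy["spec"]["ingress"][0]["toPorts"][0]["rules"]["http"]
    return any(re.fullmatch(r["method"], method) and re.fullmatch(r["path"], path) for r in rules)


if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
    print(http_allowed("GET", "/public/info"))   # True
    print(http_allowed("POST", "/public/info"))  # False
```

An L3/L4-only rule would allow or deny the whole TCP connection on port 80; the L7 rule is what lets a GET pass while a POST on the same connection is rejected.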
Importance of fine-grained access control policies
Cilium DNS-based policies provide an easy mechanism to specify access control while Cilium manages the harder aspects of tracking DNS to IP mapping.
- DNS-based policies are very useful for controlling access to services running outside the Kubernetes cluster.
- DNS acts as a persistent service identifier for both external services provided by AWS, Google, Twilio, Stripe, etc., and internal services such as database clusters running in private subnets outside Kubernetes.
- CIDR or IP-based policies are cumbersome and hard to maintain as the IPs associated with external services can change frequently.
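Here is a hedged sketch of a DNS-aware egress policy plus a toy model of DNS-to-IP tracking. The *.stripe.com pattern, the backend label, and the economy namespace are hypothetical. The helper assumes that * matches within a single DNS label and that a lone * matches every name; check the Cilium documentation for the exact matchPattern semantics of your version.

```python
import json
import re
from typing import Dict, List, Set

# Hypothetical egress policy: backend pods may resolve names via kube-dns,
# but may only connect to IPs returned for names matching *.stripe.com.
policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "allow-stripe", "namespace": "economy"},
    "spec": {
        "endpointSelector": {"matchLabels": {"app": "backend"}},
        "egress": [
            {
                "toEndpoints": [{"matchLabels": {
                    "k8s:io.kubernetes.pod.namespace": "kube-system",
                    "k8s:k8s-app": "kube-dns",
                }}],
                "toPorts": [{
                    "ports": [{"port": "53", "protocol": "ANY"}],
                    "rules": {"dns": [{"matchPattern": "*"}]},
                }],
            },
            {"toFQDNs": [{"matchPattern": "*.stripe.com"}]},
        ],
    },
}


def pattern_matches(pattern: str, name: str) -> bool:
    """Approximate matchPattern: a lone '*' matches all names, otherwise '*' stays within one label."""
    pattern, name = pattern.lower().rstrip("."), name.lower().rstrip(".")
    if pattern == "*":
        return True
    regex = "".join("[-a-z0-9_]*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None


class FQDNCache:
    """Learn allowed IPs from DNS answers, so rules never hard-code external IPs."""

    def __init__(self, patterns: List[str]) -> None:
        self.patterns = patterns
        self.allowed_ips: Dict[str, Set[str]] = {}

    def observe(self, name: str, ips: List[str]) -> None:
        if any(pattern_matches(p, name) for p in self.patterns):
            self.allowed_ips.setdefault(name.lower().rstrip("."), set()).update(ips)

    def allowed_ip(self, ip: str) -> bool:
        return any(ip in ips for ips in self.allowed_ips.values())


if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

When the external service rotates its IPs, the next DNS answer updates the cache, and the policy itself never has to change.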
TLS SNI
Server Name Indication (SNI) is an extension of the Transport Layer Security (TLS) protocol that allows multiple domain names to be served from a single IP address. The client sends the hostname it wants to reach in the TLS handshake, so the server can present the matching certificate. In the context of Kubernetes, this means multiple services can share the same IP address and still terminate the client's TLS connection, establishing a secure connection between the client and the correct service.
Adding the serverNames field to a port rule in a CiliumNetworkPolicy lets users specify a list of allowed TLS SNI values. If the field is not empty, TLS must be present and one of the listed SNIs must be indicated in the TLS handshake. This adds granularity to your network security controls and lets you enforce policies based on the SNI value in the client's connection request.
With the following policy, you can allow traffic to the amit.cilium.rocks SNI:
With Hubble, verify that traffic to this SNI is allowed…
… while traffic to google.com is dropped by the policy:
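A policy in the same spirit can be sketched as a Python dict; it is not necessarily identical to the policy used in this walkthrough. The client label, the default namespace, and the DNS rule are assumptions, and sni_allowed simply restates the documented serverNames semantics: when the list is non-empty, TLS with a listed SNI is required.

```python
import json
from typing import Optional

# Hypothetical client policy: egress TLS on port 443 is allowed only when the handshake
# carries the SNI amit.cilium.rocks; DNS lookups via kube-dns are also allowed.
policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "allow-amit-sni", "namespace": "default"},
    "spec": {
        "endpointSelector": {"matchLabels": {"app": "client"}},
        "egress": [
            {
                "toEndpoints": [{"matchLabels": {
                    "k8s:io.kubernetes.pod.namespace": "kube-system",
                    "k8s:k8s-app": "kube-dns",
                }}],
                "toPorts": [{"ports": [{"port": "53", "protocol": "ANY"}]}],
            },
            {
                "toEntities": ["world"],
                "toPorts": [{
                    "ports": [{"port": "443", "protocol": "TCP"}],
                    "serverNames": ["amit.cilium.rocks"],
                }],
            },
        ],
    },
}


def sni_allowed(sni: Optional[str]) -> bool:
    """None stands for a connection without TLS or without an SNI value."""
    server_names = policy["spec"]["egress"][1]["toPorts"][0]["serverNames"]
    if not server_names:
        return True
    return sni is not None and sni.lower() in server_names


if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
    print(sni_allowed("amit.cilium.rocks"))  # True
    print(sni_allowed("google.com"))         # False
```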
How Cilium provides better security against lateral movement attacks
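Applying the correct policy at the appropriate source or destination is one of the biggest challenges admins face today, and Cilium addresses it out of the box. Because policies select workloads by identity rather than IP address, a compromised pod can only reach the destinations its identity is explicitly allowed to reach, which limits lateral movement.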
Runtime-aware Network Policy with Tetragon
Tetragon helps platform and security teams with:
Security Observability
- Observing application and system behavior such as process, syscall, file, and network activity
- Tracing namespace, privilege, and capability escalations
- File integrity monitoring
Runtime Enforcement
- Application of security policies to limit the privileges of applications and processes on a system (system calls, file access, network, kprobes)
Tetragon has been specifically built for Kubernetes and cloud-native infrastructure but can be run on any Linux system.
You are possibly familiar with Kubernetes NetworkPolicies that define the allowed and denied network communication for Kubernetes workloads. Oversimplified, these policies describe that pod A is allowed to talk to pod B or CIDR 10.0.0.0/8 but pod A is not allowed to talk to pod C or CIDR 20.1.1.1/32.
The granularity of these policies is at the pod level. It doesn't matter whether it is app.js running in the pod creating the request or the attack.py script invoking curl. With Tetragon, you can extend Cilium's Network Policy capabilities to include runtime context:
The above policy results in a significantly stronger least-privilege policy. It allows Frontend pods to talk to Backend pods, but only if:
- the binary in the source pod is called app.py
- the process in the source pod is running unprivileged
The example takes the binary name and privileged execution context into account, but the concept can be extended to further restrict access based on additional parameters, such as whether the process is still namespaced, its UID/GID context, or even memory hashes of the executable.
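The exact syntax of the runtime-aware policy is not reproduced here, so rather than guess at it, this plain-Python sketch models only the decision it expresses: a pod-level label match combined with process context. The Connection fields and label values are hypothetical stand-ins for data that runtime tracing would supply.

```python
import posixpath
from dataclasses import dataclass
from typing import Dict


@dataclass
class Connection:
    src_labels: Dict[str, str]
    dst_labels: Dict[str, str]
    binary: str       # path of the process that opened the socket (hypothetical field)
    privileged: bool  # whether that process runs with elevated privileges


def runtime_aware_allow(conn: Connection) -> bool:
    """A pod-level match alone is not enough; the process context must also match."""
    pod_match = conn.src_labels.get("app") == "frontend" and conn.dst_labels.get("app") == "backend"
    process_match = posixpath.basename(conn.binary) == "app.py" and not conn.privileged
    return pod_match and process_match


if __name__ == "__main__":
    fe, be = {"app": "frontend"}, {"app": "backend"}
    print(runtime_aware_allow(Connection(fe, be, "/srv/app.py", False)))    # True
    print(runtime_aware_allow(Connection(fe, be, "/usr/bin/curl", False)))  # False
```

A plain NetworkPolicy would have allowed both connections above, since they come from the same pod; only the process context tells them apart.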
Single unified policy across clusters
A cluster-wide network policy is a bundle of network security rules that is not bound to a single namespace: it applies across the entire cluster, and the same policy can be applied to one or more clusters. Cluster-wide policies are essential in various cases, such as:
- Automatically applying a default-deny policy to all namespaces as they’re created.
- Allowing requests to a baseline set of allowed destinations like kube-DNS, DNS destinations used by all apps, or known IP ranges.
- Reducing management overhead of network policies in high-scale environments.
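A cluster-wide baseline covering the first two cases can be sketched as a CiliumClusterwideNetworkPolicy, written here as a Python dict. Treat it as an illustration under assumptions: it excludes kube-system so system pods keep working, relies on an empty ingress rule ({}) to switch the selected pods to default deny, and allows only DNS to kube-dns on egress. Because it affects every namespace at once, test such a baseline carefully before rolling it out.

```python
import json
from typing import Dict

# Hypothetical cluster-wide baseline: every pod outside kube-system is put into
# default deny for ingress and egress, except DNS lookups to kube-dns.
baseline = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumClusterwideNetworkPolicy",
    "metadata": {"name": "baseline-default-deny-allow-dns"},  # cluster-scoped: no namespace
    "spec": {
        "endpointSelector": {"matchExpressions": [{
            "key": "k8s:io.kubernetes.pod.namespace",
            "operator": "NotIn",
            "values": ["kube-system"],
        }]},
        "ingress": [{}],  # empty rule: enables default deny without allowing anything
        "egress": [{
            "toEndpoints": [{"matchLabels": {
                "k8s:io.kubernetes.pod.namespace": "kube-system",
                "k8s:k8s-app": "kube-dns",
            }}],
            "toPorts": [{
                "ports": [{"port": "53", "protocol": "ANY"}],
                "rules": {"dns": [{"matchPattern": "*"}]},
            }],
        }],
    },
}


def selects_namespace(policy: Dict, namespace: str) -> bool:
    """Evaluate the namespace NotIn expression for a pod in `namespace` (only NotIn is handled)."""
    for expr in policy["spec"]["endpointSelector"].get("matchExpressions", []):
        if expr["key"].endswith("io.kubernetes.pod.namespace") and expr["operator"] == "NotIn":
            if namespace in expr["values"]:
                return False
    return True


if __name__ == "__main__":
    print(json.dumps(baseline, indent=2))
```

New namespaces are covered automatically, because the selector matches by exclusion rather than by listing namespaces.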
Hands-on for Zero Trust with Cilium
Let's revisit the travel analogy: a family is heading off on vacation along with other passengers. Some key considerations for the example:
- Economy class passengers: namespace “economy”
- Business class passengers: namespace “business”
- First class passengers: namespace “first”
- Each class has two services, frontend and backend.
- Passengers can only access the services provided for their own class (namespace).
- Isovalent Enterprise for Cilium is installed with Hubble and Tetragon Enterprise.
1. Create 3 distinct namespaces
2. Create the services that each user is entitled to in that namespace
3. Create a Cilium Network Policy using the policy editor in the economy namespace
Use the policy editor to explicitly allow the following communication patterns:
- Ingress from workloads in the same namespace (economy).
- Egress to workloads in the same namespace (economy).
- Egress from workloads in the namespace to KubeDNS/CoreDNS.
This is what our security policy looks like:
Download this policy and apply it to see it in action
4. Apply the policy
Note: *Repeat the above steps for the other two namespaces (business and first) as well.*
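The three communication patterns from step 3 can also be generated for every class at once. This sketch emits one CiliumNetworkPolicy per namespace, wrapped in a Kubernetes List so a single kubectl apply -f can create them all. It approximates what the policy editor produces rather than reproducing its exact output, and the <namespace>-isolation naming is an assumption.

```python
import json
from typing import Dict, List

NAMESPACES = ["economy", "business", "first"]

KUBE_DNS = {"matchLabels": {
    "k8s:io.kubernetes.pod.namespace": "kube-system",
    "k8s:k8s-app": "kube-dns",
}}


def namespace_isolation_policy(namespace: str) -> Dict:
    """Same-namespace ingress and egress plus DNS to kube-dns; everything else is denied."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{namespace}-isolation", "namespace": namespace},
        "spec": {
            "endpointSelector": {},                # every pod in this namespace
            "ingress": [{"fromEndpoints": [{}]}],  # {} = any pod in the same namespace
            "egress": [
                {"toEndpoints": [{}]},
                {
                    "toEndpoints": [KUBE_DNS],
                    "toPorts": [{
                        "ports": [{"port": "53", "protocol": "ANY"}],
                        "rules": {"dns": [{"matchPattern": "*"}]},
                    }],
                },
            ],
        },
    }


def policy_list(namespaces: List[str]) -> Dict:
    """Wrap the policies in a v1 List so they can be applied together."""
    return {"apiVersion": "v1", "kind": "List",
            "items": [namespace_isolation_policy(ns) for ns in namespaces]}


if __name__ == "__main__":
    print(json.dumps(policy_list(NAMESPACES), indent=2))
```

For example, redirect the output to a file (say, policies.json, a hypothetical name) and run kubectl apply -f policies.json.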
5. Zero Trust Security policy in action
When an economy-class passenger requests a service within the same namespace, the request is allowed; all other connections are denied.
6. Tetragon in action
Tetragon provides advanced security capabilities such as protocol enforcement, IP and port allowlisting, and automatic application-aware policy generation to protect against the most sophisticated threats. Because it is built on eBPF, Tetragon scales with ease to meet the needs of the most demanding cloud-native environments. Tetragon detects, and can react to, security-significant events such as:
- I/O activity, including network (our use case for economy class is highlighted below) and file access
- Process execution events
- System call activity
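To observe outbound network activity like the economy-class example, Tetragon can hook the kernel's tcp_connect function with a TracingPolicy. The sketch below follows the shape of the network-monitoring example in the Tetragon documentation, written as a Python dict. The IN_CLUSTER ranges are assumptions to replace with your own loopback, pod, and service CIDRs, and would_report only approximates the NotDAddr selector in Python.

```python
import ipaddress
import json

# Assumed in-cluster ranges; replace with your cluster's loopback, pod, and service CIDRs.
IN_CLUSTER = ["127.0.0.1", "10.0.0.0/8"]

tracing_policy = {
    "apiVersion": "cilium.io/v1alpha1",
    "kind": "TracingPolicy",
    "metadata": {"name": "monitor-network-activity-outside-cluster"},
    "spec": {
        "kprobes": [{
            "call": "tcp_connect",  # kernel function invoked for outgoing TCP connections
            "syscall": False,
            "args": [{"index": 0, "type": "sock"}],
            "selectors": [{
                "matchArgs": [{"index": 0, "operator": "NotDAddr", "values": IN_CLUSTER}],
            }],
        }],
    },
}


def would_report(dest_ip: str) -> bool:
    """Approximate NotDAddr: report only connections whose destination is outside every range."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in ipaddress.ip_network(cidr) for cidr in IN_CLUSTER)


if __name__ == "__main__":
    print(json.dumps(tracing_policy, indent=2))
    print(would_report("203.0.113.5"))  # True
    print(would_report("10.1.2.3"))     # False
```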
7. Zero Trust Security in runtime (Tetragon)
With Tetragon, Isovalent Enterprise for Cilium delivers robust and scalable security to protect against the most sophisticated threats at runtime.
You can see that users in economy could not access any service intended for business or first class. Now, let's say there is an emergency and an economy class passenger has to be given access to a service in business class.
- In the policy editor, click on the dropped flows and add them to your policy.
- Download the updated policy and apply it again.
- Once the updated policy is applied, the user in economy class can now access services in business class.
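The emergency exception is easy to get half right: with isolation policies on both namespaces, the caller needs an egress rule and the callee needs an ingress rule. This hypothetical sketch models that with simplified policies (DNS rules omitted) and a tiny evaluator that only understands the selector shapes used here; it is not how Cilium evaluates policy internally.

```python
import copy
from typing import Dict, Tuple

NS_KEY = "k8s:io.kubernetes.pod.namespace"


def base_policy(ns: str) -> Dict:
    """Minimal same-namespace isolation policy (DNS rule omitted for brevity)."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{ns}-isolation", "namespace": ns},
        "spec": {"endpointSelector": {},
                 "ingress": [{"fromEndpoints": [{}]}],
                 "egress": [{"toEndpoints": [{}]}]},
    }


def grant_exception(src: Dict, dst: Dict, app: str) -> Tuple[Dict, Dict]:
    """Return updated copies: egress on the caller's policy AND ingress on the callee's."""
    src_ns, dst_ns = src["metadata"]["namespace"], dst["metadata"]["namespace"]
    src, dst = copy.deepcopy(src), copy.deepcopy(dst)
    src["spec"]["egress"].append({"toEndpoints": [{"matchLabels": {NS_KEY: dst_ns, "app": app}}]})
    dst["spec"]["ingress"].append({"fromEndpoints": [{"matchLabels": {NS_KEY: src_ns}}]})
    return src, dst


def selector_matches(selector: Dict, policy_ns: str, peer_ns: str, peer_app: str) -> bool:
    labels = selector.get("matchLabels", {})
    ns = labels.get(NS_KEY, policy_ns)  # no namespace label: stay in the policy's namespace
    return ns == peer_ns and labels.get("app", peer_app) == peer_app


def allowed(policies: Dict[str, Dict], src_ns: str, src_app: str, dst_ns: str, dst_app: str) -> bool:
    """A flow passes only if the caller's egress AND the callee's ingress both allow it."""
    egress_ok = any(selector_matches(sel, src_ns, dst_ns, dst_app)
                    for rule in policies[src_ns]["spec"]["egress"]
                    for sel in rule.get("toEndpoints", []))
    ingress_ok = any(selector_matches(sel, dst_ns, src_ns, src_app)
                     for rule in policies[dst_ns]["spec"]["ingress"]
                     for sel in rule.get("fromEndpoints", []))
    return egress_ok and ingress_ok


if __name__ == "__main__":
    policies = {ns: base_policy(ns) for ns in ("economy", "business", "first")}
    print(allowed(policies, "economy", "frontend", "business", "frontend"))  # False
    eco, biz = grant_exception(policies["economy"], policies["business"], "frontend")
    policies.update(economy=eco, business=biz)
    print(allowed(policies, "economy", "frontend", "business", "frontend"))  # True
```

In this sketch the business-side ingress rule admits the whole economy namespace, and the economy-side egress rule narrows it to the frontend app. A stricter setup could scope the ingress side with a dedicated policy that selects only app: frontend.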
Conclusion
Implementing the Zero Trust security model in modern cloud-native environments matters: Zero Trust is a powerful security model at the forefront of modern security practices. If you can cut through the marketing noise around it, adopting Zero Trust brings some profound and important benefits. You can use Isovalent Enterprise for Cilium for enhanced Zero Trust security to:
- Gain the additional security and networking capabilities of Isovalent Enterprise for Cilium
- Provide real-time monitoring and security analytics for Kubernetes environments
And while Zero Trust requires some radical changes to core ideas such as identity, Kubernetes users have an advantage: they can use security policies, adopt a service mesh, and shift from purely perimeter-based network security to the “continual verification of each user, device, application, and transaction.”
Try it out
- Try the Zero Trust Visibility lab
- Try the Network Policies lab
- Try the Host Firewall lab
- Try out Cilium Network Policies
Amit Gupta is a senior technical marketing engineer at Isovalent, powering eBPF cloud-native networking and security. Amit has 21+ years of experience in Networking, Telecommunications, Cloud, Security, and Open-Source. He has previously worked with Motorola, Juniper, Avi Networks (acquired by VMware), and Prosimo. He is keen to learn and try out new technologies that aid in solving day-to-day problems for operators and customers.
He has worked in the Indian start-up ecosystem for a long time and helps new folks in that area outside of work. Amit is an avid runner and cyclist and also spends considerable time helping kids in orphanages.