Application Team Troubleshooting & Policy Workflows

Self-service tooling for managing Kubernetes monitoring,
troubleshooting, and security workflows.

Visibility into network connectivity is critical to running production-grade services, yet Kubernetes itself provides little of it. Traditional IP-based network monitoring tools are also ineffective: ephemeral Pod IPs do not identify the services that are impacted, and these tools cannot restrict an application team’s view to only the data relevant to its application. As a result, Kubernetes platform teams are often pulled in to assist with such tasks. Cilium Enterprise provides application teams with simple self-service tools for managing monitoring, troubleshooting, and security workflows in Kubernetes.

Cilium Enterprise Capabilities

Multi-tenant Connectivity Data + Metrics

Cilium Enterprise speeds up the investigation of application layer issues and alerts by giving each application team self-service access to rich streams of data about the health of connectivity between their services. This data is critical in solving the classic “finger-pointing problem” between application and infrastructure operations teams by identifying if a network layer issue (e.g., DNS lookup failure, network policy drop, TCP layer connection failures/resets) is the likely root cause of the higher-layer alert, or if the problem is likely isolated to the application layer. Cilium Enterprise leverages the OpenID Connect (OIDC) standard to securely give application tenants access only to the connectivity data associated with their Kubernetes namespaces.
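The namespace scoping described above can be illustrated with a small sketch. It is not Cilium Enterprise's implementation: the `Flow` record, the `groups` claim, and the group-to-namespace mapping are all hypothetical, and the OIDC token is assumed to have been verified already by the identity provider integration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not the Hubble flow schema)."""
    source_namespace: str
    destination_namespace: str
    verdict: str  # e.g. "FORWARDED" or "DROPPED"


def allowed_namespaces(claims, group_to_namespaces):
    """Map the groups of an already-verified OIDC token to namespaces.

    The "groups" claim name is an assumption; identity providers differ.
    """
    namespaces = set()
    for group in claims.get("groups", []):
        namespaces.update(group_to_namespaces.get(group, []))
    return namespaces


def visible_flows(flows, namespaces):
    """Keep only flows with at least one endpoint in a tenant namespace."""
    return [
        flow for flow in flows
        if flow.source_namespace in namespaces
        or flow.destination_namespace in namespaces
    ]


if __name__ == "__main__":
    flows = [
        Flow("payments", "db", "FORWARDED"),
        Flow("search", "db", "DROPPED"),
        Flow("frontend", "payments", "DROPPED"),
    ]
    claims = {"sub": "alice", "groups": ["team-payments"]}
    mapping = {"team-payments": ["payments"]}
    for flow in visible_flows(flows, allowed_namespaces(claims, mapping)):
        print(flow)
```

In this sketch a tenant sees a flow when either endpoint belongs to one of its namespaces, so the payments team would also see the dropped request arriving from the frontend.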

Historical Data Views + Analytics

When troubleshooting a connectivity issue as a platform or application team, having as much historical context as possible is critical to quickly resolving the incident. For example, it helps to compare connectivity behavior against a baseline from before the incident, or to pinpoint the exact timing of intermittent errors or faults that happened several hours earlier. Cilium Enterprise leverages standard cloud storage APIs to store the Hubble flow data stream and enables later querying and analytics on this data using the same Hubble API, CLI, and UI that are available for live data. Cilium also annotates flow data with additional metadata, such as details about the policies that were applied when a flow was allowed or denied, which further simplifies troubleshooting.
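As a rough illustration of the baseline comparison mentioned above, the sketch below computes the share of dropped flows in two time windows. The flow records are plain dictionaries with hypothetical `time` and `verdict` keys. They stand in for data that would be exported from stored flow history, and they do not reflect the Hubble API.

```python
from datetime import datetime, timedelta


def drop_rate(flows, start, end):
    """Fraction of flows in [start, end) whose verdict is DROPPED.

    Returns None when the window contains no flows at all.
    """
    window = [f for f in flows if start <= f["time"] < end]
    if not window:
        return None
    dropped = sum(1 for f in window if f["verdict"] == "DROPPED")
    return dropped / len(window)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    samples = [
        (0, "FORWARDED"), (10, "FORWARDED"), (20, "DROPPED"), (30, "FORWARDED"),
        (70, "DROPPED"), (80, "DROPPED"), (90, "FORWARDED"), (100, "DROPPED"),
    ]
    flows = [
        {"time": t0 + timedelta(minutes=m), "verdict": v} for m, v in samples
    ]
    hour = timedelta(hours=1)
    print("baseline:", drop_rate(flows, t0, t0 + hour))
    print("incident:", drop_rate(flows, t0 + hour, t0 + 2 * hour))
```

A large gap between the two rates suggests that the network layer changed around the time of the incident. Similar rates point back toward the application layer.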

Simplified Network Policy Creation

Achieving zero-trust network connectivity via Kubernetes Network Policy is complex as modern applications have many service dependencies (downstream APIs, databases, authentication services, etc.). With the “default deny” model, a missed dependency leads to a broken application. Moreover, the YAML syntax of Network Policy is often difficult for newcomers to understand. This makes writing policies and understanding their expected behavior (once deployed) challenging.

Cilium Enterprise provides tooling to simplify and automate the creation of Network Policy based on labels and DNS-aware data from Cilium Hubble. APIs enable integration into CI/CD workflows while visualizations help teams understand the expected behavior of a given policy. Collectively, these capabilities dramatically reduce the barrier to entry to creating Network Policies and the ongoing overhead of maintaining them as applications evolve.
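To make the idea of deriving policy from observed traffic concrete, here is a minimal sketch that turns a list of observed egress flows into a standard Kubernetes NetworkPolicy manifest. It is not the Cilium Enterprise tooling: the flow dictionaries and their keys (`destination_labels`, `protocol`, `port`) are hypothetical, and DNS-aware rules are left out because the standard NetworkPolicy resource has no field for them.

```python
import json


def egress_policy_from_flows(name, namespace, app_labels, flows):
    """Build an egress-only NetworkPolicy that allows the observed peers.

    Each flow is a dict with hypothetical keys: destination_labels (dict),
    protocol ("TCP" or "UDP"), and port (int).
    """
    # Group the observed (protocol, port) pairs by destination label set.
    peers = {}
    for flow in flows:
        key = tuple(sorted(flow["destination_labels"].items()))
        peers.setdefault(key, set()).add((flow["protocol"], flow["port"]))

    egress = []
    for key in sorted(peers):
        egress.append({
            # A bare podSelector matches pods in the policy's own namespace.
            "to": [{"podSelector": {"matchLabels": dict(key)}}],
            "ports": [
                {"protocol": protocol, "port": port}
                for protocol, port in sorted(peers[key])
            ],
        })

    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": app_labels},
            "policyTypes": ["Egress"],
            "egress": egress,
        },
    }


if __name__ == "__main__":
    observed = [
        {"destination_labels": {"app": "db"}, "protocol": "TCP", "port": 5432},
        {"destination_labels": {"app": "db"}, "protocol": "TCP", "port": 5432},
        {"destination_labels": {"app": "auth"}, "protocol": "TCP", "port": 443},
    ]
    policy = egress_policy_from_flows(
        "payments-egress", "payments", {"app": "payments"}, observed
    )
    print(json.dumps(policy, indent=2))
```

Because the policy lists only the dependencies that were actually observed, a dependency exercised rarely (for example, a monthly batch job) would be missing, which is the "missed dependency" risk of the default-deny model described above.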

Automated Network Policy Approvals

Because each application has a unique set of service dependencies that must be identified to create a Network Policy, security teams often delegate ownership of Network Policy creation to application teams, who are in the best position to understand those dependencies and how they evolve.

Cilium Enterprise allows security teams to delegate this responsibility to the application team while still providing high-level guidelines on what policies are or are not acceptable in terms of security compliance. Security teams can specify high-level properties (e.g., no applications can have unrestricted access to the Internet) and application teams receive feedback when they are crafting policies that violate these requirements. This process is entirely automated, saving time for both security and application teams and enabling Network Policy to integrate into application teams’ CI/CD workflows.
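The example guideline above (no unrestricted Internet access) can be sketched as an automated check over a standard Kubernetes NetworkPolicy manifest loaded as a dictionary. This is an illustrative stand-in and not how Cilium Enterprise evaluates guidelines. The function name and the violation messages are hypothetical, and the check is deliberately conservative in that it flags any `ipBlock` covering all addresses, even when it carries `except` entries.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def internet_egress_violations(policy):
    """Return messages for egress rules that allow any destination.

    Two cases are flagged: a rule with no "to" peers (which Kubernetes
    treats as matching all destinations) and an ipBlock spanning all of
    IPv4 or IPv6.
    """
    violations = []
    rules = policy.get("spec", {}).get("egress") or []
    for index, rule in enumerate(rules):
        peers = rule.get("to") or []
        if not peers:
            violations.append(
                f"egress[{index}]: no 'to' peers, all destinations allowed"
            )
            continue
        for peer in peers:
            cidr = (peer.get("ipBlock") or {}).get("cidr")
            if cidr in WORLD_CIDRS:
                violations.append(
                    f"egress[{index}]: ipBlock {cidr} allows any address"
                )
    return violations


if __name__ == "__main__":
    candidate = {
        "spec": {
            "egress": [
                {"to": [{"podSelector": {"matchLabels": {"app": "db"}}}]},
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0"}}]},
            ]
        }
    }
    for message in internet_egress_violations(candidate):
        print(message)
```

A CI job could run a check of this kind on every proposed policy and fail the pipeline when the returned list is non-empty, giving the application team feedback before the policy reaches the cluster.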
