Technical Lead, Technology Exploration

April 14, 2023

23 min. read

Kubernetes Cluster I/O Is a Big Mess – But Help Is on the Way

The proliferation of interfaces, APIs, and abstractions for Kubernetes ingress and egress has led to various challenges in the world of container orchestration.

There’s no other way to describe the vast proliferation of interfaces and abstractions for controlling network traffic ingressing to and egressing from, also called inputs and outputs (I/O), in Kubernetes clusters. It’s a big mess.

The good news is that the community is aware of this and is doing work to make things better.

In this blog, we will discuss proliferation and the efforts being made to simplify the landscape.

How did we get here? A brief history of Kubernetes cluster I/O

In the beginning, there was only one official upstream ingress control resource for Kubernetes known simply as “ingress.” It was simple and had minimal features which lead to the creation and deployment of several other ingress controller resources with different features and APIs for interacting with them.

The original Kubernetes ingress resource is currently in the process of being deprecated in favor of a newer gateway resource and API that have been developed in the Kubernetes SIG Network working group specifically to address the proliferation of similar, but different, implementations of ingress features.

API gateways and service meshes share ingress functionality

As API management solutions migrated to the cloud and Kubernetes solutions in the form of API gateways, another control was added that is functionally an ingress controller. In addition to the dozen or so Kubernetes ingress controllers, there are a dozen or so Kubernetes API gateways that add another dimension of complexity and confusion to Kubernetes users.

And then there are the many different service mesh implementations and APIs that are effectively another ingress interface (into the mesh network implemented by the distributed proxies). All the same functional needs that comprise ingress controllers and API gateways are required to control traffic in and out of service mesh gateways where cluster I/O occurs in many production networks.

To summarize, the current state of interface and API proliferation around cluster IO is the sum of all these different implementations in all the different categories of solutions.

The downsides of proliferation

There are two major downsides to this proliferation:

The rapid growth of interfaces and APIs has resulted in an increased attack surface area, with API vulnerabilities becoming more prevalent.
The vast number of available solutions for ingress controllers, API gateways, and service mesh functionality creates confusion and complications for end-users. This has led to an environment where vendors and users must speak multiple "languages" to provide comprehensive Kubernetes support for security policy.

As more solutions emerge in the Kubernetes ecosystem, the functionality from the various ingress and egress categories is increasingly overlapping. This overlap creates confusion for people choosing tools and adds complexity to an already challenging landscape.

Why the complex Kubernetes ecosystem needs policy standardization

The Container Network Interface (CNI) defines the API for sending intra-cluster network traffic between pods, and there are a number of popular interoperable implementations, including OVN, Calico, Cilium, etc. Although there are some unique extensions in the different products, they share a common core of network policy capabilities that allow platform operators to specify which network-enabled entities can communicate and how.

Network policies are optimized to provide a default-deny environment where allow rules are exceptions to that behavior based on identifying traffic based on labels, namespaces, deployments, and other cloud native metadata attributes. These are exactly the type of primitive functions that would be a good foundation for the filtering of traffic ingressing to and egressing from Kubernetes clusters. However, the CNI doesn’t have extra-cluster scope, and therefore there has been no sharing of this standardized approach in the world of ingress controllers and API gateways.

The service meshes tend to have similar traffic filtering policy tools that don’t have a standardized approach compared to the network policy defined for CNIs. Service mesh also introduces Layer 7 filtering and allowlists which weren’t considered in-scope for CNI APIs and hasn’t yet seen progress on adoption in the CNI working group.

Standardization efforts by the Kubernetes community

To address these issues, groups are taking on various initiatives to standardize ingress and egress interfaces and APIs. These include several important efforts under the leadership of the Kubernetes Network Special Interest Group (SIG), including the the Network Policy Working Group, the Gateway Working Group, and the GAMMA Initiative.

Gateway Working Group

The Gateway Working Group is responsible for developing a unified API for managing ingress and egress traffic in Kubernetes clusters. The group's main project is the Kubernetes Gateway API which is designed to provide a more flexible and expressive API for configuring Kubernetes ingress and egress traffic6]]. By offering a standardized API, the Gateway Working Group aims to simplify the process of deploying and managing Kubernetes networking components.

By offering a standardized API, the Gateway Working Group aims to simplify the process of deploying and managing Kubernetes networking components.

Kubernetes Gateway API V1.0

The Kubernetes Gateway API is designed to address some of the limitations and issues associated with the original ingress resource. These enhancements address the limitations of the original ingress resource and provide a more efficient and user-friendly approach to managing network traffic in Kubernetes environments.

To learn more about the group's key improvements, access these resources:

GAMMA Initiative

The GAMMA (Gateway API, Mesh, and Middleware) Initiative is a collaborative effort between various Kubernetes SIGs and industry stakeholders. Its goal is to consolidate and standardize the APIs and interfaces used for Kubernetes ingress and egress traffic. This initiative aims to reduce confusion and complexity for end users, making it easier to deploy and manage Kubernetes networking components.

Network Policy Working Group

The Network Policy Working Group focuses on defining and implementing network policies for Kubernetes to enhance security and isolation between pods, services, and other network entities in a Kubernetes cluster. It currently supports a rich set of tools for specifying network traffic.It is widely implemented by popular CNIs, and therefore is not a tool that is applied to cluster ingress/egress traffic.

The group is currently working on several projects:

Administrative Network Policy: Provides cluster administrators with more control over network policies by introducing a higher level of abstraction. This enables administrators to define global, cluster-wide policies that can be applied consistently across namespaces.
Network Policy V2: Addresses limitations in the current network policy implementation by introducing new features and extending the existing API, such as support for egress traffic filtering, enhanced policy matching capabilities, and improved policy enforcement for better security.

NetworkPolicy++: Introducing advanced network policy capabilities by extending the existing Network Policy API. This provides more granular control over traffic management, security, and isolation, enabling users to create sophisticated policies tailored to their specific needs.

Community adoption is replacing standards organizations

Earlier in this blog, there are references to efforts to standardize abstractions and APIs, but that is not necessarily an endorsement for doing so via traditional standards organizations such as IETF, ITU, IEEE, etc. Open-source communities vote with their developer’s time and their code base, so achieving de-facto “standardization” because of widespread community deployment is the most important measure of success.

The introduction of the Kubernetes Gateway API, and deprecating of the ingress resource, is an example of a community dedicated to improving their infrastructure platform coming together to make widespread changes without gaining any competitive advantage from that investment.

At publishing time for this blog, there were 19 open-source ingress controller and service mesh projects in various stages of developing their gateway API implementation to replace their previous bespoke implementation. The majority of these are currently in beta release and several are in general availability (GA).

Fast, shared implementation is the new way to standardize software interfaces at the speed of community development. The work being done in the Network SIG is not academic work; the community has shown a willingness to contribute to and subsequently adopt the common interfaces and APIs defined in the working groups. Anyone can participate and contribute as they choose.

Still room for improvement?

The work currently underway within the Network SIG will clean up much of the proliferation mess that currently exists relative to cluster I/O. However, there are other dimensions of confusion and complexity that have not been targeted for alignment by the community.

The work of the GAMMA Initiative to share ingress features and APIs with the work of the gateway API work group goes a long way towards recognizing that service mesh functional requirements can look very similar to those of a CNI, where traditional ingress occurs for non-service-mesh.

Despite this work, there continues to be functional overlap between CNI and service-mesh that is not being aligned. In the early days, the CNI implemented layer network policies to filter traffic at Layers 3 and 4 and service mesh exclusively filtered a subset of that traffic by looking only at Layer 7 protocol elements.

The network policy working group is evolving and standardizing the API that will be adopted by all the various CNI providers, but most of the popular service mesh solutions also have some non-standardized form of Layers 3 and 4 filtering policy API. None are planning to align that with the work of the Network Policy Working Group.

At the same time, there is no equivalent group trying to standardize the APIs for Layer 7 filtering that are implemented differently by different service meshes (although their shared use of the open-source Envoy Proxy for filtering enforcement results in much uniformity). Organizationally, it could be hard to unify abstractions between the different software artifacts (CNIs vs. service meshes) because there is no project that is chartered to care about this and implement it. From an architectural perspective this makes sense, but unification might take a CNCF-leadership perspective rather than a project-centric perspective.

Where will this all end up?

Accepting that continuing functional overlap between CNIs and service meshes is inevitable means the goal for the Network SIG should ultimately be to define a common API for the relevant security, traffic management, and routing features regardless of whether they are implemented in something packaged as a CNI, a service mesh, or some other way of delivering a virtual network abstraction.

Kubernetes infrastructure experts will raise some good objections based on the architectural principles that differentiate a CNI from a service mesh and dictate a logical separation of features and standards. But from a UX perspective, there is a risk of being tone deaf and exposing system users to a system-developer-centric, bottom-up interface that exposes the “nerd knobs.”

If it is natural for users to think of both a CNI provider and a service mesh as implementing their network stack and features, it might improve the appeal of the platform to share more common abstractions and APIs. Network policy has a rich set of filtering primitives for selecting traffic and performing conditional actions. It could be extended and improved to handle all abstracted, Kubernetes-aware match/action rules for intra-cluster, inter-cluster, and extra-cluster networking.

Let us know what you think of the value of common abstractions across all traffic processing use cases. If you care about this topic, keep an eye on this work which is progressing quickly and will affect a lot of Kubernetes users.

Learn more about Illumio by contacting us today.

Kubernetes Cluster I/O Is a Big Mess – But Help Is on the Way