A Guide to Navigating the Policy Overload in Today’s Distributed Systems
I challenge you to attend KubeCon and go to a session about “policy.” When you get there, don’t be surprised if you’re left wondering, “Which kind of policy is this actually about?”
At the recent KubeCon in Salt Lake City, I found myself sprinting between sessions where “policy” was prominent in the titles. But for each speaker, the word meant something completely different.
As someone focused on label-based network policies, I often had to catch speakers beforehand to ask, “Is this policy session about network policies, admission controllers, or compliance?”
These exchanges reveal a growing issue in today’s cloud-native and distributed computing ecosystems. The term “policy” is used so broadly that it’s practically an abstraction in itself.
To untangle this, I’ll take a closer look at the eight different types of policies frequently discussed under this broad term and why they’re crucial to understanding the infrastructure, security, and automation in distributed systems.
1. Network policies
Network policies are important for controlling and managing how systems in a network communicate with each other, especially in environments like Kubernetes.
Most network policies use an allow-list approach. This means connections are blocked by default unless they are specifically allowed by the policy. These policies can use rules based on IP addresses or labels to decide which resources can communicate.
As microservices and container-based applications become more common, it’s even more important to carefully control how services communicate and keep them isolated when needed.
For example, Kubernetes network policies are highly customizable. They can set traffic rules for internal traffic (east-west) and external traffic (north-south). This flexibility makes them powerful tools for keeping systems secure, but it also makes them more complicated to build and manage.
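To make this concrete, here's a minimal sketch of a label-based Kubernetes NetworkPolicy. The namespace and the app: frontend / app: backend labels are illustrative, not from any particular environment:

```yaml
# Only pods labeled app=frontend may reach pods labeled app=backend on TCP 8080.
# Once this policy selects the backend pods, all other ingress to them is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo              # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: backend             # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # only these pods may connect
      ports:
        - protocol: TCP
          port: 8080
```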
2. Admission controller policies
Admission controllers are Kubernetes components that enforce policy at the API level. They intercept API requests and decide whether each one should be allowed, modified, or rejected. They're essentially gatekeepers, enforcing standards and security practices across the cluster before an API request can move forward.
For example, admission controller policies can:
- Automatically enforce resource limits
- Add labels to deployments
- Block unsafe configurations from being used
These kinds of policies can intercept, validate, and mutate requests, which makes them crucial to maintaining consistent security within clusters. But they only apply during the Kubernetes API-request lifecycle.
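As a rough illustration, here's what registering a validating admission webhook looks like. The webhook name, service, and namespace are hypothetical, and the TLS details (caBundle) are omitted for brevity:

```yaml
# The API server calls the named service before persisting Pod CREATE requests;
# the webhook's response decides whether each request is admitted.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy-check              # illustrative name
webhooks:
  - name: pods.policy.example.com     # hypothetical webhook
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail               # reject requests if the webhook is unavailable
    clientConfig:
      service:
        namespace: policy-system      # hypothetical namespace and service
        name: pod-policy-webhook
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE"]
```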
3. OPA and Kyverno policies
Are OPA and Kyverno policies simply admission controllers, or are they something more?
Open Policy Agent (OPA) and Kyverno offer more than traditional admission controllers. While they often work as admission controllers in Kubernetes, they introduce more flexible, comprehensive policy languages. This allows organizations to define and apply complex rules across multiple systems.
- OPA (Open Policy Agent) is a versatile policy engine that can be used across environments. It uses a language called Rego, which can handle complex policy requirements. Besides Kubernetes, OPA can manage policies for CI/CD pipelines, microservices, and even cloud resources.
- Kyverno is a policy engine made specifically for Kubernetes. It offers a simpler way to define policies directly in YAML, which is why many people prefer it for configuring Kubernetes. It works well with native Kubernetes objects, which makes it easy to build and apply policies.
These tools can control what gets into a cluster, but they can do much more across a range of apps and systems (a Kyverno example follows this list). Beyond admission control, they can handle:
- Lifecycle management
- Validation
- Custom compliance checks
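Here's a small Kyverno sketch of the kind of validation rule described above. The policy name and the required "team" label are made up for illustration, and exact field conventions can vary slightly between Kyverno versions:

```yaml
# Reject any Pod that doesn't carry a non-empty "team" label.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label            # illustrative policy name
spec:
  validationFailureAction: Enforce    # block non-compliant resources ("Audit" would only report)
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "All pods must have a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"              # any non-empty value
```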
4. Resource quotas and limit policies
Resource management policies help control how much CPU, memory, and storage a Kubernetes cluster can use. These policies are important in shared environments because they prevent one app or user from using too many resources.
- Quotas are usually set for each namespace. They limit the total amount of resources a namespace can use so no single namespace takes over too much.
- Limits (typically set with a LimitRange) define the minimum and maximum amount of resources a container or pod can use. This makes sure no single workload consumes too much and causes problems for the rest of the system.
With these policies, admins can balance resources, which is especially important in environments with many users or apps that scale dynamically. While these policies help keep the system stable, managing them can be challenging: they often overlap and interact with other policy layers like automation and admission control.
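For example, a namespace might pair a ResourceQuota with a LimitRange along these lines. The numbers and the demo namespace are placeholders, not recommendations:

```yaml
# Cap the namespace's total CPU and memory, then give each container
# sensible defaults and ceilings.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: demo           # illustrative namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: demo
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
      default:              # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      max:                  # hard ceiling per container
        cpu: "2"
        memory: 2Gi
```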
5. Pod Security Policies (PSPs)
Pod Security Policies (PSPs) in Kubernetes set security configurations at the pod level. This includes stopping containers from running as root or restricting which Linux capabilities they can use.
But PSPs have been phased out of Kubernetes: they were deprecated in v1.21 and removed in v1.25. They've been replaced by newer options like Pod Security Standards (PSS), enforced by the built-in Pod Security Admission controller, and by external tools like OPA and Kyverno.
PSPs were designed to add granular security settings that prevent workloads from running with overly permissive configurations. While they are useful, managing PSPs along with other policies can get confusing. Newer tools offer more flexible ways to enforce security, often under the general term "security policies."
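With Pod Security Standards, the policy is applied by labeling a namespace rather than by creating a PSP object. A minimal sketch (the namespace name is illustrative):

```yaml
# The built-in Pod Security Admission controller enforces the "restricted"
# standard for every pod created in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: restricted-apps                              # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # also surface warnings to users
```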
6. Service mesh policies
In microservices environments, service meshes like Istio or Linkerd add another policy layer that secures and monitors communication between services. These policies often:
- Authenticate and authorize traffic: Service meshes use mTLS (mutual TLS) to encrypt communication between services. They also allow you to set policies for which services can communicate with each other, adding another layer of access control.
- Manage traffic: Service mesh policies control routing, load balancing, and failover. This makes it easier to do things like A/B testing, canary releases, or route traffic to different service versions.
Unlike network policies, service mesh policies work at the application layer, affecting how services interact. These policies are crucial for managing service-to-service traffic. But they can sometimes be confusing because they overlap with network policies.
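As an Istio-flavored sketch (resource names, namespace, and service account are hypothetical), a mesh-level policy pair might look like this:

```yaml
# Require mTLS for all workloads in the namespace...
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: require-mtls
  namespace: demo                     # illustrative namespace
spec:
  mtls:
    mode: STRICT
---
# ...then allow only the "frontend" service account to call app=backend workloads.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-to-backend
  namespace: demo
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/demo/sa/frontend   # hypothetical service account
```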
7. Compliance policies
Compliance policies can cover data management, access, and operational standards to ensure systems meet legal or internal compliance requirements, such as GDPR, HIPAA, or SOC 2. These policies can play a major role in many parts of a system, affecting security, logging, and data storage.
For example, a compliance policy might require that data only be stored in specific locations (data residency) or that audit logs are kept for a certain amount of time. In Kubernetes, tools like OPA and Kyverno are often used to enforce these policies by continuously monitoring and auditing systems in real time to make sure they meet the standards.
Compliance policies are especially important in industries with strict regulations, like healthcare and finance. Because they work across many parts of a system and often overlap with security policies, managing them can become complex. Despite this, they are crucial for ensuring systems stay secure, organized, and compliant.
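As a hedged sketch of how a residency rule might be expressed with Kyverno, the policy below only asks that every PersistentVolumeClaim declare a hypothetical compliance.example.com/data-residency label so auditors can check where data lives; a real control would go further:

```yaml
# Report (rather than block) any PVC that doesn't declare a data-residency label.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-data-residency-label
spec:
  validationFailureAction: Audit      # surface violations for audit instead of rejecting
  rules:
    - name: check-residency-label
      match:
        any:
          - resources:
              kinds:
                - PersistentVolumeClaim
      validate:
        message: "PVCs must declare a compliance.example.com/data-residency label."
        pattern:
          metadata:
            labels:
              compliance.example.com/data-residency: "?*"
```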
8. Automation and lifecycle policies
Automation policies control when and how infrastructure resources are created, updated, or removed. These policies are an important part of Infrastructure as Code (IaC) where resource configurations are written as code and managed through CI/CD pipelines.
For example, automation policies can handle tasks like automatically scaling resources, cleaning up unused ones, or managing the steps in a deployment's lifecycle. They can also integrate with CI/CD pipelines to trigger builds, run tests, and manage deployments. This creates self-managing environments that can adjust to workload changes in real time.
Automation policies can simplify resource management and ensure best practices in cloud-native environments. But they interact closely with other policies, like those for resource management and admission control, which can add complexity.
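One familiar example of an automation policy is a HorizontalPodAutoscaler. This sketch assumes a hypothetical web Deployment and scales it on average CPU utilization:

```yaml
# Keep between 2 and 10 replicas of the "web" Deployment, scaling out
# when average CPU utilization exceeds 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-autoscaler
  namespace: demo                 # illustrative namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                     # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```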
Are you still following? The overlap of “policy” continues…
If you’re not yet overwhelmed, here’s the twist. Many organizations now have policies for policy.
These are called “meta-policies.” They act as guardrails, setting rules for who can make, manage, or apply other policies.
For example, a meta-policy might decide which teams can create specific network policies or who’s authorized to create admission control policies. These policies often rely on role-based access control (RBAC).
In large systems, RBAC policies for policies are essential. They make sure only specific administrators or teams can make changes to policies. By enforcing strict RBAC controls, these “policies for policy” ensure that other policies don’t overreach or interfere with critical infrastructure.
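A minimal RBAC sketch of such a meta-policy (the namespace and the netsec-team group are hypothetical): only subjects bound to this role can create or change NetworkPolicies in the namespace.

```yaml
# Grant NetworkPolicy write access in one namespace...
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: network-policy-editor
  namespace: demo                      # illustrative namespace
rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["create", "update", "patch", "delete"]
---
# ...and bind it only to the network security team.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: netsec-policy-editors
  namespace: demo
subjects:
  - kind: Group
    name: netsec-team                  # hypothetical team group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: network-policy-editor
  apiGroup: rbac.authorization.k8s.io
```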
Final thoughts: A roadmap through policy overload
As cloud-native and distributed environments grow, the idea of "policy" will continue to change. Policies will become more complicated, specialized, and sometimes even contradictory.
To avoid policy overload, it’s important to use clear naming conventions, create documentation that defines each policy type, and make good use of policy tooling.
And the next time you're at a tech conference and you hear "policy," take a moment to ask, "Which one?!" It could save you from a lot of confusion—or even a cross-hall sprint!
Get in touch with us today to learn how Illumio can simplify your network security policies across cloud, endpoint, and data center environments.