Containers Anatomy 101: What is a Cluster?
From a networking perspective, containers extend the network "edge" – the boundary between network forwarding decisions and a packet reaching its final destination–deep into a host. The edge is no longer the network interface of a host but is instead several layers deep into logical constructs within a host. And the network topology is abstracted and reaches deep into these logical constructs within a host, in the form of overlay network tunneling, virtual interfaces, NAT boundaries, load balancers, and networking plugins. Network and security architects can no longer ignore OS internals when designing their architectures. Containers force these architectures to understand where a packet goes after it passes through a host's NIC.
With that said, an orchestration system is required to bring some form of order into containers environments. An orchestration system manages the details around organizing, scaling, and automating containers, and creates logical constructs around various components that are relevant to container behavior. They are also responsible for organizing logical boundaries associated with container runtimes and creating logical constructs that can be assigned an IP address. That said, such systems are external and cannot actually deploy and manage the lifecycle of specific containers runtime instances, which are still handled by Docker, for example.
There are many containers orchestration systems, but the two most commonly used today are Kubernetes and OpenShift. They both accomplish the same basic goals, with the primary difference being that one is a project and the other is a product: Kubernetes is a project born largely out of Google, and OpenShift is a product owned by Red Hat. Generally speaking, Kubernetes is most often seen in public cloud environments and OpenShift is most often seen in on-premise data centers, but there is a significant amount of overlap between the two. In short, Kubernetes underlies both approaches, with some slight difference in terminology between each.
A brief history of containers
Believe it or not, containers pre-date Kubernetes. Docker, for example, first released their containers platform in 2013, whereas Kubernetes did not release their public cloud-focused project until 2014. OpenShift launched before both, with a focus on hosts deployed in on-premise data centers.
Simply deploying container runtimes on a local host generally meets developers’ needs, since runtimes can communicate with each other via "localhost" and unique ports. Container runtimes aren’t assigned specific IP addresses. If you are focused on writing fast and efficient code and deploying your application across a collection of associated container runtimes, this approach works fine. But if you want that application to access external resources outside of the local host, or if you want external clients to access that application, you cannot ignore networking details. This is one of the reasons an orchestration system is needed.
Kubernetes was created around a set of building blocks and an API-driven workflow to organize the behavior of container runtimes. In this approach, Kubernetes creates a series of logical constructs within and across hosts associated with a specific containerized environment, and creates a whole new set of vocabulary to refer to these constructs. While Kubernetes applies these building blocks and API-driven workflows around a set of compute metrics associated with CPU allocation, memory requirements, and other metrics such as storage, authentication, and metering, most security and networking professionals are focused on one thing:
What boundaries does an IP packet pass through as it is en route to some logical construct that is assigned an IP address?
From a networking perspective, both Kubernetes and OpenShift create logical, relevant constructs in a hierarchical approach, with only a slight difference in vocabulary between each system. This is illustrated below.
The ABCs of a containers clusters
This diagram shows the basic logical construct of a Kubernetes environment. It doesn’t explain what each construct does, but only how they logically relate to each other.
Starting from the broadest construct down to the smallest, here are quick explanations:
- Cluster: A cluster is the collection of hosts associated with a specific containerized deployment.
- Nodes: Inside of a cluster, there are nodes. A node is the host on which containers reside. A host can be either a physical computer or a VM, and it can reside in either an on-premises data center or in a public cloud. Generally, there are two categories of nodes in a cluster: the “master nodes” and the “worker nodes”. To oversimplify things, a master node is the control plane that provides the central database of the cluster and the API server. The worker nodes are the machines running the real application pods.
- Pods: Inside of each node, both Kubernetes and OpenShift create pods. Each pod encompasses either one or more container runtimes and is managed by the orchestration system. Pods are assigned IP addresses by Kubernetes and OpenShift.
- Container: Inside of pods are where container runtimes reside. Containers within a given pod all share the same IP address as that pod, and they communicate with each other over Localhost, using unique ports.
- Namespace: A given application is deployed "horizontally" over multiple nodes in a cluster and defines a logical boundary to allocate resources and permissions. Pods (and therefore containers) and services, but also roles, secrets, and many other logical constructs belong to a namespace. OpenShift calls this a project, but it is the same concept. Generally speaking, a namespace maps to a specific application, which is deployed across all of the associated containers within it. A namespace has nothing to do with a network and security construct (different from a Linux IP namespace)
- Service: Since pods can be ephemeral—they can suddenly disappear and later be re-deployed dynamically—a service is a “front end”, which is deployed in front of a set of associated pods and functions like a load-balancer with a VIP that doesn't disappear if a pod disappears. A service is a non-ephemeral logical construct, with its own IP address. With only a few exceptions within Kubernetes and OpenShift, external connections point to a service’s IP address and are then forwarded to “backend” pods.
- Kubernetes API Server: This is where the API workflow is centralized, with Kubernetes managing the creation and lifecycle of all of these logical constructs.
Security challenges with containers
In order to create security segments along workload boundaries, it's necessary to understand these basic logical constructs created by Kubernetes. External network traffic moving in and out from the hosted application will generally not be pointing to the IP address of the underlying host, the node. Instead, network traffic will be pointing to either a service or a pod within that host. Therefore, a workload's associated services and pods need to be sufficiently understood in order to create an effective segmentation security architecture.
Interested in more? Check out our paper on the challenges of network-based approaches to container segmentation and how to overcome them using host-based segmentation.