Kubernetes Fundamentals
Kubernetes is a highly popular open-source container orchestration platform used to automate the deployment, scaling, and management of containerized workloads. In this chapter, we will discover the basic architecture of a Kubernetes cluster and its components. Kubernetes typically runs as a cluster, meaning it spans multiple servers that work on different tasks and distribute the load of a system.
From a high-level perspective, a Kubernetes cluster consists of two different types of server nodes:
Control Plane Node(s) – These are the brains of the operation.
Worker Nodes – The worker nodes are where applications run in your cluster.
Worker nodes rely on a container runtime to actually run containers. containerd is a lightweight, performant container runtime and arguably the most popular one right now. It is used by all major cloud providers for their Kubernetes-as-a-Service products.
CRI-O was created by Red Hat and has a code base closely related to podman and buildah.
Docker was the standard for a long time, but it was never really made for container orchestration. Support for Docker as the runtime for Kubernetes was deprecated and finally removed in Kubernetes 1.24.
The idea of containerd and CRI-O is very simple: provide a runtime that contains only the absolute essentials to run containers. Nevertheless, they have additional features, like the ability to integrate with container runtime sandboxing tools. These tools try to solve the security problem that comes with sharing the kernel between multiple containers. The most common tools at the moment are:
gVisor – Made by Google, provides an application kernel that sits between the containerized process and the host kernel.
Kata Containers – A secure runtime that provides a lightweight virtual machine, but behaves like a container.
The Kubernetes API is the most important component of a Kubernetes cluster. Without it, communication with the cluster is not possible; every user and every component of the cluster itself needs the api-server.
Before a request is processed by Kubernetes, it has to go through three stages:
Authentication
Authorization
Admission Control
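As an illustration of the authorization stage, kubectl can ask the API server whether the current user is permitted to perform an action. The namespace below is a hypothetical example:

```shell
# Ask the API server whether the current user may create Deployments
# in the (hypothetical) "dev" namespace; the answer is "yes" or "no".
kubectl auth can-i create deployments --namespace dev
```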
In Kubernetes, instead of starting containers directly, you define Pods as the smallest compute unit and Kubernetes translates that into a running container.
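For illustration, a minimal Pod manifest might look like the following sketch; the name and image are arbitrary examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx          # arbitrary example name
spec:
  containers:
  - name: nginx
    image: nginx:1.25  # example image and tag
```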
Kubernetes distinguishes between four different networking problems that need to be solved:
Container-to-Container communications – This can be solved by the Pod concept as we'll learn later.
Pod-to-Pod communications – This can be solved with an overlay network.
Pod-to-Service communications – Implemented by kube-proxy and the packet filter (e.g. iptables) on the node.
External-to-Service communications – Also implemented by kube-proxy and the packet filter on the node.
In its most basic form, scheduling is a sub-category of container orchestration and describes the process of automatically choosing the right (worker) node to run a containerized workload on. In the past, scheduling was more of a manual task where a system administrator would choose the right server for an application by keeping track of the available servers, their capacity and other properties like where they are located.
In a Kubernetes cluster, the kube-scheduler is the component that makes the scheduling decision, but it is not responsible for actually starting the workload; that is done by the kubelet on the selected node.
A common misconception about Kubernetes is that it has some form of "artificial intelligence" analyzing the workload and moving Pods around based on resource consumption, type of workload and other factors. The truth is that the user has to supply information about the application's requirements, including CPU and memory requests and desired node properties. The scheduler uses that information to filter out all nodes that do not fit these requirements. If multiple nodes fit the requirements equally well, Kubernetes schedules the Pod on the node with the fewest Pods.
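To make this concrete, here is a sketch of a Pod that declares CPU and memory requests, which are the scheduler's filter input; the image name is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
  - name: app
    image: example/backend:1.0  # hypothetical image
    resources:
      requests:                 # used by the scheduler to filter nodes
        cpu: "250m"
        memory: "256Mi"
      limits:                   # enforced at runtime, not used for filtering
        cpu: "500m"
        memory: "512Mi"
```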
One of the core concepts of Kubernetes is providing a lot of mostly abstract resources, also called objects, that you can use to describe how your workload should be handled. Some of them are used to handle problems of container orchestration, like scheduling and self-healing, others are there to solve some inherent problems of containers.
Kubernetes objects can be divided into workload-oriented objects, which handle container workloads, and infrastructure-oriented objects, which handle things like configuration, networking, and security. To access the API, users can use the official command line interface client called kubectl.
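A few common kubectl invocations as a sketch; manifest.yaml and my-pod are placeholders:

```shell
# List Pods in the current namespace
kubectl get pods

# Create or update objects from a manifest file
kubectl apply -f manifest.yaml

# Show detailed state and recent events for a single Pod
kubectl describe pod my-pod
```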
The most important object in Kubernetes is a Pod. A Pod describes a unit of one or more containers that share an isolation layer of namespaces and cgroups. It is the smallest deployable unit in Kubernetes, which also means that Kubernetes does not interact with containers directly. The Pod concept was introduced to allow running a combination of multiple processes that are interdependent. All containers inside a Pod share an IP address and can share data through the filesystem via shared volumes.
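The following sketch shows that sharing in practice: two containers in one Pod exchanging files through a shared emptyDir volume (all names and images are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
  - name: shared-logs
    emptyDir: {}                    # scratch volume shared by both containers
  containers:
  - name: app
    image: example/app:1.0          # hypothetical main image
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app       # the app writes logs here
  - name: log-shipper
    image: example/log-shipper:1.0  # hypothetical sidecar image
    volumeMounts:
    - name: shared-logs
      mountPath: /logs              # the sidecar reads the same files here
```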
Pods follow a defined lifecycle, starting in the Pending phase, moving through Running if at least one of its primary containers starts OK, and then through either the Succeeded or Failed phases depending on whether any container in the Pod terminated in failure.
Working just with Pods would not be flexible enough in a container orchestration platform. For example, if a Pod is lost because a node failed, it is gone forever. To make sure that a defined number of Pod copies runs at all times, we can use controller objects that manage Pods for us (a Deployment sketch follows this list):
ReplicaSet
Deployment
StatefulSet
DaemonSet
Job
CronJob
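As a sketch, a Deployment that keeps three replicas of a Pod running could look like this; names and the image are arbitrary examples:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired number of Pod copies
  selector:
    matchLabels:
      app: web             # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25  # example image
```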
Since connecting to a lot of Pods by hand would require a lot of manual network configuration, we can use Service and Ingress objects to define and abstract networking. Services can be used to expose a set of Pods as a network service (a minimal example follows this list):
ClusterIP
NodePort
LoadBalancer
ExternalName
Headless Services – You can use a headless Service to interface with other service discovery mechanisms, without being tied to Kubernetes' implementation. For headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them.
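A minimal ClusterIP Service sketch that exposes the example Deployment above; the selector assumes the app: web label used earlier:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP    # default type; reachable only inside the cluster
  selector:
    app: web         # routes traffic to Pods carrying this label
  ports:
  - port: 80         # port the Service listens on
    targetPort: 80   # port the Pods' containers listen on
```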
As mentioned earlier, containers were not designed with persistent storage in mind, especially when that storage spans across multiple nodes. Kubernetes introduces a few solutions, but note that these solutions do not automatically remove all of the complexities of managing storage with containers.
Containers already had the concept of mounting volumes, but since we’re not working with containers directly, Kubernetes made volumes part of a Pod, just like containers are.
Volumes allow sharing data between multiple containers within the same Pod and, depending on the volume type (hostPath being one example), even between multiple Pods. This concept allows for great flexibility when you want to use a sidecar pattern. To make the user experience more uniform, Kubernetes uses the Container Storage Interface (CSI), which allows storage vendors to write a plugin (storage driver) that can be used in Kubernetes.
To use this abstraction, two more objects are available:
PersistentVolumes (PV) – An abstract description for a slice of storage. The object configuration holds information like the type of volume, volume size, access mode, unique identifiers, and information on how to mount it.
PersistentVolumeClaims (PVC) – A request for storage by a user. If the cluster has multiple persistent volumes, the user can create a PVC which will reserve a persistent volume according to the user's needs.
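A PVC sketch requesting five gibibytes of storage; the storage class name is an assumption and depends on the cluster (many clusters provision a matching PersistentVolume dynamically):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce             # mountable read-write by a single node
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard  # assumed class name; varies per cluster
```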
It is considered bad practice to bake the configuration directly into the container build: any configuration change would require the entire image to be rebuilt and the entire container or Pod to be redeployed. In Kubernetes, this problem is solved by decoupling the configuration from the Pods with a ConfigMap. ConfigMaps can be used to store whole configuration files or variables as key-value pairs. Right from the beginning, Kubernetes has also provided an object to store sensitive information like passwords, keys or other credentials. These objects are called Secrets. Secrets are very closely related to ConfigMaps; essentially their only difference is that Secrets are base64 encoded (which is an encoding, not encryption).
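As a sketch, a ConfigMap with a plain key-value pair and a Secret holding a base64-encoded password; all names and values are made up:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"          # plain-text key-value pair
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  password: c3VwZXJzZWNyZXQ= # base64 of "supersecret"; encoded, not encrypted
```

A Pod can then consume these objects, for example as environment variables or as mounted volumes.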
To scale the workload in a Kubernetes cluster, we can use three different autoscaling mechanisms (a sketch of the first one follows this list):
Horizontal Pod Autoscaler (HPA)
Cluster Autoscaler
Vertical Pod Autoscaler
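A sketch of a Horizontal Pod Autoscaler that scales the example Deployment above between two and ten replicas based on average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:              # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80 # target 80% average CPU usage
```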
The scheduler is the control process which assigns Pods to Nodes. The scheduler determines which Nodes are valid placements for each Pod in the scheduling queue according to constraints and available resources. The scheduler then ranks each valid Node and binds the Pod to a suitable Node. Multiple different schedulers may be used within a cluster; kube-scheduler is the default implementation.
The default scheduler does a good job of scheduling Pods across the nodes in the cluster; however, there are scenarios where you want to restrict a Pod to particular nodes or prefer to run it on particular nodes. There are several ways of doing this; the recommended way is to make use of label selectors to facilitate the selection (a sketch follows this list):
nodeSelector field matching against node labels
Affinity and anti-affinity
nodeName field
Pod topology spread constraints
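As a sketch, the simplest of these mechanisms, nodeSelector, pins a Pod to nodes carrying a given label; the label and image here are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-storage-app
spec:
  nodeSelector:
    disktype: ssd           # assumed node label, e.g. set via kubectl label nodes
  containers:
  - name: app
    image: example/app:1.0  # hypothetical image
```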
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite: they allow a node to repel a set of Pods. One or more taints are applied to a node; this marks that the node should not accept any Pods that do not tolerate those taints.
Tolerations are applied to Pods and allow the scheduler to schedule Pods onto nodes with matching taints. Tolerations allow scheduling, but don't guarantee scheduling: the scheduler also evaluates other parameters as part of its function.
Taints and tolerations work together to ensure that Pods are not scheduled onto inappropriate nodes.
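For example, a taint can be applied to a node with kubectl, and a Pod can declare a matching toleration; key, value, and node name are placeholders:

```shell
# Repel Pods without a matching toleration from node1
kubectl taint nodes node1 key1=value1:NoSchedule
```

```yaml
# Fragment of a Pod spec: this toleration matches the taint above
spec:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
```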
Because Kubernetes is a network-intensive environment, it is crucial to secure the network using typical firewall techniques from outside the Kubernetes cluster and using pod-to-pod encryption, a NetworkPolicy and other measures from within the Kubernetes cluster.
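As a sketch, the following NetworkPolicy denies all incoming traffic to every Pod in its namespace, a common baseline on top of which specific allow rules are added. Note that NetworkPolicies are only enforced if the cluster's network plugin supports them:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}  # empty selector: applies to all Pods in the namespace
  policyTypes:
  - Ingress        # with no ingress rules listed, all ingress is denied
```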
Tools like AppArmor and SELinux should also be used to further protect the environment from malicious containers.
Security is more than just ‘settings and configuration’. It is an ongoing process of detecting issues using intrusion detection tools and behavioral analytics.