How Kubernetes Works: A Developer’s Guide for Modern Engineering Teams

Authors
  • Christopher Clemmons

How Kubernetes Works

Modern software succeeds or fails on its ability to scale and recover quickly. When traffic spikes, when nodes fail, when teams ship multiple times per day, manual operations turn fragile. Containers solved packaging and portability. They did not solve orchestration. Kubernetes exists to fill that gap.

Think of Kubernetes as an air traffic controller for your applications. It tells each workload when and where to run, watches health in real time, and keeps everything moving safely. You declare the desired state. Kubernetes constantly reconciles the actual state toward it.

Why Kubernetes Exists

Before Kubernetes, teams babysat fleets of VMs and scripts. Scaling meant logging into consoles, cloning instances, and hoping configuration drift did not bite. Recovery after a crash meant paging someone and waiting. Release engineering was inconsistent across services, so rollbacks were slow and risky.

Kubernetes replaces ad hoc practice with a consistent, declarative control system. You define deployments, services, and policies as code. The platform schedules workloads, balances traffic, restarts failed containers, and rolls versions forward or back with guardrails. The result is repeatability. The outcome is availability and speed.

What Kubernetes Actually Does

Kubernetes is an open source system for automating deployment, scaling, and management of containerized applications. It focuses on four jobs that every production platform must do well:

  1. Placement. A scheduler assigns Pods to nodes based on resources and rules.
  2. Networking. Services provide stable virtual endpoints so clients can find workloads even as Pods come and go.
  3. Health and rollout. Controllers watch liveness and readiness, perform rolling updates, and roll back if something goes wrong.
  4. Configuration and policy. ConfigMaps, Secrets, resource limits, security rules, and admission policies define how software runs and what it can access.

Everything runs through one API. That single surface makes auditing and automation straightforward.
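
To make the declarative model concrete, here is a minimal sketch using cdk8s (covered later in this article) and its generic ApiObject construct. The name `hello-web` and the nginx image are placeholders; in practice you would usually use typed constructs generated with `cdk8s import`.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'hello');

// Desired state: three replicas of a containerized web server.
// The control plane, not a person at a console, keeps this true.
new ApiObject(chart, 'deployment', {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'hello-web' },
  spec: {
    replicas: 3,
    selector: { matchLabels: { app: 'hello-web' } },
    template: {
      metadata: { labels: { app: 'hello-web' } },
      spec: {
        containers: [{ name: 'web', image: 'nginx:1.27', ports: [{ containerPort: 80 }] }],
      },
    },
  },
});

app.synth(); // emits plain YAML under dist/ for kubectl or a GitOps controller
```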

Under the Hood: The Control Plane

The control plane is the brain of the cluster.

  • API server. The front door. Every command, controller, and operator talks to the API server.
  • etcd. The source of truth. It stores the cluster state in a strongly consistent key value store.
  • Scheduler. The traffic cop for compute. It places Pods onto nodes that satisfy CPU, memory, topology, and policy requirements.
  • Controller manager. The reconciliation engine. It notices differences between desired and actual state, then takes small, safe steps until they match.

This design favors reliability. State is declared once, stored durably, and advanced by idempotent control loops. If a node disappears or a Pod crashes, reconciliation closes the gap without a human in the loop.
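
The reconciliation pattern itself fits in a few lines. The sketch below is illustrative TypeScript, not code from any real controller: it compares desired and observed state and takes one small, idempotent step, which is why it is safe to run in a loop forever.

```typescript
// Illustrative reconcile loop (not an actual Kubernetes controller).
interface DesiredState { replicas: number }
interface ObservedState { readyReplicas: number }

async function reconcile(
  desired: () => Promise<DesiredState>,
  observed: () => Promise<ObservedState>,
  createReplica: () => Promise<void>,
  deleteReplica: () => Promise<void>,
): Promise<void> {
  const want = await desired();
  const have = await observed();

  // Take one small, idempotent step toward the desired state.
  if (have.readyReplicas < want.replicas) {
    await createReplica();
  } else if (have.readyReplicas > want.replicas) {
    await deleteReplica();
  }
  // Otherwise there is nothing to do. Re-running this function is always safe.
}
```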

Worker Nodes and the Pod Lifecycle

Worker nodes host your workloads.

  • kubelet runs on each node and ensures the containers that should be running are running.
  • A container runtime like containerd executes OCI images.
  • The data plane, implemented by kube-proxy or an eBPF-based alternative such as Cilium, provides service routing.

A typical lifecycle looks like this: you apply a Deployment, the scheduler selects a node, kubelet pulls the image and starts containers, readiness probes signal when traffic can flow, and Services send requests only to Pods that are ready. If a container exits unexpectedly, kubelet restarts it. If the node fails, its Pods are recreated by their controller and scheduled onto healthy nodes.
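
Probes are what make that lifecycle automatic. A hedged cdk8s sketch follows; the image name and the `/healthz` and `/ready` paths are assumptions about the application, not fixed conventions.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'api');

new ApiObject(chart, 'deployment', {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'api' },
  spec: {
    replicas: 2,
    selector: { matchLabels: { app: 'api' } },
    template: {
      metadata: { labels: { app: 'api' } },
      spec: {
        containers: [{
          name: 'api',
          image: 'registry.example.com/api:1.4.2', // hypothetical image
          ports: [{ containerPort: 8080 }],
          // kubelet restarts the container if the liveness probe fails.
          livenessProbe: { httpGet: { path: '/healthz', port: 8080 }, periodSeconds: 10 },
          // Services only route traffic to Pods whose readiness probe passes.
          readinessProbe: { httpGet: { path: '/ready', port: 8080 }, initialDelaySeconds: 5 },
        }],
      },
    },
  },
});

app.synth();
```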

Networking in Plain Language

Every Pod receives an IP address. Pods talk to Pods directly inside the cluster. Clients talk to a Service, which is a stable virtual endpoint that maps to a changing set of Pods. You can expose Services internally, publish them externally through load balancers, and route HTTP traffic with an Ingress or the newer Gateway API. Production clusters add NetworkPolicies so only approved flows are allowed. That is a practical zero trust posture inside the cluster.
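
A sketch of those building blocks, again with cdk8s: a ClusterIP Service as the stable endpoint and an Ingress for HTTP routing. The host, names, and ports are placeholders.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'routing');

// Stable virtual endpoint in front of an ever-changing set of Pods.
new ApiObject(chart, 'service', {
  apiVersion: 'v1',
  kind: 'Service',
  metadata: { name: 'api' },
  spec: {
    selector: { app: 'api' },
    ports: [{ port: 80, targetPort: 8080 }],
  },
});

// HTTP routing from outside the cluster to the Service.
new ApiObject(chart, 'ingress', {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'Ingress',
  metadata: { name: 'api' },
  spec: {
    rules: [{
      host: 'api.example.com', // hypothetical host
      http: {
        paths: [{
          path: '/',
          pathType: 'Prefix',
          backend: { service: { name: 'api', port: { number: 80 } } },
        }],
      },
    }],
  },
});

app.synth();
```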

State and Storage

Containers are ephemeral. Data is not. Kubernetes separates compute from storage using PersistentVolumes and PersistentVolumeClaims. You attach claims to Pods so data outlives container restarts. StorageClasses describe performance and durability. StatefulSets add stable identity and ordered updates for stateful apps such as databases. Backup and disaster recovery rely on snapshots and cluster-wide tools that integrate with your storage provider.
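
As a small illustration, the claim below asks for 20 GiB from a hypothetical `fast-ssd` StorageClass; the class name and size are assumptions, not defaults.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'storage');

// A claim for durable storage; the StorageClass decides how it is provisioned.
new ApiObject(chart, 'data-claim', {
  apiVersion: 'v1',
  kind: 'PersistentVolumeClaim',
  metadata: { name: 'db-data' },
  spec: {
    accessModes: ['ReadWriteOnce'],
    storageClassName: 'fast-ssd', // hypothetical class defined by the cluster
    resources: { requests: { storage: '20Gi' } },
  },
});

app.synth();
// A Pod references the claim under spec.volumes and mounts it with volumeMounts,
// so the data survives container restarts and rescheduling.
```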

Shipping Software: Deployments and Releases

Kubernetes standardizes rollout mechanics. A Deployment performs rolling updates, keeps a history, and lets you roll back fast. Blue-green releases keep two versions hot and switch traffic when checks pass. Canary releases shift small percentages first, then ramp up as metrics hold steady. These patterns become configuration, not one-off scripts.
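
Here is what "rollout mechanics as configuration" looks like in a hedged cdk8s sketch: a Deployment whose RollingUpdate strategy surges one extra Pod and never drops below the desired ready count. The image reference is hypothetical.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'release');

new ApiObject(chart, 'deployment', {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'checkout' },
  spec: {
    replicas: 4,
    // Rollout mechanics live in configuration, not in a release script.
    strategy: {
      type: 'RollingUpdate',
      rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }, // never drop below 4 ready Pods
    },
    selector: { matchLabels: { app: 'checkout' } },
    template: {
      metadata: { labels: { app: 'checkout' } },
      spec: {
        containers: [{ name: 'checkout', image: 'registry.example.com/checkout:2.0.0' }], // hypothetical
      },
    },
  },
});

app.synth();
// kubectl rollout undo deployment/checkout returns to the previous revision.
```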

Tooling fits neatly around this model. Helm packages deployable charts. Kustomize overlays environment differences without forking manifests. GitOps controllers such as Argo CD or Flux watch a Git repository and sync clusters automatically. The platform becomes fully declarative and auditable.
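
A minimal GitOps handoff, sketched with an Argo CD Application object; the repository URL, path, and namespaces are placeholders for whatever your platform uses.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'gitops');

// An Argo CD Application: "keep this cluster in sync with this Git path."
new ApiObject(chart, 'app', {
  apiVersion: 'argoproj.io/v1alpha1',
  kind: 'Application',
  metadata: { name: 'web', namespace: 'argocd' },
  spec: {
    project: 'default',
    source: {
      repoURL: 'https://github.com/example/platform-manifests', // hypothetical repo
      targetRevision: 'main',
      path: 'apps/web',
    },
    destination: { server: 'https://kubernetes.default.svc', namespace: 'web' },
    syncPolicy: { automated: { prune: true, selfHeal: true } },
  },
});

app.synth();
```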

Scaling and Resilience

Autoscalers adjust capacity in response to real signals. The Horizontal Pod Autoscaler scales replicas. The Vertical Pod Autoscaler rightsizes CPU and memory. The Cluster Autoscaler adds or removes nodes so you pay for what you need. PodDisruptionBudgets and topology spread constraints keep availability high during maintenance and failures. Multi-zone and multi-region designs protect against larger outages.
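
Two of those guardrails sketched in cdk8s, with assumed names and thresholds: a HorizontalPodAutoscaler targeting average CPU utilization and a PodDisruptionBudget that bounds voluntary disruption.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'scaling');

// Scale replicas between 3 and 30 to hold average CPU near 70%.
new ApiObject(chart, 'hpa', {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  metadata: { name: 'api' },
  spec: {
    scaleTargetRef: { apiVersion: 'apps/v1', kind: 'Deployment', name: 'api' },
    minReplicas: 3,
    maxReplicas: 30,
    metrics: [{
      type: 'Resource',
      resource: { name: 'cpu', target: { type: 'Utilization', averageUtilization: 70 } },
    }],
  },
});

// Never let voluntary disruptions (drains, upgrades) take the service below 2 ready Pods.
new ApiObject(chart, 'pdb', {
  apiVersion: 'policy/v1',
  kind: 'PodDisruptionBudget',
  metadata: { name: 'api' },
  spec: { minAvailable: 2, selector: { matchLabels: { app: 'api' } } },
});

app.synth();
```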

Security and Compliance that Holds Up in Audit

Security starts with isolation and least privilege.

  • RBAC limits who can do what. Every workload uses its own ServiceAccount and short-lived credentials.
  • Pod Security Standards apply restricted defaults. Containers run as non-root, with minimal capabilities and a read-only root file system where possible.
  • NetworkPolicies default to deny, then allow only documented flows; a minimal sketch follows this list.
  • Secrets integrate with a cloud KMS and external secret operators so sensitive values never sit unencrypted in the cluster.
  • Admission control with Gatekeeper or Kyverno prevents noncompliant manifests at the door.
  • Audit logs from the API server feed an immutable store so investigations have complete evidence.
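
To ground the default-deny point above, here is a hedged cdk8s sketch: one policy that blocks all traffic for every Pod in a namespace, and one that re-allows a single documented flow. The app labels and port are assumptions.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'netpol');

// Default deny: selects every Pod in the namespace and allows no traffic.
new ApiObject(chart, 'default-deny', {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'NetworkPolicy',
  metadata: { name: 'default-deny-all' },
  spec: {
    podSelector: {},
    policyTypes: ['Ingress', 'Egress'],
  },
});

// Documented exception: let the frontend reach the api Pods on port 8080.
new ApiObject(chart, 'allow-frontend', {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'NetworkPolicy',
  metadata: { name: 'allow-frontend-to-api' },
  spec: {
    podSelector: { matchLabels: { app: 'api' } },
    policyTypes: ['Ingress'],
    ingress: [{
      from: [{ podSelector: { matchLabels: { app: 'frontend' } } }],
      ports: [{ protocol: 'TCP', port: 8080 }],
    }],
  },
});

app.synth();
```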

The same controls that keep attackers out also reduce change risk. Fewer production incidents. Faster root cause. Cleaner audit trails.

Observability and SRE Practice

A healthy platform is one you can see. Metrics flow into Prometheus and dashboards in Grafana. Distributed traces via OpenTelemetry show where time is spent across services. Logs aggregate centrally with Fluent Bit or Fluentd. Service level objectives and error budgets connect alerts to user impact. Runbooks and rehearsed drills keep response times short when the unexpected happens.
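
Where the Prometheus Operator is installed, even scrape configuration is a declared object. A hedged sketch: the ServiceMonitor below assumes a Service labeled `app: api` that exposes a port named `metrics`.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'observability');

// Tell Prometheus (via the Prometheus Operator) which Services to scrape.
new ApiObject(chart, 'service-monitor', {
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'ServiceMonitor',
  metadata: { name: 'api' },
  spec: {
    selector: { matchLabels: { app: 'api' } },
    endpoints: [{ port: 'metrics', interval: '30s' }],
  },
});

app.synth();
```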

Cost and Capacity

Right sizing matters. Set requests and limits for every workload so the scheduler packs efficiently and noisy neighbors do not starve critical services. Use autoscaling to match demand. Prefer spot or preemptible capacity where interruption is acceptable. Label resources so cost allocation is visible per team and product. Review capacity on a cadence and capture the savings.
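
A sketch of right sizing plus cost labels in cdk8s; the CPU and memory numbers and the `team`/`product` label scheme are illustrative assumptions, not recommendations.

```typescript
import { App, Chart, ApiObject } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'capacity');

new ApiObject(chart, 'deployment', {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: {
    name: 'reports',
    // Ownership labels make cost allocation per team and product visible.
    labels: { team: 'analytics', product: 'reporting' }, // hypothetical label scheme
  },
  spec: {
    replicas: 2,
    selector: { matchLabels: { app: 'reports' } },
    template: {
      metadata: { labels: { app: 'reports', team: 'analytics', product: 'reporting' } },
      spec: {
        containers: [{
          name: 'reports',
          image: 'registry.example.com/reports:3.1.0', // hypothetical image
          resources: {
            requests: { cpu: '250m', memory: '256Mi' }, // what the scheduler packs against
            limits: { cpu: '500m', memory: '512Mi' },   // the ceiling that protects neighbors
          },
        }],
      },
    },
  },
});

app.synth();
```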

Managed or Self Managed

Managed offerings like EKS, AKS, and GKE give you a hardened control plane, handled upgrades, and vendor support. Self managing buys maximum control at the price of more operational work. Most enterprises do best with a managed control plane plus infrastructure as code and policy as code on top.

Using cdk8s in a Regulated Environment

Infrastructure as code is more than convenience. It is how you make compliance repeatable.

I used cdk8s to help a UnitedHealth Group team define, manage, and deploy Kubernetes resources for HIPAA scoped workloads that include PHI. We built a library of reusable constructs that encoded secure defaults: restricted pod security contexts, mandatory resource limits, readiness and liveness probes, and baseline NetworkPolicies. Namespaces came with quotas, limit ranges, and required ownership labels. Admission policies blocked noncompliant manifests. Secrets were pulled from a KMS backed store. The entire cluster ran with etcd encryption at rest and TLS everywhere in transit. We delivered through GitOps, so every change was reviewed, merged, and reconciled by a controller. The payoff was real: consistent environments, faster onboarding for new services, and clear evidence for audits through Git history and API audit logs.

cdk8s let us express Kubernetes objects in TypeScript, test them, and synthesize plain YAML. That combination of static typing, reuse, and policy at the edge improved both speed and safety.
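
The construct pattern looks roughly like the sketch below. This is not the library described above, only an illustration of how secure defaults and mandatory limits can be baked into a reusable TypeScript construct.

```typescript
import { Construct } from 'constructs';
import { ApiObject } from 'cdk8s';

// Illustrative reusable construct with secure defaults baked in (a sketch, not the real library).
export interface HardenedDeploymentProps {
  readonly name: string;
  readonly image: string;
  readonly team: string;      // required ownership label
  readonly replicas?: number;
}

export class HardenedDeployment extends Construct {
  constructor(scope: Construct, id: string, props: HardenedDeploymentProps) {
    super(scope, id);

    new ApiObject(this, 'deployment', {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      metadata: { name: props.name, labels: { app: props.name, team: props.team } },
      spec: {
        replicas: props.replicas ?? 2,
        selector: { matchLabels: { app: props.name } },
        template: {
          metadata: { labels: { app: props.name, team: props.team } },
          spec: {
            // Restricted security context by default, not by reviewer vigilance.
            securityContext: { runAsNonRoot: true, seccompProfile: { type: 'RuntimeDefault' } },
            containers: [{
              name: props.name,
              image: props.image,
              securityContext: {
                allowPrivilegeEscalation: false,
                readOnlyRootFilesystem: true,
                capabilities: { drop: ['ALL'] },
              },
              // Default limits so no workload ships without them.
              resources: {
                requests: { cpu: '100m', memory: '128Mi' },
                limits: { cpu: '500m', memory: '512Mi' },
              },
            }],
          },
        },
      },
    });
  }
}
```

A service team then instantiates something like `new HardenedDeployment(chart, 'api', { name: 'api', image: 'registry.example.com/api:1.0.0', team: 'claims' })` and inherits every default without copying any YAML.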

What Managers Should Expect

Kubernetes is not a silver bullet. It is a foundation that turns reliability and security into first class features. With a managed control plane, GitOps, policy as code, and a small set of paved paths, teams ship faster with fewer surprises. The platform team spends more time improving guardrails and less time firefighting. Product teams spend more time delivering value and less time wrestling with infrastructure.

Conclusion

Kubernetes exists because orchestration at scale is a control problem, not a scripting problem. By declaring intent and letting a reconciler do the steady work, you get a platform that is consistent under pressure. Add cdk8s for reusable, testable definitions. Add GitOps and admission policy for trust and traceability. The result is a system that feels less like magic and more like disciplined engineering that converts directly into uptime, velocity, and compliance that will stand up to any review.