Most of us in the cloud native ecosystem live in YAML. We live in APIs, orchestration, controllers and container tooling. We live in user space. The Linux kernel sits beneath all of that as an invisible force we rarely think about. Our day-to-day work touches Kubernetes abstractions, not page tables or task structs.
From a security perspective, that is a problem. The kernel is the real boundary. Every container, every Extended Berkeley Packet Filter (eBPF) program, every Container Network Interface (CNI) plugin and every isolation feature ultimately reduces to data structures and code paths inside one monolithic Linux kernel. That kernel has become the scaffolding for performance and security across cloud native. When it fails, everything above it fails.
Here is the unsettling part. We have built an entire ecosystem on a component that was never designed for cloud-scale multitenancy and is now one of the fastest-changing and easiest-to-exploit parts of our infrastructure.
The Kernel Containers Inherit Was Never Designed for This
If you look at the kernel’s view of a container, the magic disappears quickly. A container is not an isolated capsule. It is a process represented by a task struct. It looks almost identical to any other process except for a few pointers to namespaces and cgroups. Those namespaces create the appearance of a separate environment, but the kernel still runs everything through the same schedulers, the same system call table and the same memory-unsafe C code.
This is the fundamental disconnect. From user space, it feels like containers are isolated. From kernel space, a container is mostly a regular process dressed up with some metadata.
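To make the "regular process with some metadata" point concrete, here is a minimal Python sketch that reads the namespace links Linux exposes under /proc/<pid>/ns and reports which ones differ between two processes. The helper names (`read_namespaces`, `differing_namespaces`) and the sample identifiers are illustrative assumptions, not part of any tool discussed here.

```python
import os


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace_name: identifier} for one process.

    On Linux each entry in /proc/<pid>/ns is a symlink whose target
    looks like 'pid:[4026531836]'. Reading another process's entries
    may require extra privileges, so unreadable links are skipped.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


def differing_namespaces(a, b):
    """Names of namespaces present in both maps whose identifiers differ."""
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


# Illustrative (made-up) identifiers for a host shell and a containerized process.
host = {"pid": "pid:[4026531836]", "net": "net:[4026531840]", "user": "user:[4026531837]"}
container = {"pid": "pid:[4026532201]", "net": "net:[4026532204]", "user": "user:[4026531837]"}
print(differing_namespaces(host, container))
```

In this made-up sample the two processes differ only in their pid and net namespace pointers and share the user namespace. Everything else about them, as far as the kernel is concerned, is ordinary process state.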
We have made real progress in container security over the past decade. Image scanning, signed supply chains, distroless images and eBPF-based runtime visibility are all meaningful improvements. None of that changes the reality that all workloads still share one kernel that was never designed to isolate hundreds of untrusted tenants. That mismatch sits at the center of today’s risk.
It Became a CVE Authority and Looked Too Big To Fail
When the Linux kernel became its own CVE Numbering Authority, it gave the industry a clearer window into how often kernel vulnerabilities are discovered. In early 2025, the kernel alone produced over 100 common vulnerabilities and exposures (CVEs) in a couple of weeks, about eight per day. These are not image-level issues. They apply to every container running on that node.
Updating images is trivial. Updating a kernel is not. Kernel upgrades require node-level intervention, often reboots, and carefully staged drains. The blast radius is large, so many teams postpone the work. Over time, nodes accumulate hundreds or thousands of kernel CVEs while continuing to run critical workloads. Combine that with the fact that the kernel is the single shared boundary for the entire cluster, and the consequence becomes obvious. If the kernel goes, everything goes.
Exploiting the Kernel Has Turned Into an N-Day Problem
There is still a belief that compromising the kernel requires nation-state talent. In practice, we live in an n-day world. Vendors fix the bug, the CVE is published and the exploit remains effective for months or years because systems are slow to update.
One example we examined was a use-after-free bug in netfilter that was fixed in early 2024, yet was still being used in ransomware campaigns in late 2025. Public repositories provided the exploit code and even prebuilt binaries. Running it inside an unprivileged container granted root on the host. That single action erased every boundary the kernel enforced. At that point, the only safe move is to rebuild the node.
There was no wizardry involved. No deep kernel knowledge. Just a public exploit and an outdated kernel. That is what n-day risk looks like.
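One rough way to spot n-day exposure is to compare a node's kernel release string against the version in which a fix landed. The sketch below assumes a simple upstream-style version string. The function names and the version numbers in the example are hypothetical and are not the fixed versions for the netfilter bug described above. Distributions also backport fixes without changing the upstream version, so a check like this is only a coarse first signal.

```python
import re


def parse_kernel_release(release):
    """Turn a release string such as '5.15.0-91-generic' into (5, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_older_than(release, minimum_fixed):
    """True if `release` sorts before `minimum_fixed` by upstream version."""
    return parse_kernel_release(release) < parse_kernel_release(minimum_fixed)


# Hypothetical inventory: node name -> kernel release (e.g. from platform.release()).
nodes = {"node-a": "5.15.0-91-generic", "node-b": "6.8.0-45-generic"}
minimum_fixed = "6.1.76"  # placeholder value, not taken from any advisory

stale = sorted(name for name, rel in nodes.items() if is_older_than(rel, minimum_fixed))
print(stale)
```

On a live node, `platform.release()` from the standard library returns the running kernel's release string and could feed the same comparison.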
User Namespaces and Trading One Attack Surface for Another
User namespaces were enabled by default in Kubernetes and widely celebrated as an isolation win. Inside a user namespace, a process appears as root but maps to an unprivileged user outside. It sounds elegant in theory.
The kernel does not know or care about that theory. It simply sees a process with elevated privilege. That expanded privilege exposes kernel code paths that the process could not reach before. Many recent kernel exploits depend on user namespaces for exactly this reason. They trade one attack surface for another. User namespaces solve real problems, but they also widen the entry points available to malicious code. They are a trade-off, not a security cure.
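The "root inside, unprivileged outside" mapping is visible in /proc/<pid>/uid_map, where each line lists an ID inside the namespace, the corresponding ID outside it, and the length of the range. The following Python sketch parses that format and translates an in-namespace ID. The helper names and the sample mapping (root mapped to host UID 100000) are assumptions chosen for illustration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def map_to_parent(inside_id, entries):
    """Translate an ID inside the namespace to the ID seen outside, or None."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


# Illustrative mapping: 65536 IDs starting at 0 inside map to 100000 outside.
sample = "         0     100000      65536\n"
entries = parse_id_map(sample)
print(map_to_parent(0, entries))  # in-namespace root as seen from outside
```

The mapping only describes which user the process is outside the namespace. It says nothing about which kernel code paths the in-namespace root can now reach, which is the trade-off described above.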
What We Can Realistically Do About a Shared Kernel
Kernel hardening work is real and valuable. Projects like the Kernel Self-Protection Project improve memory safety and reduce common exploitation techniques. These efforts operate on a very different threat model: one user, one machine, mostly trusted code. Kubernetes assumes the opposite. Hardening helps, but it cannot eliminate the underlying architectural risk of a shared kernel.
There are still practical steps. Seccomp can remove broad categories of system calls. Blocking the unshare system call prevents many user namespace-based exploits. Avoiding privileged containers and unnecessary host mounts reduces exposure. These are meaningful defenses before exploitation. None helps after the kernel is compromised.
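As a sketch of the seccomp step, the Python snippet below builds a profile in the JSON layout used by OCI runtimes and Docker (`defaultAction` plus a `syscalls` list) that returns an error for `unshare`. The helper name `deny_syscalls_profile` is an assumption for illustration. The allow-by-default starting point is chosen only to keep the example short, and a real profile would more likely extend the runtime's default one.

```python
import json


def deny_syscalls_profile(names, default_action="SCMP_ACT_ALLOW"):
    """Build a seccomp profile dict that fails the named system calls.

    Assumption: allow-by-default is used here for brevity only. Note that
    clone() with CLONE_NEWUSER can also create user namespaces, and this
    minimal profile does not filter on clone flags.
    """
    return {
        "defaultAction": default_action,
        "syscalls": [
            {"names": sorted(set(names)), "action": "SCMP_ACT_ERRNO"},
        ],
    }


profile = deny_syscalls_profile(["unshare"])
print(json.dumps(profile, indent=2))
```

In Kubernetes, a file like this can be placed on the node and referenced from a pod's `securityContext.seccompProfile` with type `Localhost`.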
The longer-term answer is structural. A shared kernel is like a ship with no watertight compartments. One breach floods the entire vessel. Stronger isolation models contain blast radius by giving workloads their own kernels or isolation domains. Public cloud hypervisors have proven that this works. Apple’s containerization framework shows the same principle applied to local development environments. The pattern is clear.
Closing the Last Trust Gap in Container Security
The cloud native community has made tremendous strides. Defaults are safer, supply chains are cleaner and runtime visibility is better than it has ever been. Kernel security has improved, too, thanks to dedicated engineers doing extremely difficult work.
The point is not that Linux is broken. The point is that our trust boundaries need to reflect reality. A shared kernel is too powerful and too complex to assume safety by default. If we combine the progress we have already made with a clearer understanding of kernel risk and stronger isolation approaches, we can close the last major gap in cloud native security. It is time to bring the kernel into the conversation.