🧱 Container Isolation Deep Dive: seccomp, SELinux, AppArmor, Capabilities, gVisor, and Namespaces
Intro: Container isolation is not one switch. It is a layered reduction of what a compromised workload can ask the host kernel, runtime, and neighboring workloads to do.
What this page includes
- what the main isolation controls actually do
- Kubernetes and Docker examples
- top 10 mistakes when these controls are misconfigured
- where gVisor fits and where it does not
The isolation stack
| Control | Main job | What it does not solve alone |
|---|---|---|
| Namespaces | isolate process, network, mount, user, IPC views | kernel exploit resistance by itself |
| Capabilities | reduce ambient Linux privilege | syscall abuse not blocked by capability model |
| seccomp | reduce syscall surface | file-label policy or broad app behavior policy |
| AppArmor | path / capability / behavior restrictions | deep object labeling like SELinux |
| SELinux | label-based mandatory access control | general syscall filtering |
| gVisor | stronger sandbox boundary between app and host kernel | application bugs inside the sandbox |
1) Namespaces
Namespaces are the baseline isolation primitive. They make a process see its own PID, network, mount, user, and IPC world instead of the host's.
Why misconfiguration matters
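Each host namespace you share removes the matching boundary. With the host PID namespace a container can see host processes, with the host network namespace it shares the node's interfaces and ports (including services bound to localhost), and with the host IPC namespace it can reach the host's shared-memory and IPC objects.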
High-risk patterns
- hostNetwork: true
- hostPID: true
- hostIPC: true
- disabling user-namespace isolation where you actually need it
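As a counterpart to the high-risk patterns above, the following minimal sketch shows a Pod that keeps its own namespaces. The pod name and the sleep command are hypothetical. The three host* fields already default to false, so spelling them out mainly helps reviewers and policy tooling. hostUsers: false asks for a separate user namespace, and whether that is honored depends on the Kubernetes version and the node's runtime support.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: isolated-namespaces      # hypothetical name
spec:
  hostNetwork: false             # keep a private network namespace
  hostPID: false                 # do not expose host processes
  hostIPC: false                 # do not share host IPC objects
  hostUsers: false               # assumption: cluster and runtime support user namespaces
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"] # placeholder workload
```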
2) Capabilities
Linux capabilities split root privilege into smaller units. The safe default is to drop everything and add back only what is needed.
Kubernetes example
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
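The "add back only what is needed" half of that rule could look like the sketch below, which drops everything and restores a single capability. The pod name is hypothetical and NET_BIND_SERVICE is only an illustration. One caveat to verify in your own environment: a capability added this way is not necessarily effective for a process running as a non-root user, so test the workload rather than assuming the add took effect.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cap-add-back             # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:stable
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]                # start from nothing
          add: ["NET_BIND_SERVICE"]    # illustrative: bind ports below 1024
```

Kubernetes writes capability names without the CAP_ prefix in this field.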
Docker example
docker run --cap-drop ALL --read-only --security-opt no-new-privileges busybox:1.36
Common dangerous capabilities to review carefully
- CAP_SYS_ADMIN
- CAP_SYS_PTRACE
- CAP_NET_ADMIN
- CAP_SYS_MODULE
- CAP_DAC_READ_SEARCH
3) seccomp
seccomp restricts which syscalls a process can make.
Good default
Use the runtime default first, then tighten only where you can validate behavior.
Kubernetes example
apiVersion: v1
kind: Pod
metadata:
  name: hardened
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: cgr.dev/chainguard/nginx
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
Docker example
docker run --security-opt seccomp=/path/to/profile.json nginx:stable
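Once a workload has been validated against the runtime default, tightening further in Kubernetes usually means a Localhost profile. The sketch below assumes a profile file has already been placed on every node under the kubelet's seccomp directory. The pod name and the file name profiles/app-tight.json are hypothetical, and the path is resolved relative to that directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-localhost        # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # hypothetical file; must exist on the node before the pod starts
      localhostProfile: profiles/app-tight.json
  containers:
    - name: app
      image: cgr.dev/chainguard/nginx
      securityContext:
        allowPrivilegeEscalation: false
```

If the file is missing on a node, the container is expected to fail to start there, so roll profiles out before the workloads that reference them.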
4) AppArmor
AppArmor confines programs using profiles that can restrict filesystem, capability, and behavioral access.
Good default
- prefer RuntimeDefault or a reviewed localhost profile;
- treat Unconfined as an exception, not a convenience setting.
Kubernetes example
securityContext:
  appArmorProfile:
    type: RuntimeDefault
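For the "reviewed localhost profile" case, a sketch could look like the following. It assumes a profile named app-restricted (a hypothetical name) is already loaded into the kernel on each node, and that the cluster is recent enough to support the appArmorProfile field (Kubernetes 1.30 or later). Older clusters express the same thing through a per-container annotation instead.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-localhost       # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:stable
      securityContext:
        appArmorProfile:
          type: Localhost
          # hypothetical profile name; must already be loaded on the node
          localhostProfile: app-restricted
```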
5) SELinux
SELinux uses labels and mandatory access control to constrain how processes and objects interact.
Why it matters
In SELinux-aware environments, it can stop workload-to-host or workload-to-volume access that discretionary access control (DAC, the ordinary Unix owner/group/mode permissions) alone would allow.
Kubernetes example
securityContext:
  seLinuxOptions:
    level: "s0:c123,c456"
Review caveat
Poor label strategy can be nearly as bad as no strategy. Reused or overly broad labels weaken isolation.
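To illustrate the label-reuse point, the sketch below gives two workloads different MCS category pairs so that one cannot pass for the other at the SELinux layer. Pod names, images, and the second category pair are hypothetical. It assumes nodes run SELinux in enforcing mode with a container policy that honors MCS levels. Many runtimes assign unique categories automatically when no level is set, so pin levels only when you have a reason to.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a                 # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"      # category pair for tenant A
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: tenant-b                 # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c200,c300"      # different pair; do not reuse tenant A's
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
```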
6) gVisor
gVisor is not just another seccomp profile. It is an additional sandbox layer that moves Linux API handling into a user-space application kernel.
Good fit
- untrusted or semi-trusted code execution;
- multi-tenant compute pockets;
- higher-assurance workloads where reducing host-kernel attack surface matters.
Not a silver bullet
gVisor does not fix:
- application bugs inside the sandbox;
- side-channel issues at CPU / hardware level;
- insecure containerd / runtime / control-plane configuration before the sandbox is applied.
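In Kubernetes, gVisor is typically selected per workload through a RuntimeClass. The sketch below assumes the node's container runtime has already been configured with a handler called runsc. That handler name is a node-level setting and may differ in your cluster, and the object names are hypothetical. The other controls stay in place, since the sandbox adds to them rather than replacing them.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor                   # hypothetical name, referenced by the Pod below
handler: runsc                   # assumption: handler configured on the node
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app            # hypothetical name
spec:
  runtimeClassName: gvisor       # run this pod under the gVisor handler
  automountServiceAccountToken: false
  containers:
    - name: app
      image: ghcr.io/example/app@sha256:deadbeef   # placeholder image reference
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```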
7) Practical hardening example
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  # do not mount an API token the workload does not need
  automountServiceAccountToken: false
  containers:
    - name: app
      # placeholder digest; pin to the full sha256 of your real image
      image: ghcr.io/example/app@sha256:deadbeef
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        seccompProfile:
          type: RuntimeDefault
        appArmorProfile:
          type: RuntimeDefault
        capabilities:
          drop: ["ALL"]
Top 10 isolation mistakes
| # | Mistake | Why it is dangerous |
|---|---|---|
| 1 | privileged: true or equivalent | effectively disables much of your isolation story |
| 2 | keeping CAP_SYS_ADMIN | gives a huge privilege surface |
| 3 | running as root by default | increases impact of compromise |
| 4 | allowPrivilegeEscalation: true | makes post-compromise escalation easier |
| 5 | Unconfined seccomp/AppArmor | removes kernel and behavior guardrails |
| 6 | host namespace sharing | leaks host or neighbor visibility and control |
| 7 | broad hostPath mounts | opens host tampering and data exposure paths |
| 8 | writable root filesystem everywhere | persistence and tampering become easier |
| 9 | default service-account token mounting | identity theft becomes easier after compromise |
| 10 | assuming gVisor or one control replaces the rest | breaks defense in depth |
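Several of these mistakes can be rejected at admission time rather than caught in review. One hedged sketch is to label a namespace so that the built-in Pod Security Admission controller enforces the restricted Pod Security Standard. The namespace name is hypothetical. Rows such as the writable root filesystem and service-account token mounting fall outside that profile and still need per-workload settings like the ones in the hardening example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: hardened-workloads       # hypothetical name
  labels:
    # reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # also record and surface violations for visibility
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Starting with only the warn and audit labels lets you see what would break before switching enforce on.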
Official references worth keeping close
- Kubernetes: security context, seccomp, AppArmor, Pod Security Standards
- Docker: seccomp profiles, user namespace remapping, capabilities, engine security
- gVisor: security model and architecture docs
Related pages
- Container / Kubernetes / Platform Security
- Kubernetes Security Baseline
- Docker Top 10 Misconfigurations
- AppArmor and seccomp for Docker
- Kubernetes Hardening
Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.