🧱 Container Isolation Deep Dive — seccomp, SELinux, AppArmor, Capabilities, gVisor, and Namespaces

Intro: Container isolation is not one switch. It is a layered reduction of what a compromised workload can ask the host kernel, runtime, and neighboring workloads to do.

What this page includes

what the main isolation controls actually do

Kubernetes and Docker examples

top 10 mistakes when these controls are misconfigured

where gVisor fits and where it does not

The isolation stack

Control	Main job	What it does not solve alone
Namespaces	isolate process, network, mount, user, IPC views	kernel exploit resistance by itself
Capabilities	reduce ambient Linux privilege	syscall abuse not blocked by capability model
seccomp	reduce syscall surface	file-label policy or broad app behavior policy
AppArmor	path / capability / behavior restrictions	deep object labeling like SELinux
SELinux	label-based mandatory access control	general syscall filtering
gVisor	stronger sandbox boundary between app and host kernel	application bugs inside the sandbox

1) Namespaces

Namespaces are the baseline isolation primitive. They give a process its own view of PIDs, network interfaces, mounts, user and group IDs, and IPC objects (Linux also has UTS, cgroup, and time namespaces) instead of the host's.

Why misconfiguration matters

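If you share host namespaces casually, you collapse isolation. With hostPID a container can see host processes (and, with enough privilege, signal them); with hostNetwork it can reach services bound to the node's loopback interface.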

High-risk patterns

hostNetwork: true
hostPID: true
hostIPC: true
disabling user-namespace isolation where you actually need it
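
One way to check whether a workload really has its own namespaces is to compare the identifiers Linux exposes as symlinks under /proc/<pid>/ns/. The sketch below is a minimal illustration, not a complete audit tool. The function names read_namespaces and shared_namespaces are hypothetical, the sample identifiers are made up, and reading another process's entries usually needs root or equivalent access.

```python
import os

# Namespace types commonly exposed under /proc/<pid>/ns/ on Linux.
NAMESPACE_TYPES = ("pid", "net", "mnt", "user", "ipc", "uts", "cgroup")


def read_namespaces(pid="self"):
    """Return {type: link target} for one process, e.g. {"pid": "pid:[4026531836]"}.

    Linux only. Entries that cannot be read (permissions, older kernel,
    non-Linux host) are skipped rather than treated as errors.
    """
    result = {}
    for ns_type in NAMESPACE_TYPES:
        try:
            result[ns_type] = os.readlink(f"/proc/{pid}/ns/{ns_type}")
        except OSError:
            continue
    return result


def shared_namespaces(host_ns, container_ns):
    """List namespace types where both processes point at the same namespace."""
    return sorted(
        ns_type
        for ns_type, target in container_ns.items()
        if host_ns.get(ns_type) == target
    )


# Illustrative values only: this container was started with host networking.
host = {"pid": "pid:[4026531836]", "net": "net:[4026531840]", "ipc": "ipc:[4026531839]"}
container = {"pid": "pid:[4026532501]", "net": "net:[4026531840]", "ipc": "ipc:[4026532499]"}
print(shared_namespaces(host, container))
```

On a real node you would compare read_namespaces(1) with read_namespaces(<container pid>). A shared user namespace is common, because many runtimes do not enable user-namespace remapping by default, so treat that result as a prompt to review rather than as proof of a misconfiguration.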

2) Capabilities

Linux capabilities split root privilege into smaller units. The safe default is to drop everything and add back only what is needed.

Kubernetes example

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]

Docker example

docker run --cap-drop ALL --read-only --security-opt no-new-privileges busybox:1.36

Common dangerous capabilities to review carefully

CAP_SYS_ADMIN
CAP_SYS_PTRACE
CAP_NET_ADMIN
CAP_SYS_MODULE
CAP_DAC_READ_SEARCH
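
To see which of these capabilities a running process actually holds, you can decode the CapEff bitmask from /proc/<pid>/status. The following is a small sketch under stated assumptions: the bit positions follow linux/capability.h, the helper names are hypothetical, and the sample status text is invented for illustration.

```python
# Bit positions from linux/capability.h for the capabilities listed above.
RISKY_CAPABILITY_BITS = {
    "CAP_DAC_READ_SEARCH": 2,
    "CAP_NET_ADMIN": 12,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def risky_capabilities(cap_mask_hex):
    """Return the risky capability names set in a hex mask such as a CapEff value."""
    mask = int(cap_mask_hex, 16)
    return sorted(
        name for name, bit in RISKY_CAPABILITY_BITS.items() if mask & (1 << bit)
    )


def effective_mask_from_status(status_text):
    """Extract the CapEff field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no CapEff line found")


# Invented sample: bits 12 and 21 are set in this mask.
sample_status = "Name:\tsleep\nCapEff:\t0000000000201000\n"
print(risky_capabilities(effective_mask_from_status(sample_status)))
```

A container started with all capabilities dropped should report a CapEff of all zeros, so any name returned here is worth tracing back to an explicit add in the workload's configuration.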

3) seccomp

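seccomp restricts which syscalls a process can make and, optionally, with which arguments. Depending on the profile's action, a filtered call fails with an error or terminates the process.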

Good default

Use the runtime default first, then tighten only where you can validate behavior.

Kubernetes example

apiVersion: v1
kind: Pod
metadata:
  name: hardened
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: cgr.dev/chainguard/nginx
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]

Docker example

docker run --security-opt seccomp=/path/to/profile.json nginx:stable
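
The profile file passed to --security-opt seccomp= is a JSON document. As a rough sketch of its shape, the code below generates a deny-by-default profile with an explicit allowlist. The function name build_allowlist_profile is hypothetical, the syscall list is illustrative and far too small to run nginx, and the single x86-64 architecture entry is an assumption about the host.

```python
import json


def build_allowlist_profile(allowed_syscalls):
    """Build a deny-by-default seccomp profile in the JSON layout Docker accepts."""
    return {
        # Any syscall not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Illustrative list only; duplicates are removed and names are sorted.
profile = build_allowlist_profile(["read", "write", "exit_group", "read"])
print(json.dumps(profile, indent=2))
```

In practice it is usually safer to start from the runtime's default profile and remove syscalls you have confirmed the workload does not need than to build an allowlist from scratch.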

4) AppArmor

AppArmor confines programs using profiles that can restrict filesystem, capability, and behavioral access.

Good default

prefer RuntimeDefault or a reviewed Localhost profile;
treat Unconfined as an exception, not a convenience setting.

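Kubernetes example

The appArmorProfile field is available from Kubernetes v1.30; earlier versions use the container.apparmor.security.beta.kubernetes.io/<container-name> annotation instead.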

securityContext:
  appArmorProfile:
    type: RuntimeDefault
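
To confirm that a profile is actually attached at runtime, you can read the process's AppArmor label, which on AppArmor hosts appears in /proc/<pid>/attr/current as text such as "docker-default (enforce)" or "unconfined". The parser below is a minimal sketch; the function names are hypothetical, and on SELinux hosts the same file holds an SELinux context instead, which this code does not interpret.

```python
def parse_apparmor_label(raw):
    """Split a label such as 'docker-default (enforce)' into (profile, mode)."""
    label = raw.strip().strip("\x00")
    if label == "unconfined":
        return ("unconfined", None)
    if label.endswith(")") and " (" in label:
        profile, mode = label.rsplit(" (", 1)
        return (profile, mode[:-1])  # drop the closing parenthesis
    return (label, None)


def is_confined(raw):
    """True only when a profile is attached and it is in enforce mode."""
    profile, mode = parse_apparmor_label(raw)
    return profile != "unconfined" and mode == "enforce"


for sample in ("docker-default (enforce)\n", "my-profile (complain)\n", "unconfined\n"):
    print(parse_apparmor_label(sample), is_confined(sample))
```

A profile in complain mode only logs violations, which is why the check treats it as not confined.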

5) SELinux

SELinux uses labels and mandatory access control to constrain how processes and objects interact.

Why it matters

In SELinux-aware environments, it can stop workload-to-host or workload-to-volume access that discretionary access control (DAC, the ordinary Unix owner and mode checks) alone would allow.

Kubernetes example

securityContext:
  seLinuxOptions:
    level: "s0:c123,c456"

Review caveat

Poor label strategy can be nearly as bad as no strategy. Reused or overly broad labels weaken isolation.
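
Reused labels can be found mechanically if you have an inventory of which level each workload runs with. The sketch below groups workloads that share the same sensitivity and category set. The function names and workload names are hypothetical, and category ranges such as c0.c1023 are deliberately not expanded.

```python
from collections import defaultdict


def normalize_level(level):
    """Return (sensitivity, sorted categories) for a simple level like 's0:c123,c456'.

    Category ranges such as 'c0.c1023' are not expanded in this sketch.
    """
    sensitivity, _, categories = level.partition(":")
    cats = tuple(sorted(c for c in categories.split(",") if c))
    return (sensitivity, cats)


def find_reused_levels(workload_levels):
    """Group workload names that end up with the same sensitivity and categories."""
    groups = defaultdict(list)
    for name, level in workload_levels.items():
        groups[normalize_level(level)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical inventory: two workloads use the same pair in a different order.
levels = {
    "billing": "s0:c123,c456",
    "search": "s0:c456,c123",
    "batch": "s0:c7,c900",
}
print(find_reused_levels(levels))
```

Two workloads that share a level are not separated from each other by the category check, so each group returned here deserves a deliberate decision rather than an accident of copy-paste.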

6) gVisor

gVisor is not just another seccomp profile. It is an additional sandbox layer that moves Linux API handling into a user-space application kernel.

Good fit

untrusted or semi-trusted code execution;
multi-tenant compute pockets;
higher-assurance workloads where reducing host-kernel attack surface matters.

Not a silver bullet

gVisor does not fix:

application bugs inside the sandbox;
side-channel issues at CPU / hardware level;
insecure containerd / runtime / control-plane configuration before the sandbox is applied.
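
In Kubernetes, a pod opts into a sandboxed runtime through spec.runtimeClassName, so one practical check is whether the workloads you consider untrusted actually request it. The following sketch assumes a RuntimeClass named "gvisor" and a trust=untrusted label convention; both are hypothetical choices for illustration, as is the function name.

```python
def pods_outside_sandbox(pods, sandbox_class="gvisor", label=("trust", "untrusted")):
    """Names of pods carrying the label that do not request the sandboxed runtime."""
    key, value = label
    missing = []
    for pod in pods:
        metadata = pod.get("metadata") or {}
        labels = metadata.get("labels") or {}
        if labels.get(key) != value:
            continue  # only workloads marked untrusted are expected to be sandboxed
        if (pod.get("spec") or {}).get("runtimeClassName") != sandbox_class:
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical manifests, reduced to the fields this check reads.
pods = [
    {"metadata": {"name": "runner-a", "labels": {"trust": "untrusted"}},
     "spec": {"runtimeClassName": "gvisor"}},
    {"metadata": {"name": "runner-b", "labels": {"trust": "untrusted"}},
     "spec": {}},
    {"metadata": {"name": "internal-api"}, "spec": {}},
]
print(pods_outside_sandbox(pods))
```

This only verifies intent expressed in the manifest. Whether the node's runtime handler is installed and configured correctly is a separate check.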

7) Practical hardening example

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  automountServiceAccountToken: false
  containers:
    - name: app
      image: ghcr.io/example/app@sha256:deadbeef
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        seccompProfile:
          type: RuntimeDefault
        appArmorProfile:
          type: RuntimeDefault
        capabilities:
          drop: ["ALL"]

Top 10 isolation mistakes

#	Mistake	Why it is dangerous
1	`privileged: true` or equivalent	grants all capabilities and host device access and lifts seccomp and AppArmor confinement, so most other controls stop applying
2	keeping `CAP_SYS_ADMIN`	gives a huge privilege surface
3	running as root by default	increases impact of compromise
4	`allowPrivilegeEscalation: true`	makes post-compromise escalation easier
5	`Unconfined` seccomp/AppArmor	removes kernel and behavior guardrails
6	host namespace sharing	leaks host or neighbor visibility and control
7	broad `hostPath` mounts	opens host tampering and data exposure paths
8	writable root filesystem everywhere	persistence and tampering become easier
9	default service-account token mounting	identity theft becomes easier after compromise
10	assuming gVisor or one control replaces the rest	breaks defense in depth
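
Rows 1 to 9 of this table can be checked mechanically against a Pod manifest. The function below is a minimal sketch, not a replacement for admission policy: audit_pod is a hypothetical name, the manifest is assumed to be already parsed into a dict, and unset fields are treated as findings, which is stricter than the effective defaults on some clusters.

```python
def audit_pod(pod):
    """Return sorted (row number, message) findings for a Pod manifest dict."""
    findings = []
    spec = pod.get("spec") or {}
    pod_sc = spec.get("securityContext") or {}

    if any(spec.get(flag) for flag in ("hostNetwork", "hostPID", "hostIPC")):
        findings.append((6, "host namespace sharing"))
    if any("hostPath" in volume for volume in spec.get("volumes") or []):
        findings.append((7, "hostPath volume"))
    if spec.get("automountServiceAccountToken") is not False:
        findings.append((9, "service-account token not explicitly disabled"))

    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}
        added = (sc.get("capabilities") or {}).get("add") or []
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}

        if sc.get("privileged"):
            findings.append((1, f"{name}: privileged"))
        if any(cap in ("SYS_ADMIN", "CAP_SYS_ADMIN") for cap in added):
            findings.append((2, f"{name}: adds SYS_ADMIN"))
        # Container-level setting wins; otherwise fall back to the pod level.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")):
            findings.append((3, f"{name}: runAsNonRoot not set"))
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append((4, f"{name}: privilege escalation allowed"))
        if seccomp.get("type") in (None, "Unconfined"):
            findings.append((5, f"{name}: seccomp missing or Unconfined"))
        if not sc.get("readOnlyRootFilesystem"):
            findings.append((8, f"{name}: writable root filesystem"))
    return sorted(findings)


# Hypothetical manifest with several of the mistakes from the table.
risky = {"spec": {"hostPID": True,
                  "containers": [{"name": "app",
                                  "securityContext": {"privileged": True}}]}}
for row, message in audit_pod(risky):
    print(row, message)
```

A manifest that sets the fields shown in the practical hardening example should produce no findings. Row 10 is a design assumption rather than a field, so it stays a review question.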

Official references worth keeping close

Kubernetes: security context, seccomp, AppArmor, Pod Security Standards
Docker: seccomp profiles, user namespace remapping, capabilities, engine security
gVisor: security model and architecture docs

Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.
