☸️ Threat Modeling Process — Kubernetes Example
Intro: Kubernetes threat modeling goes wrong when teams stop at the Pod or node level. A realistic model has to slice the stack from ingress and identities all the way to CI/CD, registry trust, cluster control plane, and cloud privileges.
Why this page exists
- the OTUS Cloud DevSecOps materials use a layered attack-surface idea for Kubernetes, but the KB did not yet have a dedicated worked example;
- many teams know STRIDE in theory but struggle to apply it to a live cluster;
- the goal here is to show a repeatable, product-focused process rather than a one-off whiteboard ritual.
When to run this model
Use this page when at least one of the following is true:
- a new cluster or namespace boundary is being introduced;
- a service is moving from VM or serverless to Kubernetes;
- a new ingress, API gateway, service mesh, or admission policy is being introduced;
- a workload receives new secrets, broader service account access, or cloud IAM trust;
- CI/CD now deploys directly to the cluster;
- the product is multi-tenant or contains admin workflows.
Example system in scope
This example models a typical cloud-native product slice:
- public users access a web frontend;
- the frontend calls a REST API in Kubernetes;
- the API talks to Redis and PostgreSQL;
- background workers consume jobs from a queue;
- images are built in CI and pulled from a private registry;
- secrets come from a cluster secret store;
- the cluster runs in a cloud account and workloads can reach some cloud APIs.
Step 1 — Define the review objective
Keep the objective concrete.
Bad objective
“threat model the cluster.”
Good objective
“Threat model the payment-api namespace before production release, with focus on tenant isolation, service account use, image trust, ingress exposure, and lateral movement after workload compromise.”
Step 2 — List the critical assets
| Asset | Why it matters | Typical owner |
|---|---|---|
| customer data in PostgreSQL | confidentiality and integrity risk | application team + database/platform team |
| workload identities / service account tokens | can enable cluster or cloud escalation | platform team |
| container images and tags | supply chain trust and rollback risk | application team + platform team |
| CI deploy credentials / GitOps trust | release-path abuse and integrity risk | DevSecOps / platform team |
| ingress / API endpoint | initial access and abuse surface | app team + platform team |
| cluster audit logs and runtime telemetry | detection and forensics | security / platform team |
Step 3 — Draw the layers before drawing threats
A useful Kubernetes model should, at a minimum, look through these layers:
- edge and ingress
- application/API service
- service-to-service calls
- service account and cluster identity
- secrets and configuration
- data stores and queues
- container image and runtime
- Kubernetes control plane and node boundary
- cloud IAM and metadata access
- CI/CD and registry trust
- logs, detections, and response path
Step 4 — Identify trust boundaries
This is the point most teams skip.
Trust boundaries in this example
- internet → ingress
- ingress namespace → payment-api namespace
- payment-api → internal services
- namespace workload identity → Kubernetes API
- workload identity → cloud APIs
- CI pipeline → registry
- registry / GitOps → cluster deploy path
- app container → node / runtime / kernel
- cluster → external logging / SIEM destination
If a line crosses one of these boundaries, ask what proves the action is authorized and how it is logged.
Step 5 — Walk attack paths by layer
Layer 1: ingress and public exposure
Ask:
- can unauthenticated endpoints leak metadata, debug info, or object identifiers?
- can the ingress route around central auth or rate limiting?
- can path-based routing expose admin or internal paths unexpectedly?
- is TLS terminated in the right place and are headers trusted correctly?
Typical findings
- admin path reachable from public ingress;
- X-Forwarded-* trust configured too broadly;
- missing request size or rate controls;
- DAST only covers anonymous routes, not authenticated flows.
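Some of these edge controls can be expressed directly on the Ingress object. The sketch below assumes the ingress-nginx controller and hypothetical names (payment-api, api.example.com); the rate-limit and body-size annotations are controller-specific, so check your controller's documentation before relying on them.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-api                      # hypothetical name from this example
  namespace: payment-api
  annotations:
    # ingress-nginx specific: cap request rate and request body size at the edge
    nginx.ingress.kubernetes.io/limit-rps: "20"
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          # expose only the public API prefix; admin and debug paths stay internal
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: payment-api
                port:
                  number: 8080
```

Note that annotations enforce controls only at this controller; a second ingress class or a NodePort service can still route around them, which is exactly the "route around central auth" question above.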
Layer 2: application and object access
Ask:
- where is tenant isolation enforced: gateway, app code, or downstream service?
- can object IDs be enumerated?
- does authorization happen once at the edge and then get assumed everywhere else?
- do batch/export/reporting routes bypass standard access checks?
Typical findings
- route auth present but object-level auth weak;
- internal service trusts caller headers instead of verified identity;
- async worker can read broader data than the API itself.
Layer 3: service-to-service trust
Ask:
- is service identity explicit or implicit?
- are internal calls authenticated and authorized, or only “inside the cluster therefore trusted”?
- can one compromised workload call every internal service?
Typical findings
- flat east-west trust model;
- no namespace or network isolation;
- broad egress allows callbacks, exfiltration, or metadata access.
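A deny-by-default east-west posture can be sketched with NetworkPolicy. The namespace and label names below are illustrative assumptions from this example, and the cluster's CNI must actually enforce NetworkPolicy for this to take effect.

```yaml
# Baseline: nothing in the namespace talks to anything until explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payment-api
spec:
  podSelector: {}                 # selects every Pod in the namespace
  policyTypes: [Ingress, Egress]
---
# Then allow only the expected caller: the ingress controller namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: payment-api
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed ingress namespace
```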
Layer 4: Kubernetes identity and service accounts
Ask:
- does this Pod actually need a mounted service account token?
- what RBAC verbs and resources are granted?
- can compromise of this Pod become secret read, exec, log read, or new workload creation?
Typical findings
- automounted service account token not needed;
- Role/ClusterRole includes get/list/watch on secrets or pods/exec unnecessarily;
- one namespace compromise becomes cluster reconnaissance.
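By contrast, a narrowly scoped Role grants only what the workload demonstrably needs. A sketch with hypothetical names; note the resourceNames pin and the absence of secrets and pods/exec:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: payment-api-reader
  namespace: payment-api
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["payment-api-config"]   # pin access to the one object needed
    verbs: ["get", "watch"]
```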
Layer 5: secrets and configuration
Ask:
- where do secrets originate?
- are they static or short-lived?
- can secrets be read from environment, mounted files, logs, crash dumps, or debug endpoints?
- can developers or support staff read them during incident response?
Typical findings
- long-lived cloud credentials mounted into app containers;
- secrets stored in plain Kubernetes Secrets without adequate governance;
- debug mode or startup logs reveal secret material.
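One common mitigation for the environment-variable leak paths above is delivering secrets as read-only mounted files, since process environments surface in crash dumps, debug endpoints, and inherited child-process environments. A minimal sketch with hypothetical names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
  namespace: payment-api
spec:
  containers:
    - name: app
      image: registry.example.com/payment-api:dev   # placeholder image
      volumeMounts:
        - name: db-creds
          mountPath: /var/run/secrets/db            # app reads credentials from files here
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: payment-db-credentials          # hypothetical Secret name
```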
Layer 6: data stores and queues
Ask:
- can app compromise become full database compromise?
- are app DB credentials overprivileged?
- can a worker or queue consumer replay or mass-read other tenants’ data?
Typical findings
- app user owns schema and can alter tables;
- queue consumer has too-broad topic access;
- cache is reachable from too many workloads.
Layer 7: image and runtime
Ask:
- does the workload run as root?
- is the filesystem writable?
- are extra Linux capabilities present?
- is seccomp/AppArmor/SELinux in use?
- what happens if the container is compromised?
Typical findings
- image runs as UID 0;
- no seccomp or AppArmor profile;
- writable root filesystem used even when not required;
- admission policies do not block privileged workloads.
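The hardening questions above map directly onto a Pod securityContext. A baseline sketch (image name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
  namespace: payment-api
spec:
  securityContext:
    runAsNonRoot: true                 # reject the image if it would run as UID 0
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault             # apply the runtime's default seccomp filter
  containers:
    - name: app
      image: registry.example.com/payment-api:dev   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # mount an emptyDir for paths that must be writable
        capabilities:
          drop: ["ALL"]                # add back individual capabilities only if proven necessary
```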
Layer 8: control plane and nodes
Ask:
- can node-level access or hostPath exposure bypass workload boundaries?
- are kubelet or node management interfaces exposed?
- does any workload get hostPID, hostNetwork, hostIPC, privileged, or hostPath mounts?
Typical findings
- operational debugging uses unsafe Pod specs;
- node compromise gives access to many namespaces;
- cluster audit logging is partial or disabled.
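One built-in guardrail for these host-level escape paths is Pod Security admission: labeling a namespace with the restricted profile rejects privileged Pods, host namespaces, and hostPath mounts at admission time. A sketch, assuming the payment-api namespace from this example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payment-api
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
```

Operational debugging that genuinely needs unsafe Pod specs then has to happen in a separately labeled, audited namespace rather than silently in production.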
Layer 9: cloud trust and metadata access
Ask:
- can workloads reach instance metadata or equivalent token services?
- are workload-to-cloud permissions minimal?
- can the same compromise path hit KMS, object storage, queues, or secrets manager?
Typical findings
- network egress allows metadata service;
- workload role includes broad object storage or decrypt permissions;
- app identity and CI identity are not separated cleanly.
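Blocking workload access to the instance metadata endpoint can be sketched as an egress NetworkPolicy. Assumptions: 169.254.169.254 is the link-local metadata address on the major clouds (verify for yours), and the CNI must support ipBlock with except:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-no-metadata
  namespace: payment-api
spec:
  podSelector: {}                 # applies to every Pod in the namespace
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # cloud instance metadata endpoint
```

Workloads that legitimately need cloud identity should get it through the platform's workload identity mechanism rather than raw metadata access.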
Layer 10: CI/CD and registry trust
Ask:
- who can push images or mutate tags?
- are deploys pinned by digest or floating tag?
- are there approval and evidence controls before production?
- can a runner compromise become image poisoning?
Typical findings
- mutable tags for production deploys;
- unsigned images admitted to cluster;
- pipeline and runtime trust share too much authority.
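Digest pinning from the findings above looks like this in a Deployment: the image is referenced by an immutable content hash rather than a tag, so a later push to the same tag cannot change what runs. Registry name and digest below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: payment-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: app
          # digest pin: resolves to exactly one immutable image, regardless of tag moves
          image: registry.example.com/payment-api@sha256:0000000000000000000000000000000000000000000000000000000000000000
```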
Layer 11: logging and detection
Ask:
- what logs would prove or disprove workload abuse?
- are Kubernetes audit logs enabled and centralized?
- do we alert on exec, secret reads, RBAC changes, suspicious image changes, or unusual cloud API calls from workload identities?
Typical findings
- telemetry exists but no owner or alert path;
- runtime detections do not distinguish test from prod;
- incident responders cannot map workload identity to cloud actions.
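The alerting questions above assume the API server is actually recording these events. A minimal audit Policy sketch covering exec, secret access, and RBAC changes:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # record who exec'd or attached into which Pod, with full request detail
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  # record secret access at metadata level (never log secret payloads)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # record all RBAC mutations in full
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # everything else at metadata level to keep volume manageable
  - level: Metadata
```

The policy only produces events; centralization, ownership, and an alert path still have to exist downstream, which is exactly the first finding above.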
Step 6 — Use a structured method, but do not become trapped by it
STRIDE mapping for this Kubernetes example
| STRIDE area | Example in Kubernetes context |
|---|---|
| Spoofing | forged service identity, trusted headers, stolen service account token |
| Tampering | image poisoning, mutable tag overwrite, manifest drift |
| Repudiation | weak or missing audit logs for deploys, exec, secret reads |
| Information disclosure | cross-tenant reads, secret leakage, broad logs, metadata access |
| Denial of service | no quotas/limits, queue abuse, expensive public endpoints |
| Elevation of privilege | root container, broad RBAC, cloud role escalation |
Do not force every threat into a method matrix if it makes the session worse. The method is there to improve coverage, not to replace judgment.
Step 7 — Convert the model into engineering outputs
A good threat model ends with owned actions.
Example output set for this Kubernetes system
| Output type | Example action |
|---|---|
| design change | enforce tenant checks in application service, not only ingress layer |
| platform guardrail | disable service-account automount unless explicitly required |
| policy gate | block privileged Pods, hostPath mounts, and non-default seccomp via admission |
| release gate | require digest-pinned deploys and signed image verification |
| detection requirement | alert on pods/exec, secret reads, RBAC changes, and unusual cloud API actions from workload roles |
| residual risk record | accepted short-term use of broad egress for migration, expires in 30 days |
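A policy gate like the one in the table can be sketched as an admission policy. This example uses Kyverno syntax; field names vary across Kyverno versions, so treat it as a sketch rather than a drop-in policy:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce    # reject, rather than merely warn
  rules:
    - name: require-non-latest-image
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Production images must be pinned, not deployed from a floating :latest tag."
        pattern:
          spec:
            containers:
              - image: "!*:latest"    # Kyverno pattern: image must not end in :latest
```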
Worked mini-example: payment-api namespace
Scenario
A new payment-api service is deployed behind ingress. It calls PostgreSQL and object storage. It runs in a namespace shared with several internal services. CI pushes images tagged :latest and Argo CD syncs them to the cluster automatically.
Fast threat-model findings
- the :latest tag means the deployed image can be silently overwritten, making rollback and provenance ambiguous.
- the service account token is automounted even though the app never calls the Kubernetes API.
- namespace has no deny-by-default NetworkPolicy.
- object storage access is broader than needed.
- runtime detections do not cover container shell execution or unexpected outbound connections.
Resulting actions
- pin production deploys by image digest;
- set automountServiceAccountToken: false;
- add a namespace baseline NetworkPolicy;
- narrow cloud IAM to bucket prefix and action set actually needed;
- add runtime detection for shell spawn, package manager execution, curl/wget, and abnormal egress.
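The first two actions can be sketched directly in the workload spec (names and the image digest are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-api
  namespace: payment-api
automountServiceAccountToken: false        # off at the ServiceAccount level
---
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
  namespace: payment-api
spec:
  serviceAccountName: payment-api
  automountServiceAccountToken: false      # and explicitly off at the Pod level
  containers:
    - name: app
      # digest-pinned placeholder reference, per the first action above
      image: registry.example.com/payment-api@sha256:0000000000000000000000000000000000000000000000000000000000000000
```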
Kubernetes-specific review checklist
Use this as the 10-minute closeout at the end of a modeling session.
- Does the workload need a service account token?
- Does the workload run as non-root?
- Is the root filesystem read-only where practical?
- Are seccomp/AppArmor/SELinux defaults enforced?
- Is east-west traffic actually segmented?
- Is metadata access blocked or intentionally controlled?
- Are cloud privileges scoped to the workload’s real need?
- Are production deploys pinned and signed?
- Are audit logs and runtime detections sufficient for incident response?
- Can a single namespace or runner compromise poison release or read other tenants’ data?
Common failure modes
- the team models only ingress and API endpoints, ignoring CI/CD and cloud IAM;
- the team talks about “Kubernetes risk” generically but never names the service account, namespace, role, or deploy path;
- the session stops at “use RBAC” without checking actual verbs/resources;
- no one turns the findings into guardrails, detections, or due dates;
- the review is never repeated after architecture drift.
Cross-links
- Threat Modeling Methods and Workflows
- Multi-Tenant and Microservice Threat Modeling
- Kubernetes Hardening
- Kubernetes API Access Hardening
- Runtime Investigation Playbook for Kubernetes and Containers
Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.