🛠️ Product Security Incident Response Playbooks
Intro: These playbooks are intentionally product-facing. They assume engineering and platform teams need clear first actions before a broader incident command structure has fully formed.
What this page includes
- high-value scenarios for product and platform teams
- what to do in the first 15 minutes
- evidence to collect before containment destroys context
- how to feed postmortem lessons back into code, policy, and infrastructure
Operating principles
- preserve evidence before you erase context;
- isolate the smallest useful scope first;
- revoke or rotate compromised identities quickly;
- record exact artifacts, digests, and config state involved;
- end every incident with at least one preventive, one detective, and one process improvement.
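The last principle above is easy to check mechanically. The following is a minimal sketch, assuming postmortem action items are tracked as dictionaries with a hypothetical `kind` field; the field name and the three labels are illustrative, not part of any specific tracker.

```python
from collections import Counter

# Assumed labels for the three improvement kinds named in the principle.
REQUIRED_KINDS = ("preventive", "detective", "process")


def missing_improvement_kinds(action_items):
    """Return the required kinds that have no action item yet."""
    seen = Counter(item["kind"] for item in action_items)
    return [kind for kind in REQUIRED_KINDS if seen[kind] == 0]


# Hypothetical action items from a leaked-token postmortem.
items = [
    {"kind": "preventive", "title": "Scope CI tokens to a single repo"},
    {"kind": "detective", "title": "Alert on token use from new networks"},
]

print(missing_improvement_kinds(items))  # ['process']
```

A check like this could run when a postmortem is closed, so the incident stays open until all three kinds are covered.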
Scenario pack
Leaked Git or CI token
First 15 minutes
- disable or revoke the token;
- identify repo, runner, registry, and environment scope;
- review pipeline, artifact, and image activity since suspected exposure.
Preserve
- token creation and last-use audit trail;
- related pipeline logs;
- artifact digests and tag changes;
- approval and deploy events.
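Reviewing activity "since suspected exposure" usually comes down to filtering exported audit events by time. Here is a minimal sketch, assuming events have already been exported as dictionaries with an ISO 8601 `at` timestamp; the event shape, `kind` values, and references are hypothetical, not the format of any particular CI system.

```python
from datetime import datetime


def events_since(events, exposed_at):
    """Return events at or after the suspected exposure time, oldest first."""
    stamped = [(datetime.fromisoformat(e["at"]), e) for e in events]
    hits = [(ts, e) for ts, e in stamped if ts >= exposed_at]
    return [e for ts, e in sorted(hits, key=lambda pair: pair[0])]


# Hypothetical exposure time and exported events (timezone-aware timestamps).
exposed_at = datetime.fromisoformat("2026-01-10T09:00:00+00:00")
events = [
    {"at": "2026-01-10T08:55:00+00:00", "kind": "pipeline_run", "ref": "build-101"},
    {"at": "2026-01-10T09:20:00+00:00", "kind": "tag_change", "ref": "app:latest"},
    {"at": "2026-01-10T09:05:00+00:00", "kind": "pipeline_run", "ref": "build-102"},
]

for event in events_since(events, exposed_at):
    print(event["at"], event["kind"], event["ref"])
```

If the exposure time is uncertain, it is safer to pick the earliest plausible time and accept a longer review list.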
Compromised runner or build agent
First 15 minutes
- quarantine the runner;
- stop scheduling new jobs to it;
- identify accessible secrets, workspaces, artifacts, and cloud credentials.
Preserve
- runner config;
- mounted volumes and credentials;
- recent job list and logs;
- outbound network destinations;
- registry or artifact-store access.
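Once the recent job list is preserved, it can answer which secrets the runner could have seen. The sketch below assumes each job record lists the runner it ran on and the names of the secrets injected into it; the record shape and all names are hypothetical.

```python
def secrets_to_rotate(jobs, runner_id):
    """Collect the names of all secrets exposed to jobs on one runner."""
    exposed = set()
    for job in jobs:
        if job["runner"] == runner_id:
            exposed.update(job["secrets"])
    return sorted(exposed)


# Hypothetical recent job list exported from the CI system.
jobs = [
    {"id": 1, "runner": "runner-a", "secrets": ["REGISTRY_TOKEN", "CLOUD_DEPLOY_KEY"]},
    {"id": 2, "runner": "runner-b", "secrets": ["NPM_TOKEN"]},
    {"id": 3, "runner": "runner-a", "secrets": ["REGISTRY_TOKEN", "SIGNING_KEY"]},
]

print(secrets_to_rotate(jobs, "runner-a"))
```

This only covers secrets injected per job. Credentials mounted on the runner itself, or cached in workspaces, still need to be listed separately.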
Exposed bucket or artifact store
First 15 minutes
- remove public access or overly permissive sharing;
- determine whether only data was exposed or also code, manifests, or credentials;
- preserve access logs before retention or rotation removes them.
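A first pass at the "data only, or also code, manifests, or credentials" question can be made from object names alone. The following is a rough sketch based on file names and extensions; the name and suffix lists are illustrative assumptions, and a name-based pass does not replace inspecting object contents.

```python
from pathlib import PurePosixPath

# Illustrative heuristics only; tune these lists for your own stores.
CREDENTIAL_NAMES = {".env", "id_rsa", "credentials"}
CREDENTIAL_SUFFIXES = {".pem", ".key", ".p12"}
MANIFEST_SUFFIXES = {".yaml", ".yml", ".tf", ".tfstate"}
CODE_SUFFIXES = {".py", ".go", ".js", ".ts", ".java"}


def classify(key):
    """Guess an exposure class for one object key from its name alone."""
    path = PurePosixPath(key)
    if path.name in CREDENTIAL_NAMES or path.suffix in CREDENTIAL_SUFFIXES:
        return "credentials"
    if path.suffix in MANIFEST_SUFFIXES:
        return "manifest"
    if path.suffix in CODE_SUFFIXES:
        return "code"
    return "data"


# Hypothetical object keys from an exposed bucket listing.
keys = ["backups/users.csv", "deploy/prod.yaml", "keys/deploy.pem", "src/app.py", ".env"]
for key in keys:
    print(key, "->", classify(key))
```

Anything classified as credentials should feed straight into the rotation steps, even before the content review finishes.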
Suspicious pod or workload behavior
First 15 minutes
- decide whether the incident is runtime-only or also involves identity and cloud compromise;
- isolate the workload or node according to platform guidance;
- capture Pod spec, image digest, namespace, service account, and recent events.
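The capture step can be condensed into a small summary that travels with the incident ticket. This sketch assumes the Pod object has already been saved as JSON (for example with `kubectl get pod <name> -o json`) and loaded into a dictionary; the sample values are invented. It covers only the summary fields, so keep the full Pod JSON and the recent events as well.

```python
def pod_snapshot(pod):
    """Extract identity and image details from a Pod object (as a dict)."""
    meta = pod.get("metadata", {})
    spec = pod.get("spec", {})
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return {
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "service_account": spec.get("serviceAccountName"),
        "node": spec.get("nodeName"),
        # imageID carries the resolved image digest for each container.
        "image_ids": {s["name"]: s.get("imageID") for s in statuses},
    }


# Hypothetical, heavily trimmed Pod object.
pod = {
    "metadata": {"name": "api-7d9f", "namespace": "payments"},
    "spec": {"serviceAccountName": "api-sa", "nodeName": "node-3"},
    "status": {
        "containerStatuses": [
            {"name": "api", "imageID": "registry.example/api@sha256:abc123"}
        ]
    },
}

print(pod_snapshot(pod))
```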
Public API key or webhook secret exposure
First 15 minutes
- rotate the secret;
- review activity during the exposure window for signs of abuse;
- identify replay, scraping, mass-callback, or unusual egress patterns.
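Replay is often the easiest of these patterns to spot, because a replayed webhook reuses a signature that was already seen. Below is a minimal sketch, assuming request logs have been parsed into dictionaries with a hypothetical `signature` field; the threshold of two is an arbitrary starting point.

```python
from collections import Counter


def replayed_deliveries(requests, threshold=2):
    """Return signatures seen at least `threshold` times, with their counts."""
    counts = Counter(r["signature"] for r in requests)
    return {sig: n for sig, n in counts.items() if n >= threshold}


# Hypothetical parsed webhook request log.
requests = [
    {"signature": "sig-aaa", "source": "203.0.113.10"},
    {"signature": "sig-bbb", "source": "203.0.113.10"},
    {"signature": "sig-aaa", "source": "198.51.100.7"},
    {"signature": "sig-aaa", "source": "198.51.100.7"},
]

print(replayed_deliveries(requests))  # {'sig-aaa': 3}
```

Legitimate retries from the sender can also repeat a signature, so treat hits as leads to check against source and timing rather than as proof of abuse.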
Evidence classes that are worth collecting almost every time
- audit logs;
- workload and cloud identity used;
- exact artifact and image digests;
- deployed configuration or manifest state;
- tenant, customer, or data scope impacted;
- timeline of approvals, deploys, and runtime behavior.
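Whatever the evidence class, recording a digest at collection time makes it possible to show later that an artifact was not altered. The sketch below uses SHA-256 from the Python standard library; the record fields and the sample label are illustrative assumptions, not a prescribed evidence format.

```python
import hashlib
from datetime import datetime, timezone


def evidence_record(label, content, collected_at=None):
    """Describe one collected artifact by its SHA-256 digest and size."""
    collected_at = collected_at or datetime.now(timezone.utc)
    return {
        "label": label,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "collected_at": collected_at.isoformat(),
    }


# Hypothetical artifact: an exported audit log held in memory as bytes.
record = evidence_record("ci-audit-log", b"2026-01-10T09:05:00Z token used\n")
print(record["label"], record["sha256"], record["size_bytes"])
```

For large files, hash in chunks rather than loading the whole artifact into memory.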
Containment to eradication to codification
One of the most valuable habits in modern response is to turn operational fixes into durable engineering controls.
| Stage | Example outcome |
|---|---|
| containment | quarantine runner, revoke token, isolate node |
| eradication | remove malicious image, delete persistence, rebuild workload |
| recovery | redeploy trusted artifacts, validate authz and telemetry |
| codify | add pipeline gate, policy rule, secret-handling change, or IaC control |
Postmortem questions that improve the platform
- which signal should have detected this sooner?
- which approval or trust boundary failed?
- which credential or artifact path was too broad?
- what can be encoded in IaC, admission policy, runner design, or image promotion rules so this is harder next time?
Cross-links
- Runtime Investigation Playbook for Kubernetes and Containers
- Logging and Telemetry Strategy
- Containment and Eradication Automation Lab
- Policy Exception Governance Pack
Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.