🛠️ Product Security Incident Response Playbooks

Intro: These playbooks are intentionally product-facing. They assume engineering and platform teams need clear first actions before a broader incident command structure has fully formed.

What this page includes

high-value scenarios for product and platform teams

what to do in the first 15 minutes

evidence to collect before containment destroys context

how to feed postmortem lessons back into code, policy, and infrastructure

Operating principles

preserve evidence before you erase context;
isolate the smallest useful scope first;
revoke or rotate compromised identities and credentials quickly;
record exact artifacts, digests, and config state involved;
end every incident with at least one preventive, one detective, and one process improvement.

Scenario pack

Leaked Git or CI token

First 15 minutes

disable or revoke the token;
identify repo, runner, registry, and environment scope;
review pipeline, artifact, and image activity since suspected exposure.

Preserve

token creation and last-use audit trail;
related pipeline logs;
artifact digests and tag changes;
approval and deploy events.
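
Reviewing activity since the suspected exposure amounts to filtering audit events by token and time. The sketch below assumes events have already been exported into dictionaries with `token_id`, `at` (a timezone-aware `datetime`), and `action` keys; that schema and the function name are illustrative assumptions, not the format of any particular Git or CI platform.

```python
from datetime import datetime, timezone


def activity_since(events, token_id, exposed_at):
    """Return events for token_id at or after exposed_at, oldest first."""
    hits = [
        e for e in events
        if e["token_id"] == token_id and e["at"] >= exposed_at
    ]
    return sorted(hits, key=lambda e: e["at"])


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    exposed = datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"token_id": "ci-1", "action": "push_image",
         "at": datetime(2026, 1, 10, 13, 0, tzinfo=timezone.utc)},
        {"token_id": "ci-1", "action": "run_pipeline",
         "at": datetime(2026, 1, 10, 11, 0, tzinfo=timezone.utc)},
    ]
    for event in activity_since(sample, "ci-1", exposed):
        print(event["at"].isoformat(), event["action"])
```

If the exposure time is uncertain, choosing an earlier `exposed_at` widens the review window at the cost of more events to read.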

Compromised runner or build agent

First 15 minutes

quarantine the runner;
stop scheduling new jobs to it;
identify accessible secrets, workspaces, artifacts, and cloud credentials.

Preserve

runner config;
mounted volumes and credentials;
recent job list and logs;
outbound network destinations;
registry or artifact-store access.
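
Identifying what the runner could reach can start from its recent job list. This is a rough sketch, assuming each job record carries `runner_id`, `project`, and the names of `secrets` injected into it; the record shape and `exposure_for_runner` are hypothetical.

```python
def exposure_for_runner(jobs, runner_id):
    """Collect the projects and secret names seen by jobs on one runner."""
    secrets, projects = set(), set()
    for job in jobs:
        if job["runner_id"] == runner_id:
            secrets.update(job["secrets"])
            projects.add(job["project"])
    return {"projects": sorted(projects), "secrets": sorted(secrets)}


if __name__ == "__main__":
    # Hypothetical job records for illustration only.
    jobs = [
        {"runner_id": "r-7", "project": "billing", "secrets": ["REGISTRY_TOKEN"]},
        {"runner_id": "r-7", "project": "web", "secrets": ["REGISTRY_TOKEN", "CLOUD_KEY"]},
        {"runner_id": "r-2", "project": "docs", "secrets": []},
    ]
    print(exposure_for_runner(jobs, "r-7"))
```

The resulting secret names form a starting rotation list; it does not cover credentials the runner held outside of job scope, such as host-level cloud credentials.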

Exposed bucket or artifact store

First 15 minutes

remove public access or overly permissive sharing;
determine whether only data was exposed or also code, manifests, or credentials;
preserve access logs before retention or rotation removes them.
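
A first pass at the data-versus-code-versus-credentials question can be made from the object listing alone. The sketch below is a heuristic under stated assumptions: the name and suffix sets are illustrative, not exhaustive, and treating Terraform state as credentials is a deliberately conservative choice because state files often embed secrets.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative, non-exhaustive heuristics (assumptions for this sketch).
CREDENTIAL_NAMES = {".env", "credentials", "id_rsa", ".npmrc"}
CREDENTIAL_SUFFIXES = {".pem", ".key", ".p12", ".tfstate"}
MANIFEST_SUFFIXES = {".yaml", ".yml", ".tf"}
CODE_SUFFIXES = {".py", ".go", ".js", ".ts", ".java"}


def classify_key(key):
    """Classify one object key as credentials, manifests, code, or data."""
    path = PurePosixPath(key)
    name, suffix = path.name.lower(), path.suffix.lower()
    if name in CREDENTIAL_NAMES or suffix in CREDENTIAL_SUFFIXES:
        return "credentials"
    if suffix in MANIFEST_SUFFIXES:
        return "manifests"
    if suffix in CODE_SUFFIXES:
        return "code"
    return "data"


def exposure_summary(keys):
    """Count exposed objects per class."""
    return Counter(classify_key(k) for k in keys)
```

Any non-zero `credentials` count suggests treating the incident as a credential exposure as well, which pulls in the rotation steps from the token scenarios. File names are only a hint; contents still need review.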

Suspicious pod or workload behavior

First 15 minutes

decide whether the compromise is confined to the runtime or also extends to workload identity and cloud credentials;
isolate the workload or node according to platform guidance;
capture Pod spec, image digest, namespace, service account, and recent events.
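
The capture step can be reduced to a small, repeatable extraction. The sketch below assumes the Pod object has already been saved as JSON (for example with `kubectl get pod <name> -o json`) and parsed into a dictionary; `pod_evidence` is a hypothetical helper, and recent events are not covered here because they live in separate Event objects.

```python
def pod_evidence(pod):
    """Extract identity and image evidence from a parsed Pod object."""
    meta = pod.get("metadata", {})
    spec = pod.get("spec", {})
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return {
        "namespace": meta.get("namespace"),
        "name": meta.get("name"),
        "service_account": spec.get("serviceAccountName"),
        "node": spec.get("nodeName"),
        # imageID is the resolved image reference reported by the runtime,
        # which normally includes the digest rather than a mutable tag.
        "images": {s["name"]: s.get("imageID") for s in statuses},
    }
```

Saving the full Pod JSON alongside this summary keeps the original context available if the workload is later deleted during containment.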

Public API key or webhook secret exposure

First 15 minutes

rotate the secret;
review usage during the exposure window for signs of abuse;
identify replay, scraping, mass-callback, or unusual egress patterns.
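
One simple replay signal is the same request body arriving more than once inside the exposure window. The sketch below assumes request logs are available as dictionaries with an `at` timestamp and raw `body` bytes; that schema and `find_replays` are assumptions for illustration, and legitimate retries can produce the same pattern.

```python
import hashlib
from collections import defaultdict


def find_replays(requests):
    """Map body digest -> arrival times, for bodies seen more than once."""
    seen = defaultdict(list)
    for req in requests:
        digest = hashlib.sha256(req["body"]).hexdigest()
        seen[digest].append(req["at"])
    return {d: times for d, times in seen.items() if len(times) > 1}


if __name__ == "__main__":
    # Hypothetical log entries for illustration only.
    log = [
        {"at": 1, "body": b'{"order": 1}'},
        {"at": 5, "body": b'{"order": 1}'},
        {"at": 7, "body": b'{"order": 2}'},
    ]
    for digest, times in find_replays(log).items():
        print(digest[:12], times)
```

Grouping by digest rather than by raw body keeps the report small and avoids copying customer payloads into incident notes.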

Evidence classes that are worth collecting almost every time

audit logs;
workload and cloud identity used;
exact artifact and image digests;
deployed configuration or manifest state;
tenant, customer, or data scope impacted;
timeline of approvals, deploys, and runtime behavior.
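
Recording exact digests is easier to keep consistent when every collected item is hashed the same way. The following is a minimal sketch, assuming evidence is read as a binary stream; `sha256_stream` and `evidence_entry` are hypothetical helper names, and SHA-256 is an assumed choice that matches common image and artifact digest formats.

```python
import hashlib
import io


def sha256_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large artifacts fit in memory."""
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()


def evidence_entry(label, stream):
    """Build one evidence manifest entry: a label plus the content digest."""
    return {"label": label, "sha256": sha256_stream(stream)}


if __name__ == "__main__":
    # In practice the stream would come from open(path, "rb").
    print(evidence_entry("pipeline-log", io.BytesIO(b"example log line\n")))
```

Storing the manifest separately from the evidence itself makes later tampering or accidental modification detectable.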

Containment to eradication to codification

One of the most valuable habits in incident response is turning each operational fix into a durable engineering control.

Stage	Example outcome
containment	quarantine runner, revoke token, isolate node
eradication	remove malicious image, delete persistence, rebuild workload
recovery	redeploy trusted artifacts, validate authz and telemetry
codify	add pipeline gate, policy rule, secret-handling change, or IaC control
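
As an illustration of the codify stage, a pipeline gate can reject image references that are not pinned by digest. This is a sketch under assumptions: the regular expression is a simplified check for a `name@sha256:<64 hex>` reference rather than a full image-reference parser, and `unpinned_images` is a hypothetical function name.

```python
import re

# Simplified pattern: any non-empty name, then "@sha256:" and 64 hex digits.
DIGEST_PINNED = re.compile(r"[^\s@]+@sha256:[0-9a-f]{64}")


def unpinned_images(image_refs):
    """Return the image references that are not pinned to a sha256 digest."""
    return [ref for ref in image_refs if not DIGEST_PINNED.fullmatch(ref)]


if __name__ == "__main__":
    # Hypothetical references for illustration only.
    refs = [
        "registry.example/app@sha256:" + "0" * 64,
        "registry.example/app:latest",
    ]
    offenders = unpinned_images(refs)
    if offenders:
        raise SystemExit(f"unpinned images: {offenders}")
```

Failing the pipeline on a non-empty result turns a one-time containment lesson (a swapped or retagged image) into a standing control.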

Postmortem questions that improve the platform

which signal should have detected this sooner?
which approval or trust boundary failed?
which credential or artifact path was too broad?
what can be encoded in IaC, admission policy, runner design, or image promotion rules so this is harder next time?

Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.