Product Security Knowledge Base

๐Ÿ›ก๏ธ Containment and Eradication Automation Lab โ€” SOAR, Remediations, and Postmortem-to-IaC Feedback

Intro: Detection is only half of the job. This lab teaches the next step: how to automate safe containment, preserve evidence, and then push the durable fix back into infrastructure-as-code, policies, or platform baselines.

What this page includes

  • how to structure a containment-and-eradication lab;
  • safe automation patterns with SOAR and native cloud tools;
  • examples using AWS Systems Manager and Cortex XSOAR style playbooks;
  • how to convert a one-time incident into a codified control improvement.

Learning goal

A good automation lab teaches four skills:

  1. know when to automate;
  2. know what must stay manual;
  3. preserve evidence before destroying context;
  4. feed the durable fix back into code and policy.

Safe automation rules

Never automate a destructive response before you can answer:

  • what evidence do we lose?
  • can the action break production?
  • who approves the containment?
  • how do we restore safely?
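
The four questions above can be enforced as a pre-flight gate in front of any automated action. The following is a minimal sketch, not a prescribed implementation: `ContainmentPlan`, its field names, and `may_automate` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ContainmentPlan:
    """Hypothetical record an analyst fills in before automation may run."""
    action: str
    evidence_at_risk: str = ""    # what evidence do we lose?
    production_impact: str = ""   # can the action break production?
    approver: str = ""            # who approves the containment?
    restore_procedure: str = ""   # how do we restore safely?


REQUIRED_ANSWERS = (
    "evidence_at_risk",
    "production_impact",
    "approver",
    "restore_procedure",
)


def unanswered_questions(plan: ContainmentPlan) -> list[str]:
    """Return the names of the safety questions that are still blank."""
    return [name for name in REQUIRED_ANSWERS if not getattr(plan, name).strip()]


def may_automate(plan: ContainmentPlan) -> bool:
    """Allow automation only when every safety question has an answer."""
    return not unanswered_questions(plan)
```

A plan with only an approver filled in is rejected, and `unanswered_questions` tells the analyst which answers are still missing.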

Good starter scenarios

Scenario                                     | Why it is good for a lab
---------------------------------------------|----------------------------------------------------------
suspicious EC2 or VM egress                  | teaches reversible containment
compromised IAM principal or service account | teaches identity-focused isolation
public security group or NSG                 | teaches quick risk reduction and codified fix
container or pod compromise                  | teaches runtime evidence + kill/replace discipline
leaked secret or token                       | teaches rotation, blast-radius review, and pipeline follow-up

AWS-native starter pattern

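AWS Systems Manager already ships useful containment runbooks. Parameter names differ between runbooks and can change between document versions, so confirm them with aws ssm describe-document --name <runbook-name> before executing.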

Example: contain an EC2 instance

aws ssm start-automation-execution \
  --document-name AWSSupport-ContainEC2Instance \
  --parameters InstanceId=i-0123456789abcdef0

Example: quarantine an EC2 instance

aws ssm start-automation-execution \
  --document-name AWS-QuarantineEC2Instance \
  --parameters InstanceId=i-0123456789abcdef0

Example: contain an IAM principal

aws ssm start-automation-execution \
  --document-name AWSSupport-ContainIAMPrincipal \
  --parameters IAMResourceArn=arn:aws:iam::123456789012:user/suspicious-user
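
When the same calls are made from a script or a SOAR task rather than a shell, the request shape matters: the `StartAutomationExecution` API takes each parameter value as a list of strings. The sketch below assumes a boto3-style SSM client is passed in. `build_request` and `start_containment` are hypothetical helper names, and the runbook name and parameters are whatever the lab chooses.

```python
from typing import Any, Protocol


class SsmClient(Protocol):
    """The single boto3 SSM client method this sketch relies on."""

    def start_automation_execution(self, **kwargs: Any) -> dict: ...


def build_request(document_name: str, params: dict[str, str]) -> dict[str, Any]:
    """Build keyword arguments for start_automation_execution.

    The API expects every parameter value to be a list of strings.
    """
    return {
        "DocumentName": document_name,
        "Parameters": {key: [value] for key, value in params.items()},
    }


def start_containment(ssm: SsmClient, document_name: str, params: dict[str, str]) -> str:
    """Start the runbook and return the execution ID for the incident record."""
    response = ssm.start_automation_execution(**build_request(document_name, params))
    return response["AutomationExecutionId"]
```

With boto3 installed, the client would typically come from `boto3.client("ssm")`. Passing the client in keeps the function testable with a stub and makes it easy to save the returned execution ID alongside the incident.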

Example custom SSM automation skeleton

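schemaVersion: '0.3'
description: Isolate instance and snapshot evidence metadata
assumeRole: '{{ AutomationAssumeRole }}'
parameters:
  AutomationAssumeRole:
    type: String
    description: IAM role ARN that Automation assumes to run the steps
  InstanceId:
    type: String
    description: ID of the EC2 instance to isolate
  ContainmentSecurityGroupId:
    type: String
    description: ID of a pre-created security group with no inbound or outbound rules
mainSteps:
  # Record the current security groups first so the change can be reversed.
  - name: captureInstanceMetadata
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds:
        - '{{ InstanceId }}'
    outputs:
      - Name: OriginalSecurityGroupIds
        Selector: '$.Reservations..Instances..SecurityGroups..GroupId'
        Type: StringList
  # Replace every attached security group with the containment group.
  - name: quarantineInstance
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: ModifyInstanceAttribute
      InstanceId: '{{ InstanceId }}'
      Groups:
        - '{{ ContainmentSecurityGroupId }}'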
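
Replacing the security groups is reversible only if the original groups were recorded first. As a rough sketch of that idea, the helpers below take an EC2 `DescribeInstances` response, keep the original group IDs in an evidence record, and build the arguments for the `ModifyInstanceAttribute` call that would undo the quarantine. The function names and the record layout are assumptions made for this example.

```python
from datetime import datetime, timezone


def original_group_ids(describe_instances_response: dict) -> list[str]:
    """Extract security group IDs from an EC2 DescribeInstances response."""
    instance = describe_instances_response["Reservations"][0]["Instances"][0]
    return [group["GroupId"] for group in instance["SecurityGroups"]]


def evidence_record(instance_id: str, describe_instances_response: dict) -> dict:
    """Capture what is needed to restore the instance, before changing it."""
    return {
        "instance_id": instance_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "original_security_groups": original_group_ids(describe_instances_response),
    }


def restore_request(record: dict) -> dict:
    """Build ModifyInstanceAttribute arguments that undo the quarantine."""
    return {
        "InstanceId": record["instance_id"],
        "Groups": record["original_security_groups"],
    }
```

Storing the record with the incident, rather than only in the automation output, gives the restore step something to work from even if the execution history is no longer available.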

Cortex XSOAR style lab pattern

A SOAR playbook is useful when your response needs:

  • ticketing;
  • analyst approval steps;
  • enrichment;
  • branching logic;
  • human-in-the-loop escalation.

Minimal playbook design idea

  1. ingest incident;
  2. enrich asset, identity, and tenant context;
  3. ask: manual approval required?
  4. perform reversible containment;
  5. collect evidence references;
  6. open remediation ticket;
  7. trigger postmortem checklist.

Example pseudo-playbook logic

if incident.type == "suspicious-ec2":
  collect CloudTrail + VPC Flow Logs + GuardDuty context
  ask analyst for approval; stop and escalate if denied
  run AWSSupport-ContainEC2Instance
  create Jira ticket for source-of-truth fix
  notify platform owner
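
The pseudo-playbook can be prototyped outside a SOAR platform to test the branching before building it for real. This is a sketch under stated assumptions: `Incident` and `run_playbook` are hypothetical, the approval and containment steps are injected as plain callables, and the log strings merely stand in for the real enrichment, ticketing, and notification tasks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Incident:
    incident_type: str
    resource_id: str
    log: list[str] = field(default_factory=list)


def run_playbook(
    incident: Incident,
    approve: Callable[[Incident], bool],
    contain: Callable[[str], str],
) -> str:
    """Walk the playbook steps; contain only after an analyst approves."""
    if incident.incident_type != "suspicious-ec2":
        incident.log.append("no playbook for this incident type")
        return "skipped"

    # Placeholder for the enrichment tasks (CloudTrail, VPC Flow Logs, GuardDuty).
    incident.log.append("collected enrichment context")

    # Human-in-the-loop gate: nothing is changed without approval.
    if not approve(incident):
        incident.log.append("containment not approved; escalated")
        return "escalated"

    execution_id = contain(incident.resource_id)
    incident.log.append(f"containment started: {execution_id}")

    # Placeholders for ticketing and notification tasks.
    incident.log.append("opened ticket for source-of-truth fix")
    incident.log.append("notified platform owner")
    return "contained"
```

Because `approve` and `contain` are parameters, the lab can exercise the denied-approval branch with stubs and confirm that the containment callable is never invoked on that path.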

Postmortem-to-IaC feedback loop

This is the most important part of the lab.

After containment, force the learner to answer:

  • what allowed the incident path?
  • what guardrail should have stopped it earlier?
  • what Terraform / Helm / policy / CI rule must change?
  • what new detection should exist next time?

Example: translate incident into code change

Incident: public admin port on a security group.

Temporary action: close the rule.

Durable action: update Terraform module defaults.

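# Empty by default: no admin ingress exists unless a caller opts in.
variable "allowed_admin_cidrs" {
  type        = list(string)
  default     = []
  description = "CIDR ranges allowed to reach the admin port (SSH)."

  # Reject the pattern that caused the incident.
  validation {
    condition     = !contains(var.allowed_admin_cidrs, "0.0.0.0/0")
    error_message = "allowed_admin_cidrs must not include 0.0.0.0/0."
  }
}

# Created only when at least one CIDR is supplied.
# aws_security_group.app is assumed to be defined elsewhere in the module.
resource "aws_security_group_rule" "admin_ingress" {
  count             = length(var.allowed_admin_cidrs) > 0 ? 1 : 0
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = var.allowed_admin_cidrs
  security_group_id = aws_security_group.app.id
}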

Example validation after the fix

checkov -d infra/
terraform plan
prowler aws --checks ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_22
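
A team-specific check can also run against the plan itself in CI. The sketch below assumes the plan has been exported with `terraform show -json` and loaded into a dictionary. It inspects only standalone `aws_security_group_rule` resources, so inline `ingress` blocks and all-protocol rules would need extra handling. `open_admin_rules` is a hypothetical helper name.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_admin_rules(plan: dict, port: int = 22) -> list[str]:
    """Return addresses of planned ingress rules that open `port` to the internet."""
    findings = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_security_group_rule":
            continue
        # "after" is null when the resource is being destroyed.
        after = (change.get("change") or {}).get("after") or {}
        if after.get("type") != "ingress":
            continue
        from_port, to_port = after.get("from_port"), after.get("to_port")
        if from_port is None or to_port is None:
            continue
        cidrs = set(after.get("cidr_blocks") or [])
        cidrs |= set(after.get("ipv6_cidr_blocks") or [])
        if from_port <= port <= to_port and cidrs & OPEN_CIDRS:
            findings.append(change["address"])
    return findings
```

A pipeline step could load the exported JSON with `json.load` and fail the build whenever the returned list is not empty.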

Web UI how-to ideas for the lab

AWS console path

  1. Systems Manager → Automation.
  2. Search for a containment or quarantine runbook.
  3. Review the required parameters and the assume role.
  4. Execute against the target resource.
  5. Save execution ID into the incident record.

XSOAR path

  1. open incident;
  2. run enrichment tasks;
  3. request manual approval if production risk exists;
  4. execute containment task;
  5. attach evidence and open engineering remediation ticket;
  6. link the postmortem record.

Common mistakes

  • automating containment without an approval boundary for production systems;
  • deleting or rebuilding assets before preserving evidence;
  • fixing only the live resource and not the template or module;
  • stopping at detection and never building the response automation.

---
Author attribution: Ivan Piskunov, 2026. Educational and defensive-engineering use.