Runtime Detection Stack: Falco, Tetragon, and Cloud Signals
Intro: Runtime detection works best when it is treated as one layer in a detection stack, not as the whole stack. This page explains how to combine workload runtime sensors with cloud control-plane evidence and investigation-friendly routing.
Why teams struggle here
Common failure modes:
- they deploy a runtime tool and expect it to replace audit logs;
- they enable many default rules but do not tune ownership or namespaces;
- they route everything to chat and nothing to durable storage;
- they do not connect runtime alerts to cloud and Kubernetes control-plane context.
A practical layered model
| Layer | Primary source | Best at | Weak at |
|---|---|---|---|
| Cloud control plane | CloudTrail, Activity Log, provider audit trails | identity and infrastructure changes | workload syscall detail |
| Kubernetes control plane | audit logs, admission logs | object changes, RBAC activity, workload creation | process execution detail |
| Runtime workload | Falco, Tetragon, eBPF-based sensors | suspicious execution, file, net, capability behavior | "who changed IAM?" questions |
| Durable analytics | SIEM, data lake, search platform | correlation, history, case work | real-time local enforcement |
Falco versus Tetragon quick view
| Tool | Strongest use | Typical operator fit |
|---|---|---|
| Falco | rule-driven detections with rich ecosystem and simple output routing | teams that want broad community examples and fast time-to-value |
| Tetragon | deep eBPF-based runtime and identity-aware policy or tracing patterns | teams already invested in Cilium or eBPF-heavy Kubernetes networking |
Use either. Use both only when you can clearly explain ownership, overlap, and alert routing.
Practical deployment pattern
Baseline
- provider audit logs enabled and retained;
- Kubernetes audit logs enabled;
- one runtime sensor on production clusters;
- all runtime alerts shipped to durable storage;
- alert routes for only selected high-confidence detections.
Good next step
- namespace-aware or team-aware rule ownership;
- custom rules for your estate, not only upstream defaults;
- runtime investigation playbook attached to the alert;
- response hooks for a small number of high-confidence cases.
Practical snippet - Falco via Helm

```shell
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm upgrade -i falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set tty=true
```
Practical snippet - custom Falco rule with namespace filter

```yaml
- list: product_namespaces
  items: [billing, checkout, identity]

- rule: Package manager in product namespace
  desc: Detect apt, yum, apk, or dnf execution in product workloads
  condition: >
    spawned_process and container and
    k8s.ns.name in (product_namespaces) and
    proc.name in (apt, apt-get, apk, yum, dnf, rpm)
  output: >
    Package manager executed in product workload
    (ns=%k8s.ns.name pod=%k8s.pod.name image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING
  tags: [container, drift, package-manager]
```
Practical snippet - Tetragon-style policy shape

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: exec-shell
spec:
  kprobes:
    - call: "__x64_sys_execve"
      syscall: true
      # matchArgs can only filter on arguments the probe extracts,
      # so the argument must be declared here first.
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/bin/bash"
                - "/bin/sh"
```

Use this as a mental model for fine-grained execution tracing. Adjust the probe symbol, argument types, and selectors to your cluster and Tetragon version.
Practical snippet - route Falco to a webhook

```yaml
falcosidekick:
  enabled: true
  config:
    webhook:
      address: http://event-router.security.svc.cluster.local/falco
```
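On the receiving end, the event router only needs to parse the JSON body it is posted and pull out the fields an analyst will ask about first. A minimal sketch, assuming a Falco-style event payload (`rule`, `priority`, `time`, `output_fields`); verify field names against your Falco and Falcosidekick versions, and note the sample values below are illustrative:

```python
import json

def summarize_falco_event(raw: bytes) -> dict:
    """Extract first-look investigation fields from a Falco-style
    webhook payload. Field names follow Falco's JSON event output;
    confirm them against your deployed version."""
    event = json.loads(raw)
    fields = event.get("output_fields", {})
    return {
        "rule": event.get("rule"),
        "priority": event.get("priority"),
        "time": event.get("time"),
        "namespace": fields.get("k8s.ns.name"),
        "pod": fields.get("k8s.pod.name"),
        "image": fields.get("container.image.repository"),
        "cmdline": fields.get("proc.cmdline"),
    }

# Illustrative payload matching the package-manager rule above.
sample = json.dumps({
    "rule": "Package manager in product namespace",
    "priority": "Warning",
    "time": "2026-01-15T10:02:31Z",
    "output_fields": {
        "k8s.ns.name": "billing",
        "k8s.pod.name": "billing-api-7f9c",
        "container.image.repository": "registry.example.com/billing-api",
        "proc.cmdline": "apt-get install curl",
    },
}).encode()

print(summarize_falco_event(sample))
```

Keeping this normalization step in the router, rather than in each downstream consumer, is what makes the same alert usable for both paging and durable storage.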
Practical snippet - correlate runtime and cloud
Runtime alert: unexpected shell in container
Questions to ask next:
- was the pod recently redeployed?
- did a privileged role or human actor change the deployment?
- did an image digest change unexpectedly?
- did a secret or service account binding change near the same time?
Use CloudTrail, Kubernetes audit logs, deployment history, and image registry evidence to answer those questions.
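The core of that correlation step is mechanical: take a time window around the alert and pull control-plane events that touched the same namespace. A sketch under a simplified, hypothetical event shape (real inputs would come from CloudTrail `LookupEvents` or a Kubernetes audit-log query):

```python
from datetime import datetime, timedelta

def correlate(alert_time: datetime, audit_events: list[dict],
              namespace: str, window_minutes: int = 30) -> list[dict]:
    """Return write-type control-plane events near the alert that
    touched the same namespace. Each event uses a hypothetical shape:
    {"time": datetime, "verb": str, "namespace": str, "kind": str}."""
    start = alert_time - timedelta(minutes=window_minutes)
    end = alert_time + timedelta(minutes=window_minutes)
    return [
        e for e in audit_events
        if start <= e["time"] <= end
        and e["namespace"] == namespace
        and e["verb"] in {"create", "update", "patch", "delete"}
    ]

# Illustrative data: a deployment patch shortly before the runtime alert.
alert_at = datetime(2026, 1, 15, 10, 2)
events = [
    {"time": datetime(2026, 1, 15, 9, 50), "verb": "patch",
     "namespace": "billing", "kind": "Deployment"},
    {"time": datetime(2026, 1, 15, 8, 0), "verb": "patch",
     "namespace": "billing", "kind": "Deployment"},  # outside the window
]
hits = correlate(alert_at, events, "billing")
print(hits)  # only the 09:50 patch survives the window filter
```

A match here does not prove causation; it narrows the "was the pod recently redeployed, and by whom?" question to a handful of audit entries worth reading.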
What to alert on first
Start with high-confidence patterns:
- unexpected shell in production container;
- package manager execution in product namespace;
- write under sensitive config paths;
- outbound connection from a binary that should not make network calls;
- exec into a workload from an unusual identity or admin path;
- new privileged workload or excessive capabilities.
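The baseline rule "alert routes for only selected high-confidence detections" can be made concrete as a small routing function: every alert is stored, but only an allowlisted set of rules pages a human. Rule names and destination labels here are hypothetical:

```python
# Hypothetical high-confidence allowlist; everything else still lands
# in durable storage for correlation and case work.
PAGE_RULES = {
    "Terminal shell in production container",
    "Package manager in product namespace",
    "Write below sensitive config path",
}

def route(alert: dict) -> list[str]:
    """Decide destinations for one runtime alert: always store,
    page only for allowlisted rules at Warning or above."""
    destinations = ["durable-storage"]
    if alert.get("rule") in PAGE_RULES and \
            alert.get("priority") in {"Warning", "Critical"}:
        destinations.append("pager")
    return destinations

print(route({"rule": "Package manager in product namespace",
             "priority": "Warning"}))
# -> ['durable-storage', 'pager']
print(route({"rule": "Shell spawned in dev namespace",
             "priority": "Notice"}))
# -> ['durable-storage']
```

Keeping the allowlist short and version-controlled gives each entry an owner, which is exactly the "rules with no team ownership" failure mode this page warns against.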
What not to do first
Do not start with:
- every shell event in every namespace;
- every file write everywhere;
- rules with no team ownership;
- auto-remediation for low-confidence behaviors.
Investigation-ready data to keep
For each alert, preserve:
- rule name and condition matched;
- pod, namespace, container name, image digest;
- command line, parent process, user, capability context;
- node, cluster, and time window;
- link to correlated audit-log query or dashboard.
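That checklist can be pinned down as a schema so the fields are captured at alert time rather than reconstructed later. A sketch with illustrative field names and values, not a prescribed format:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class InvestigationRecord:
    """Minimum context to preserve with each runtime alert.
    Field set mirrors the checklist above; names are illustrative."""
    rule: str
    condition: str
    namespace: str
    pod: str
    container: str
    image_digest: str
    cmdline: str
    parent_process: str
    user: str
    node: str
    cluster: str
    time_window: str
    audit_query_url: str  # link to the correlated audit-log query or dashboard

# Hypothetical record for the package-manager detection above.
record = InvestigationRecord(
    rule="Package manager in product namespace",
    condition="spawned_process and container and proc.name in (apt, apt-get)",
    namespace="billing",
    pod="billing-api-7f9c",
    container="app",
    image_digest="sha256:0f3adeadbeef",
    cmdline="apt-get install curl",
    parent_process="bash",
    user="root",
    node="node-7",
    cluster="prod-eu-1",
    time_window="2026-01-15T09:45Z/2026-01-15T10:15Z",
    audit_query_url="https://siem.example.com/query/abc123",
)
print(asdict(record)["rule"])
```

Serializing the record (for example, `asdict` plus JSON) at ship time means the durable copy survives pod deletion and node churn, which is the point of the "durable analytics" layer in the table above.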
Legacy notes
Older runtime-detection guidance often used:
- host IDS/IPS language;
- "container runtime defense" as a broad product category;
- more manual driver handling and manual config editing.
Those ideas are still useful, but the better current framing is:
- runtime sensor + cloud logs + K8s audit + investigation workflow;
- policy and routing owned by real teams;
- install and update through package managers, Helm, or operator-friendly flows.
Related pages
- Falco for Runtime Detection - Practical Guide, Legacy Notes, and 2026 Patterns
- High-Signal Detection Patterns and SIEM Examples
- Runtime Investigation Playbook for Kubernetes and Containers
- Kubernetes Hardening

---
Author attribution: Ivan Piskunov, 2026. Educational and defensive-engineering use.