Runtime Detection Stack: Falco, Tetragon, and Cloud Signals
Intro: Runtime detection works best when it is treated as one layer in a detection stack, not as the whole stack. This page explains how to combine workload runtime sensors with cloud control-plane evidence and investigation-friendly routing.
Why teams struggle here
Common failure modes:
- they deploy a runtime tool and expect it to replace audit logs;
- they enable many default rules but do not tune ownership or namespaces;
- they route everything to chat and nothing to durable storage;
- they do not connect runtime alerts to cloud and Kubernetes control-plane context.
A practical layered model
| Layer | Primary source | Best at | Weak at |
|---|---|---|---|
| Cloud control plane | CloudTrail, Activity Log, provider audit trails | identity and infrastructure changes | workload syscall detail |
| Kubernetes control plane | audit logs, admission logs | object changes, RBAC activity, workload creation | process execution detail |
| Runtime workload | Falco, Tetragon, eBPF-based sensors | suspicious execution, file, net, capability behavior | "who changed IAM?" questions |
| Durable analytics | SIEM, data lake, search platform | correlation, history, case work | real-time local enforcement |
Falco versus Tetragon quick view
| Tool | Strongest use | Typical operator fit |
|---|---|---|
| Falco | rule-driven detections with rich ecosystem and simple output routing | teams that want broad community examples and fast time-to-value |
| Tetragon | deep eBPF-based runtime and identity-aware policy or tracing patterns | teams already invested in Cilium or eBPF-heavy Kubernetes networking |
Use either. Use both only when you can clearly explain ownership, overlap, and alert routing.
Practical deployment pattern
Baseline
- provider audit logs enabled and retained;
- Kubernetes audit logs enabled;
- one runtime sensor on production clusters;
- all runtime alerts shipped to durable storage;
- alert routes for only selected high-confidence detections.
Good next step
- namespace-aware or team-aware rule ownership;
- custom rules for your estate, not only upstream defaults;
- runtime investigation playbook attached to the alert;
- response hooks for a small number of high-confidence cases.
Practical snippet - Falco via Helm

```shell
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm upgrade -i falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set tty=true
```
Practical snippet - custom Falco rule with namespace filter

```yaml
- list: product_namespaces
  items: [billing, checkout, identity]

- rule: Package manager in product namespace
  desc: Detect apt, yum, apk, or dnf execution in product workloads
  condition: >
    spawned_process and container and
    k8s.ns.name in (product_namespaces) and
    proc.name in (apt, apt-get, apk, yum, dnf, rpm)
  output: >
    Package manager executed in product workload
    (ns=%k8s.ns.name pod=%k8s.pod.name image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING
  tags: [container, drift, package-manager]
```
Practical snippet - Tetragon-style policy shape

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: exec-shell
spec:
  kprobes:
    - call: "__x64_sys_execve"
      syscall: true
      # matchArgs can only filter on arguments the probe extracts,
      # so the argument must be declared here first.
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/bin/bash"
                - "/bin/sh"
```

Use this as a mental model for fine-grained execution tracing. Adjust the probe symbol, argument types, and selectors to your cluster and Tetragon version.
Practical snippet - route Falco to a webhook

```yaml
falcosidekick:
  enabled: true
  config:
    webhook:
      address: http://event-router.security.svc.cluster.local/falco
```
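On the receiving end, the event router only needs to parse the JSON body it is posted and pull out the fields an analyst will ask about first. A minimal sketch, assuming a Falco-style event payload (`rule`, `priority`, `time`, `output_fields`); verify field names against your Falco and Falcosidekick versions, and note the sample values below are illustrative:

```python
import json

def summarize_falco_event(raw: bytes) -> dict:
    """Extract first-look investigation fields from a Falco-style
    webhook payload. Field names follow Falco's JSON event output;
    confirm them against your deployed version."""
    event = json.loads(raw)
    fields = event.get("output_fields", {})
    return {
        "rule": event.get("rule"),
        "priority": event.get("priority"),
        "time": event.get("time"),
        "namespace": fields.get("k8s.ns.name"),
        "pod": fields.get("k8s.pod.name"),
        "image": fields.get("container.image.repository"),
        "cmdline": fields.get("proc.cmdline"),
    }

# Illustrative payload matching the package-manager rule above.
sample = json.dumps({
    "rule": "Package manager in product namespace",
    "priority": "Warning",
    "time": "2026-01-15T10:02:31Z",
    "output_fields": {
        "k8s.ns.name": "billing",
        "k8s.pod.name": "billing-api-7f9c",
        "container.image.repository": "registry.example.com/billing-api",
        "proc.cmdline": "apt-get install curl",
    },
}).encode()

print(summarize_falco_event(sample))
```

Keeping this normalization step in the router, rather than in each downstream consumer, is what makes the same alert usable for both paging and durable storage.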
Practical snippet - correlate runtime and cloud
Runtime alert: unexpected shell in container
Questions to ask next:
- was the pod recently redeployed?
- did a privileged role or human actor change the deployment?
- did an image digest change unexpectedly?
- did a secret or service account binding change near the same time?
Use CloudTrail, Kubernetes audit logs, deployment history, and image registry evidence to answer those questions.
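The core of that correlation step is mechanical: take a time window around the alert and pull control-plane events that touched the same namespace. A sketch under a simplified, hypothetical event shape (real inputs would come from CloudTrail `LookupEvents` or a Kubernetes audit-log query):

```python
from datetime import datetime, timedelta

def correlate(alert_time: datetime, audit_events: list[dict],
              namespace: str, window_minutes: int = 30) -> list[dict]:
    """Return write-type control-plane events near the alert that
    touched the same namespace. Each event uses a hypothetical shape:
    {"time": datetime, "verb": str, "namespace": str, "kind": str}."""
    start = alert_time - timedelta(minutes=window_minutes)
    end = alert_time + timedelta(minutes=window_minutes)
    return [
        e for e in audit_events
        if start <= e["time"] <= end
        and e["namespace"] == namespace
        and e["verb"] in {"create", "update", "patch", "delete"}
    ]

# Illustrative data: a deployment patch shortly before the runtime alert.
alert_at = datetime(2026, 1, 15, 10, 2)
events = [
    {"time": datetime(2026, 1, 15, 9, 50), "verb": "patch",
     "namespace": "billing", "kind": "Deployment"},
    {"time": datetime(2026, 1, 15, 8, 0), "verb": "patch",
     "namespace": "billing", "kind": "Deployment"},  # outside the window
]
hits = correlate(alert_at, events, "billing")
print(hits)  # only the 09:50 patch survives the window filter
```

A match here does not prove causation; it narrows the "was the pod recently redeployed, and by whom?" question to a handful of audit entries worth reading.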
What to alert on first
Start with high-confidence patterns:
- unexpected shell in production container;
- package manager execution in product namespace;
- write under sensitive config paths;
- outbound connection from a binary that should not make network calls;
- exec into a workload from an unusual identity or admin path;
- new privileged workload or excessive capabilities.
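The baseline rule "alert routes for only selected high-confidence detections" can be made concrete as a small routing function: every alert is stored, but only an allowlisted set of rules pages a human. Rule names and destination labels here are hypothetical:

```python
# Hypothetical high-confidence allowlist; everything else still lands
# in durable storage for correlation and case work.
PAGE_RULES = {
    "Terminal shell in production container",
    "Package manager in product namespace",
    "Write below sensitive config path",
}

def route(alert: dict) -> list[str]:
    """Decide destinations for one runtime alert: always store,
    page only for allowlisted rules at Warning or above."""
    destinations = ["durable-storage"]
    if alert.get("rule") in PAGE_RULES and \
            alert.get("priority") in {"Warning", "Critical"}:
        destinations.append("pager")
    return destinations

print(route({"rule": "Package manager in product namespace",
             "priority": "Warning"}))
# -> ['durable-storage', 'pager']
print(route({"rule": "Shell spawned in dev namespace",
             "priority": "Notice"}))
# -> ['durable-storage']
```

Keeping the allowlist short and version-controlled gives each entry an owner, which is exactly the "rules with no team ownership" failure mode this page warns against.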
What not to do first
Do not start with:
- every shell event in every namespace;
- every file write everywhere;
- rules with no team ownership;
- auto-remediation for low-confidence behaviors.
Investigation-ready data to keep
For each alert, preserve:
- rule name and condition matched;
- pod, namespace, container name, image digest;
- command line, parent process, user, capability context;
- node, cluster, and time window;
- link to correlated audit-log query or dashboard.
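That checklist can be pinned down as a schema so the fields are captured at alert time rather than reconstructed later. A sketch with illustrative field names and values, not a prescribed format:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class InvestigationRecord:
    """Minimum context to preserve with each runtime alert.
    Field set mirrors the checklist above; names are illustrative."""
    rule: str
    condition: str
    namespace: str
    pod: str
    container: str
    image_digest: str
    cmdline: str
    parent_process: str
    user: str
    node: str
    cluster: str
    time_window: str
    audit_query_url: str  # link to the correlated audit-log query or dashboard

# Hypothetical record for the package-manager detection above.
record = InvestigationRecord(
    rule="Package manager in product namespace",
    condition="spawned_process and container and proc.name in (apt, apt-get)",
    namespace="billing",
    pod="billing-api-7f9c",
    container="app",
    image_digest="sha256:0f3adeadbeef",
    cmdline="apt-get install curl",
    parent_process="bash",
    user="root",
    node="node-7",
    cluster="prod-eu-1",
    time_window="2026-01-15T09:45Z/2026-01-15T10:15Z",
    audit_query_url="https://siem.example.com/query/abc123",
)
print(asdict(record)["rule"])
```

Serializing the record (for example, `asdict` plus JSON) at ship time means the durable copy survives pod deletion and node churn, which is the point of the "durable analytics" layer in the table above.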
Legacy notes
Older runtime-detection guidance often used:
- host IDS/IPS language;
- "container runtime defense" as a broad product category;
- more manual driver handling and manual config editing.
Those ideas are still useful, but the better current framing is:
- runtime sensor + cloud logs + K8s audit + investigation workflow;
- policy and routing owned by real teams;
- install and update through package managers, Helm, or operator-friendly flows.
Related pages
- Falco for Runtime Detection - Practical Guide, Legacy Notes, and 2026 Patterns
- High-Signal Detection Patterns and SIEM Examples
- Runtime Investigation Playbook for Kubernetes and Containers
- Kubernetes Hardening

---
Author attribution: Ivan Piskunov, 2026. Educational and defensive-engineering use.