📉 DevSecOps Metrics: DORA, AppSec Coverage, and Security Debt

Intro: Security programs often drown in raw finding counts. A better operating model combines delivery-flow metrics, control-coverage metrics, and security-debt metrics so that engineering leaders can see whether the organization is simultaneously getting faster, safer, and easier to reason about.

What this page includes

  • how to interpret the four DORA metrics in a Product Security context
  • how to measure AppSec coverage without pretending every system is equally important
  • how to track security debt at the defect and service level
  • how to translate these signals into manager and director reporting

Why this page exists

A Product Security team usually gets asked two hard questions:

  1. Are we making delivery safer without slowing it down too much?
  2. Are we paying down real security debt, or just moving findings around dashboards?

DORA-style delivery metrics help answer the first question.
Coverage and defect-debt metrics help answer the second.

The useful trick is to read them together, not in isolation.

A practical metric stack

Use three layers:

Layer | What it answers | Typical audience
Delivery flow | Are changes moving safely through the pipeline? | engineering managers, platform leads
Control coverage | Which systems actually use the expected security practices? | AppSec leads, platform teams, directors
Security debt | Where is meaningful exposure accumulating faster than remediation? | product leads, directors, governance reviewers

DORA metrics in a Product Security context

The four DORA metrics were not created as “security metrics,” but they are still useful because they show whether security is operating inside delivery flow or outside it.

DORA metric | Classic meaning | Product Security interpretation | Common misuse
Deployment Frequency | how often the team deploys | whether teams can ship fixes and control changes quickly | treating lower frequency as automatically “safer”
Lead Time for Changes | time from commit to production | how quickly code, configuration, and security fixes move through the system | hiding security bottlenecks inside generic pipeline delay
Change Failure Rate | percent of changes that degrade service | how often releases trigger rollback, exposure, or emergency exception due to weak controls | counting only outages and ignoring security-driven rollback or hotfix events
Time to Restore Service | time to recover after failure | how fast the team can recover from a broken release, bad configuration, or security-caused production issue | treating MTTR as only an SRE metric
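
To make the table concrete, the sketch below shows one way to derive the four metrics from change and deployment records. It is a minimal illustration, not a reference implementation: the Deployment record shape, its field names, and the dora_metrics function are hypothetical stand-ins for whatever your CI/CD and incident tooling actually exports.

  # Hypothetical deployment record and a minimal DORA calculation over a window.
  from dataclasses import dataclass
  from datetime import datetime
  from statistics import median

  @dataclass
  class Deployment:
      commit_time: datetime                  # when the change was committed
      deploy_time: datetime                  # when the change reached production
      failed: bool                           # rollback, hotfix, or emergency change required
      restored_time: datetime | None = None  # when service was restored, if it failed

  def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
      """Compute the four DORA metrics for one service over a reporting window."""
      if not deploys:
          return {}
      lead_times = [d.deploy_time - d.commit_time for d in deploys]
      failures = [d for d in deploys if d.failed]
      restores = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
      return {
          "deployment_frequency_per_week": len(deploys) / (window_days / 7),
          "median_lead_time_hours": median(lt.total_seconds() / 3600 for lt in lead_times),
          "change_failure_rate": len(failures) / len(deploys),
          "median_time_to_restore_hours": (
              median(r.total_seconds() / 3600 for r in restores) if restores else None
          ),
      }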

What a security team should do with DORA

Use DORA to ask these questions:

  • Do security checks create useful friction or just hidden queue time?
  • Can the team ship a remediation in hours or does it still need a special release ceremony?
  • Are security-related breakages concentrated in a few services, modules, or teams?
  • Is recovery limited by release process, by missing observability, or by poor ownership?

Security overlays for DORA

DORA alone is not enough. Add a small overlay.

Overlay metric | What it helps explain
Percent of releases with required security evidence attached | whether release flow is audit-friendly and reviewable
Gate bypass rate | whether teams routinely route around controls
Median remediation lead time for exploitable findings | whether fast delivery actually helps reduce exposure
Percent of hotfixes caused by security control gaps | whether preventive controls are failing upstream
Mean time to rotate exposed secrets | whether secret detection and response are operationally real
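
Two of these overlays are simple enough to compute directly from release and finding exports. The sketch below assumes hypothetical record fields (gate_bypassed, exploitable, opened_at, fixed_at); adapt the names to whatever your release and vulnerability-management systems actually provide.

  # Hypothetical overlay calculations: gate bypass rate and remediation lead time.
  from statistics import median

  def gate_bypass_rate(releases: list[dict]) -> float:
      """Share of releases that shipped with a skipped or waived security gate."""
      if not releases:
          return 0.0
      return sum(1 for r in releases if r.get("gate_bypassed")) / len(releases)

  def median_remediation_lead_time_days(findings: list[dict]) -> float | None:
      """Median open-to-fix time, in days, for findings marked exploitable."""
      durations = [
          (f["fixed_at"] - f["opened_at"]).days
          for f in findings
          if f.get("exploitable") and f.get("fixed_at")
      ]
      return median(durations) if durations else None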

AppSec coverage: two measurements that matter

A lot of teams say “we have coverage” when they really mean “some scanners run somewhere.”

Use two different measurements.

1) Asset coverage

How much of the important estate is actually covered?

Examples:

  • percent of tier-1 repos with SAST, secret scanning, and dependency scanning
  • percent of tier-1 services with image scanning and runtime owner defined
  • percent of public APIs with contract linting and authz review
  • percent of critical cloud accounts or subscriptions with posture review enabled
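
A minimal sketch of the first measurement, assuming a hypothetical repo inventory where each entry records its tier and enabled scanners; the field names are illustrative only.

  # Percent of tier-1 repos with SAST, secret scanning, and dependency scanning.
  REQUIRED_SCANNERS = {"sast", "secret_scanning", "dependency_scanning"}

  def tier1_scanner_coverage(repos: list[dict]) -> float:
      tier1 = [r for r in repos if r.get("tier") == 1]
      if not tier1:
          return 0.0
      covered = [r for r in tier1 if REQUIRED_SCANNERS <= set(r.get("scanners", []))]
      return 100.0 * len(covered) / len(tier1)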

2) Practice coverage

How many expected security practices are actually present per system?

Examples:

  • threat model exists
  • ownership is defined
  • repo protection exists
  • CI checks are present
  • release evidence exists
  • image signing or provenance check exists
  • runtime logs are preserved
  • exception path exists

A system may have “some scanners” but still have weak practice coverage.

A simple AppSec coverage model

Use weighted coverage rather than flat coverage.

System tier | Weight | Why
Tier 1 | 5 | internet-facing, regulated, business-critical, or customer-trust critical
Tier 2 | 3 | important internal service or shared platform dependency
Tier 3 | 1 | lower-risk or low-sensitivity service

Then score each service against a short list of practices.

Example practice list

  • repository secret scanning
  • SAST or equivalent code review automation
  • dependency or SBOM visibility
  • CI quality gate
  • release evidence
  • threat model or architecture review
  • owner and escalation path
  • logging and recovery notes

A weighted score is often more honest than “82% of systems covered” because it prevents tiny low-risk repos from dominating the dashboard.
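
A minimal sketch of the weighted calculation, using the tier weights and practice list above; the system inventory shape is hypothetical.

  # Tier-weighted practice coverage: each system contributes its tier weight
  # multiplied by how many of the expected practices it actually has in place.
  TIER_WEIGHTS = {1: 5, 2: 3, 3: 1}
  PRACTICES = [
      "secret_scanning", "sast", "dependency_visibility", "ci_gate",
      "release_evidence", "threat_model", "owner_defined", "logging_recovery",
  ]

  def weighted_coverage(systems: list[dict]) -> float:
      earned = 0.0
      possible = 0.0
      for s in systems:
          weight = TIER_WEIGHTS.get(s.get("tier"), 1)
          present = sum(1 for p in PRACTICES if p in s.get("practices", []))
          earned += weight * present
          possible += weight * len(PRACTICES)
      return 100.0 * earned / possible if possible else 0.0

  # Example: one tier-1 service with 6 of 8 practices plus one tier-3 repo with
  # all 8 scores (5*6 + 1*8) / (5*8 + 1*8) = 38/48 ≈ 79%, so the tier-1 gap still shows.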

Security debt at the defect level

Security debt is not just “number of open vulns.”
Track the debt in ways that show risk, age, and repair efficiency.

Useful views:

Metric | What it shows
Critical/high finding age by service tier | where risky backlog is aging in important systems
Weighted risk index of open findings | a rough score that combines severity, exploitability, and service criticality
Fix rate vs intake rate | whether the organization is burning debt down or accumulating it
Reopen rate / recurrence rate | whether fixes are durable
Mean time to triage | whether findings are being understood quickly enough
Debt added in new code | whether the team is shipping fresh problems while fixing old ones
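
The fix-rate-vs-intake view in particular is easy to compute from a findings export. A minimal sketch, assuming hypothetical opened_at and fixed_at timestamps on each finding:

  # Debt flow over one reporting period: intake vs fixes and the net change.
  from datetime import datetime

  def debt_flow(findings: list[dict], start: datetime, end: datetime) -> dict:
      opened = sum(1 for f in findings if start <= f["opened_at"] < end)
      fixed = sum(1 for f in findings if f.get("fixed_at") and start <= f["fixed_at"] < end)
      return {
          "intake": opened,
          "fixed": fixed,
          "net_change": opened - fixed,              # positive means debt is accumulating
          "burn_ratio": fixed / opened if opened else None,
      }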

A practical weighted-risk approach

Do not overcomplicate the math. A simple weighted index works:

Weighted Risk Index = severity_weight × exploitability_weight × asset_criticality_weight

Example weights:

  • severity: critical=5, high=4, medium=2, low=1
  • exploitability: proven/likely=3, plausible=2, theoretical=1
  • asset criticality: tier1=3, tier2=2, tier3=1

This is not a law of nature. It is a prioritization aid.
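
A minimal sketch of the index using the example weights above; the finding fields are hypothetical, and the per-finding scores can be summed across open findings to produce a trendable debt number.

  # Weighted risk index = severity × exploitability × asset criticality.
  SEVERITY = {"critical": 5, "high": 4, "medium": 2, "low": 1}
  EXPLOITABILITY = {"proven": 3, "likely": 3, "plausible": 2, "theoretical": 1}
  ASSET_CRITICALITY = {1: 3, 2: 2, 3: 1}  # service tier -> weight

  def risk_index(finding: dict) -> int:
      return (SEVERITY[finding["severity"]]
              * EXPLOITABILITY[finding["exploitability"]]
              * ASSET_CRITICALITY[finding["asset_tier"]])

  # Example: a high-severity, likely-exploitable finding on a tier-1 service
  # scores 4 * 3 * 3 = 36; the same defect on a tier-3 repo scores 4 * 3 * 1 = 12.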

How to read these metrics together

Good pattern

  • deployment frequency stays healthy;
  • remediation lead time drops;
  • coverage rises in tier-1 systems;
  • gate bypass rate is low;
  • weighted risk debt trends down.

That usually means security is becoming part of delivery.

Bad pattern

  • coverage rises only because more “easy” repos were added;
  • critical debt in tier-1 services grows older;
  • gate bypasses increase;
  • deployment slows but exposure does not meaningfully drop.

That usually means the program is becoming heavier, not better.

Suggested manager dashboard

A manager-friendly monthly view could include:

  • DORA four-pack for business-critical products
  • percent of tier-1 services with required security practices
  • weighted risk debt trend
  • median age of critical/high exploitable findings
  • gate bypass count and average time to resolution
  • secret exposure rate per 1,000 commits
  • top five repeat defect classes

Suggested director talking points

Directors usually need business translation more than scanner granularity.

Good statements:

  • “Security evidence is now attached to 90% of tier-1 releases.”
  • “Median age of exploitable high-risk findings in customer-facing services decreased from 28 days to 11 days.”
  • “Coverage growth this quarter came from onboarding tier-1 APIs and regulated repos, not from low-risk estate inflation.”
  • “We improved remediation flow without reducing deployment frequency.”

Common mistakes

  • using raw finding counts as the main KPI;
  • mixing all services into one unweighted average;
  • reporting “coverage” without defining coverage of what;
  • measuring only scanner activity instead of control adoption and debt trend;
  • hiding security-caused delivery friction inside generic pipeline metrics.