DevSecOps Metrics: DORA, AppSec Coverage, and Security Debt
Intro: Security programs often drown in raw finding counts. A better operating model combines delivery-flow metrics, control-coverage metrics, and security-debt metrics so that engineering leaders can see whether the organization is simultaneously getting faster, safer, and easier to reason about.
What this page includes
- how to interpret the four DORA metrics in a Product Security context
- how to measure AppSec coverage without pretending every system is equally important
- how to track security debt at the defect and service level
- how to translate these signals into manager and director reporting
Why this page exists
A Product Security team usually gets asked two hard questions:
- Are we making delivery safer without slowing it down too much?
- Are we paying down real security debt, or just moving findings around dashboards?
DORA-style delivery metrics help answer the first question.
Coverage and defect-debt metrics help answer the second.
The useful trick is to read them together, not in isolation.
A practical metric stack
Use three layers:
| Layer | What it answers | Typical audience |
|---|---|---|
| Delivery flow | Are changes moving safely through the pipeline? | engineering managers, platform leads |
| Control coverage | Which systems actually use the expected security practices? | AppSec leads, platform teams, directors |
| Security debt | Where is meaningful exposure accumulating faster than remediation? | product leads, directors, governance reviewers |
DORA metrics in a Product Security context
The four DORA metrics were not created as "security metrics," but they are still useful because they show whether security is operating inside delivery flow or outside it.
| DORA metric | Classic meaning | Product Security interpretation | Common misuse |
|---|---|---|---|
| Deployment Frequency | how often the team deploys | whether teams can ship fixes and control changes quickly | treating lower frequency as automatically "safer" |
| Lead Time for Changes | time from commit to production | how quickly code, configuration, and security fixes move through the system | hiding security bottlenecks inside generic pipeline delay |
| Change Failure Rate | percent of changes that degrade service | how often releases trigger rollback, exposure, or emergency exception due to weak controls | counting only outages and ignoring security-driven rollback or hotfix events |
| Time to Restore Service | time to recover after failure | how fast the team can recover from a broken release, bad configuration, or security-caused production issue | treating MTTR as only an SRE metric |
What a security team should do with DORA
Use DORA to ask these questions:
- Do security checks create useful friction or just hidden queue time?
- Can the team ship a remediation in hours or does it still need a special release ceremony?
- Are security-related breakages concentrated in a few services, modules, or teams?
- Is recovery limited by release process, by missing observability, or by poor ownership?
Security overlays for DORA
DORA alone is not enough. Add a small overlay.
| Overlay metric | What it helps explain |
|---|---|
| Percent of releases with required security evidence attached | whether release flow is audit-friendly and reviewable |
| Gate bypass rate | whether teams routinely route around controls |
| Median remediation lead time for exploitable findings | whether fast delivery actually helps reduce exposure |
| Percent of hotfixes caused by security control gaps | whether preventive controls are failing upstream |
| Mean time to rotate exposed secrets | whether secret detection and response are operationally real |
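A minimal sketch of how two of these overlays might be computed from pipeline and finding records. The record shapes (`GateEvent`, `Finding`) and their field names are illustrative assumptions, not any particular tool's schema:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional

@dataclass
class GateEvent:
    release_id: str
    bypassed: bool  # True when a required security gate was waived or skipped

@dataclass
class Finding:
    opened_at: datetime
    remediated_at: Optional[datetime]  # None while the finding is still open
    exploitable: bool

def gate_bypass_rate(events: list[GateEvent]) -> float:
    """Fraction of releases where a required security gate was bypassed."""
    return sum(e.bypassed for e in events) / len(events) if events else 0.0

def median_remediation_days(findings: list[Finding]) -> Optional[float]:
    """Median days from discovery to fix, for exploitable findings that closed."""
    durations = [
        (f.remediated_at - f.opened_at).days
        for f in findings
        if f.exploitable and f.remediated_at is not None
    ]
    return median(durations) if durations else None
```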
AppSec coverage: two measurements that matter
A lot of teams say "we have coverage" when they really mean "some scanners run somewhere."
Use two different measurements.
1) Asset coverage
How much of the important estate is actually covered?
Examples:
- percent of tier-1 repos with SAST, secret scanning, and dependency scanning
- percent of tier-1 services with image scanning and runtime owner defined
- percent of public APIs with contract linting and authz review
- percent of critical cloud accounts or subscriptions with posture review enabled
2) Practice coverage
How many expected security practices are actually present per system?
Examples:
- threat model exists
- ownership is defined
- repo protection exists
- CI checks are present
- release evidence exists
- image signing or provenance check exists
- runtime logs are preserved
- exception path exists
A system may have "some scanners" but still have weak practice coverage.
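As a sketch, assuming a simple inventory where each system records its tier and the practices it actually has (the practice names and the record shape are illustrative, not a standard):

```python
# Expected practices per system; the names mirror the example list above.
EXPECTED_PRACTICES = frozenset({
    "secret_scanning", "sast", "dependency_visibility", "ci_gate",
    "release_evidence", "threat_model", "owner_defined", "logging",
})

def asset_coverage(systems: list[dict], required: frozenset, tier: int = 1) -> float:
    """Percent of tier-N systems that have every control in `required`."""
    in_tier = [s for s in systems if s["tier"] == tier]
    covered = [s for s in in_tier if required <= set(s["practices"])]
    return 100 * len(covered) / len(in_tier) if in_tier else 0.0

def practice_coverage(system: dict) -> float:
    """Fraction of expected practices present on a single system (0..1)."""
    return len(set(system["practices"]) & EXPECTED_PRACTICES) / len(EXPECTED_PRACTICES)

# A repo can pass a narrow asset-coverage check while its
# practice coverage is still weak:
repo = {"name": "payments-api", "tier": 1,
        "practices": ["sast", "secret_scanning"]}
print(practice_coverage(repo))  # 0.25 -- scanners run, most practices missing
```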
A simple AppSec coverage model
Use weighted coverage rather than flat coverage.
| System tier | Weight | Why |
|---|---|---|
| Tier 1 | 5 | internet-facing, regulated, business-critical, or customer-trust critical |
| Tier 2 | 3 | important internal service or shared platform dependency |
| Tier 3 | 1 | lower-risk or low-sensitivity service |
Then score each service against a short list of practices.
Example practice list
- repository secret scanning
- SAST or equivalent code review automation
- dependency or SBOM visibility
- CI quality gate
- release evidence
- threat model or architecture review
- owner and escalation path
- logging and recovery notes
A weighted score is often more honest than "82% of systems covered" because it prevents tiny low-risk repos from dominating the dashboard.
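A minimal sketch of the weighted score, reusing the tier weights from the table above and a per-system practice fraction like the one `practice_coverage` computes earlier (the inventory shape is an assumption):

```python
TIER_WEIGHTS = {1: 5, 2: 3, 3: 1}  # from the tier table above

def weighted_coverage(systems: list[dict]) -> float:
    """Tier-weighted average practice coverage, as a percentage.

    Each system contributes its practice-coverage fraction (0..1)
    scaled by its tier weight, so a gap in a tier-1 system costs
    five times as much as the same gap in a tier-3 system.
    """
    total = sum(TIER_WEIGHTS[s["tier"]] for s in systems)
    if not total:
        return 0.0
    return 100 * sum(
        TIER_WEIGHTS[s["tier"]] * s["practice_score"] for s in systems
    ) / total

# Many fully covered tier-3 repos cannot mask one weak tier-1 service:
estate = [
    {"tier": 1, "practice_score": 0.25},
    {"tier": 3, "practice_score": 1.0},
    {"tier": 3, "practice_score": 1.0},
]
print(round(weighted_coverage(estate)))  # 46, not the flat average of 75
```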
Security debt at the defect level
Security debt is not just โnumber of open vulns.โ
Track the debt in ways that show risk, age, and repair efficiency.
Useful views:
| Metric | What it shows |
|---|---|
| Critical/high finding age by service tier | where risky backlog is aging in important systems |
| Weighted risk index of open findings | a rough score that combines severity, exploitability, and service criticality |
| Fix rate vs intake rate | whether the organization is burning debt down or accumulating it |
| Reopen rate / recurrence rate | whether fixes are durable |
| Mean time to triage | whether findings are being understood quickly enough |
| Debt added in new code | whether the team is shipping fresh problems while fixing old ones |
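The fix-rate-vs-intake view in particular reduces to a simple running balance. A sketch, assuming monthly opened/closed counts as inputs:

```python
def debt_trend(opened_per_month: list[int], closed_per_month: list[int]) -> list[int]:
    """Running net open-finding count; a rising series means debt is accumulating."""
    net, series = 0, []
    for opened, closed in zip(opened_per_month, closed_per_month):
        net += opened - closed
        series.append(net)
    return series

# Example: intake outpaces fixes in the last two months.
print(debt_trend([40, 35, 50, 60], [42, 40, 38, 35]))  # [-2, -7, 5, 30]
```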
A practical weighted-risk approach
Do not overcomplicate the math. A simple weighted index works:
Weighted Risk Index = severity_weight × exploitability_weight × asset_criticality_weight
Example weights:
- severity: critical=5, high=4, medium=2, low=1
- exploitability: proven/likely=3, plausible=2, theoretical=1
- asset criticality: tier1=3, tier2=2, tier3=1
This is not a law of nature. It is a prioritization aid.
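A direct translation of this index into code, using the example weights above (the weight tables are this page's examples, not a standard):

```python
# Example weights from the list above.
SEVERITY = {"critical": 5, "high": 4, "medium": 2, "low": 1}
EXPLOITABILITY = {"proven": 3, "likely": 3, "plausible": 2, "theoretical": 1}
CRITICALITY = {1: 3, 2: 2, 3: 1}  # keyed by service tier

def weighted_risk_index(findings: list[dict]) -> int:
    """Sum of severity x exploitability x asset-criticality over open findings."""
    return sum(
        SEVERITY[f["severity"]]
        * EXPLOITABILITY[f["exploitability"]]
        * CRITICALITY[f["tier"]]
        for f in findings
    )

# One proven-exploitable critical in a tier-1 service dominates
# a theoretical low in tier 3:
findings = [
    {"severity": "critical", "exploitability": "proven", "tier": 1},  # 5*3*3 = 45
    {"severity": "low", "exploitability": "theoretical", "tier": 3},  # 1*1*1 = 1
]
print(weighted_risk_index(findings))  # 46
```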
How to read these metrics together
Good pattern
- deployment frequency stays healthy;
- remediation lead time drops;
- coverage rises in tier-1 systems;
- gate bypass rate is low;
- weighted risk debt trends down.
That usually means security is becoming part of delivery.
Bad pattern
- coverage rises only because more "easy" repos were added;
- critical debt in tier-1 services grows older;
- gate bypasses increase;
- deployment slows but exposure does not meaningfully drop.
That usually means the program is becoming heavier, not better.
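One way to operationalize this joint reading is a small check that compares two monthly snapshots and flags the bad-pattern signals. The snapshot keys below are illustrative assumptions, not a defined schema:

```python
def bad_pattern_flags(prev: dict, curr: dict) -> list[str]:
    """Flag the bad-pattern signals described above from two monthly snapshots."""
    flags = []
    if (curr["coverage_pct"] > prev["coverage_pct"]
            and curr["tier1_coverage_pct"] <= prev["tier1_coverage_pct"]):
        flags.append("coverage growth came from low-risk estate, not tier 1")
    if curr["tier1_critical_median_age_days"] > prev["tier1_critical_median_age_days"]:
        flags.append("critical debt in tier-1 services is aging")
    if curr["gate_bypass_rate"] > prev["gate_bypass_rate"]:
        flags.append("teams are routing around controls more often")
    if (curr["deploys_per_week"] < prev["deploys_per_week"]
            and curr["weighted_risk_index"] >= prev["weighted_risk_index"]):
        flags.append("delivery slowed without a drop in exposure")
    return flags
```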
Suggested manager dashboard
A manager-friendly monthly view could include:
- DORA four-pack for business-critical products
- percent of tier-1 services with required security practices
- weighted risk debt trend
- median age of critical/high exploitable findings
- gate bypass count and average time to resolution
- secret exposure rate per 1,000 commits (see the sketch after this list)
- top five repeat defect classes
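As an example of one line item, the secret exposure rate is a plain normalization; the inputs are assumed to come from whatever secret scanner and version control system you use:

```python
def secrets_per_1000_commits(secrets_found: int, commits: int) -> float:
    """Verified secret exposures, normalized per 1,000 commits."""
    return 1000 * secrets_found / commits if commits else 0.0

print(secrets_per_1000_commits(secrets_found=6, commits=4800))  # 1.25
```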
Suggested director talking points
Directors usually need business translation more than scanner granularity.
Good statements:
- "Security evidence is now attached to 90% of tier-1 releases."
- "Median age of exploitable high-risk findings in customer-facing services decreased from 28 days to 11 days."
- "Coverage growth this quarter came from onboarding tier-1 APIs and regulated repos, not from low-risk estate inflation."
- "We improved remediation flow without reducing deployment frequency."
Common mistakes
- using raw finding counts as the main KPI;
- mixing all services into one unweighted average;
- reporting "coverage" without defining coverage of what;
- measuring only scanner activity instead of control adoption and debt trend;
- hiding security-caused delivery friction inside generic pipeline metrics.