📉 DevSecOps Metrics: DORA, AppSec Coverage, and Security Debt

Intro: Security programs often drown in raw finding counts. A better operating model combines delivery-flow metrics, control-coverage metrics, and security-debt metrics so that engineering leaders can see whether the organization is simultaneously getting faster, safer, and easier to reason about.

What this page includes

  • how to interpret the four DORA metrics in a Product Security context
  • how to measure AppSec coverage without pretending every system is equally important
  • how to track security debt at the defect and service level
  • how to translate these signals into manager and director reporting

Why this page exists

A Product Security team usually gets asked two hard questions:

  1. Are we making delivery safer without slowing it down too much?
  2. Are we paying down real security debt, or just moving findings around dashboards?

DORA-style delivery metrics help answer the first question.
Coverage and defect-debt metrics help answer the second.

The useful trick is to read them together, not in isolation.

A practical metric stack

Use three layers:

Layer | What it answers | Typical audience
Delivery flow | Are changes moving safely through the pipeline? | engineering managers, platform leads
Control coverage | Which systems actually use the expected security practices? | AppSec leads, platform teams, directors
Security debt | Where is meaningful exposure accumulating faster than remediation? | product leads, directors, governance reviewers

DORA metrics in a Product Security context

The four DORA metrics were not created as “security metrics,” but they are still useful because they show whether security is operating inside delivery flow or outside it.

DORA metric | Classic meaning | Product Security interpretation | Common misuse
Deployment Frequency | how often the team deploys | whether teams can ship fixes and control changes quickly | treating lower frequency as automatically “safer”
Lead Time for Changes | time from commit to production | how quickly code, configuration, and security fixes move through the system | hiding security bottlenecks inside generic pipeline delay
Change Failure Rate | percent of changes that degrade service | how often releases trigger rollback, exposure, or emergency exception due to weak controls | counting only outages and ignoring security-driven rollback or hotfix events
Time to Restore Service | time to recover after failure | how fast the team can recover from a broken release, bad configuration, or security-caused production issue | treating MTTR as only an SRE metric
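
To make the table concrete, the sketch below shows one way to derive the four metrics from change and deployment records. It is a minimal illustration, not a reference implementation: the Deployment record shape, its field names, and the dora_metrics function are hypothetical stand-ins for whatever your CI/CD and incident tooling actually exports.

  # Hypothetical deployment record and a minimal DORA calculation over a window.
  from dataclasses import dataclass
  from datetime import datetime
  from statistics import median

  @dataclass
  class Deployment:
      commit_time: datetime                  # when the change was committed
      deploy_time: datetime                  # when the change reached production
      failed: bool                           # rollback, hotfix, or emergency change required
      restored_time: datetime | None = None  # when service was restored, if it failed

  def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
      """Compute the four DORA metrics for one service over a reporting window."""
      if not deploys:
          return {}
      lead_times = [d.deploy_time - d.commit_time for d in deploys]
      failures = [d for d in deploys if d.failed]
      restores = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
      return {
          "deployment_frequency_per_week": len(deploys) / (window_days / 7),
          "median_lead_time_hours": median(lt.total_seconds() / 3600 for lt in lead_times),
          "change_failure_rate": len(failures) / len(deploys),
          "median_time_to_restore_hours": (
              median(r.total_seconds() / 3600 for r in restores) if restores else None
          ),
      }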

What a security team should do with DORA

Use DORA to ask these questions:

  • Do security checks create useful friction or just hidden queue time?
  • Can the team ship a remediation in hours or does it still need a special release ceremony?
  • Are security-related breakages concentrated in a few services, modules, or teams?
  • Is recovery limited by release process, by missing observability, or by poor ownership?

Security overlays for DORA

DORA alone is not enough. Add a small overlay.

Overlay metric | What it helps explain
Percent of releases with required security evidence attached | whether release flow is audit-friendly and reviewable
Gate bypass rate | whether teams routinely route around controls
Median remediation lead time for exploitable findings | whether fast delivery actually helps reduce exposure
Percent of hotfixes caused by security control gaps | whether preventive controls are failing upstream
Mean time to rotate exposed secrets | whether secret detection and response are operationally real
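
Two of these overlays are simple enough to compute directly from release and finding exports. The sketch below assumes hypothetical record fields (gate_bypassed, exploitable, opened_at, fixed_at); adapt the names to whatever your release and vulnerability-management systems actually provide.

  # Hypothetical overlay calculations: gate bypass rate and remediation lead time.
  from statistics import median

  def gate_bypass_rate(releases: list[dict]) -> float:
      """Share of releases that shipped with a skipped or waived security gate."""
      if not releases:
          return 0.0
      return sum(1 for r in releases if r.get("gate_bypassed")) / len(releases)

  def median_remediation_lead_time_days(findings: list[dict]) -> float | None:
      """Median open-to-fix time, in days, for findings marked exploitable."""
      durations = [
          (f["fixed_at"] - f["opened_at"]).days
          for f in findings
          if f.get("exploitable") and f.get("fixed_at")
      ]
      return median(durations) if durations else None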

AppSec coverage: two measurements that matter

A lot of teams say “we have coverage” when they really mean “some scanners run somewhere.”

Use two different measurements.

1) Asset coverage

How much of the important estate is actually covered?

Examples:

  • percent of tier-1 repos with SAST, secret scanning, and dependency scanning
  • percent of tier-1 services with image scanning and runtime owner defined
  • percent of public APIs with contract linting and authz review
  • percent of critical cloud accounts or subscriptions with posture review enabled
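
A minimal sketch of the first measurement, assuming a hypothetical repo inventory where each entry records its tier and enabled scanners; the field names are illustrative only.

  # Percent of tier-1 repos with SAST, secret scanning, and dependency scanning.
  REQUIRED_SCANNERS = {"sast", "secret_scanning", "dependency_scanning"}

  def tier1_scanner_coverage(repos: list[dict]) -> float:
      tier1 = [r for r in repos if r.get("tier") == 1]
      if not tier1:
          return 0.0
      covered = [r for r in tier1 if REQUIRED_SCANNERS <= set(r.get("scanners", []))]
      return 100.0 * len(covered) / len(tier1)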

2) Practice coverage

How many expected security practices are actually present per system?

Examples:

  • threat model exists
  • ownership is defined
  • repo protection exists
  • CI checks are present
  • release evidence exists
  • image signing or provenance check exists
  • runtime logs are preserved
  • exception path exists

A system may have “some scanners” but still have weak practice coverage.

A simple AppSec coverage model

Use weighted coverage rather than flat coverage.

System tier | Weight | Why
Tier 1 | 5 | internet-facing, regulated, business-critical, or customer-trust critical
Tier 2 | 3 | important internal service or shared platform dependency
Tier 3 | 1 | lower-risk or low-sensitivity service

Then score each service against a short list of practices.

Example practice list

  • repository secret scanning
  • SAST or equivalent code review automation
  • dependency or SBOM visibility
  • CI quality gate
  • release evidence
  • threat model or architecture review
  • owner and escalation path
  • logging and recovery notes

A weighted score is often more honest than “82% of systems covered” because it prevents tiny low-risk repos from dominating the dashboard.
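
A minimal sketch of the weighted calculation, using the tier weights and practice list above; the system inventory shape is hypothetical.

  # Tier-weighted practice coverage: each system contributes its tier weight
  # multiplied by how many of the expected practices it actually has in place.
  TIER_WEIGHTS = {1: 5, 2: 3, 3: 1}
  PRACTICES = [
      "secret_scanning", "sast", "dependency_visibility", "ci_gate",
      "release_evidence", "threat_model", "owner_defined", "logging_recovery",
  ]

  def weighted_coverage(systems: list[dict]) -> float:
      earned = 0.0
      possible = 0.0
      for s in systems:
          weight = TIER_WEIGHTS.get(s.get("tier"), 1)
          present = sum(1 for p in PRACTICES if p in s.get("practices", []))
          earned += weight * present
          possible += weight * len(PRACTICES)
      return 100.0 * earned / possible if possible else 0.0

  # Example: one tier-1 service with 6 of 8 practices plus one tier-3 repo with
  # all 8 scores (5*6 + 1*8) / (5*8 + 1*8) = 38/48 ≈ 79%, so the tier-1 gap still shows.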

Security debt at the defect level

Security debt is not just “number of open vulns.”
Track the debt in ways that show risk, age, and repair efficiency.

Useful views:

Metric | What it shows
Critical/high finding age by service tier | where risky backlog is aging in important systems
Weighted risk index of open findings | a rough score that combines severity, exploitability, and service criticality
Fix rate vs intake rate | whether the organization is burning debt down or accumulating it
Reopen rate / recurrence rate | whether fixes are durable
Mean time to triage | whether findings are being understood quickly enough
Debt added in new code | whether the team is shipping fresh problems while fixing old ones
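
The fix-rate-vs-intake view in particular is easy to compute from a findings export. A minimal sketch, assuming hypothetical opened_at and fixed_at timestamps on each finding:

  # Debt flow over one reporting period: intake vs fixes and the net change.
  from datetime import datetime

  def debt_flow(findings: list[dict], start: datetime, end: datetime) -> dict:
      opened = sum(1 for f in findings if start <= f["opened_at"] < end)
      fixed = sum(1 for f in findings if f.get("fixed_at") and start <= f["fixed_at"] < end)
      return {
          "intake": opened,
          "fixed": fixed,
          "net_change": opened - fixed,              # positive means debt is accumulating
          "burn_ratio": fixed / opened if opened else None,
      }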

A practical weighted-risk approach

Do not overcomplicate the math. A simple weighted index works:

Weighted Risk Index = severity_weight × exploitability_weight × asset_criticality_weight

Example weights:

  • severity: critical=5, high=4, medium=2, low=1
  • exploitability: proven/likely=3, plausible=2, theoretical=1
  • asset criticality: tier1=3, tier2=2, tier3=1

This is not a law of nature. It is a prioritization aid.
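
A minimal sketch of the index using the example weights above; the finding fields are hypothetical, and the per-finding scores can be summed across open findings to produce a trendable debt number.

  # Weighted risk index = severity × exploitability × asset criticality.
  SEVERITY = {"critical": 5, "high": 4, "medium": 2, "low": 1}
  EXPLOITABILITY = {"proven": 3, "likely": 3, "plausible": 2, "theoretical": 1}
  ASSET_CRITICALITY = {1: 3, 2: 2, 3: 1}  # service tier -> weight

  def risk_index(finding: dict) -> int:
      return (SEVERITY[finding["severity"]]
              * EXPLOITABILITY[finding["exploitability"]]
              * ASSET_CRITICALITY[finding["asset_tier"]])

  # Example: a high-severity, likely-exploitable finding on a tier-1 service
  # scores 4 * 3 * 3 = 36; the same defect on a tier-3 repo scores 4 * 3 * 1 = 12.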

How to read these metrics together

Good pattern

  • deployment frequency stays healthy;
  • remediation lead time drops;
  • coverage rises in tier-1 systems;
  • gate bypass rate is low;
  • weighted risk debt trends down.

That usually means security is becoming part of delivery.

Bad pattern

  • coverage rises only because more “easy” repos were added;
  • critical debt in tier-1 services grows older;
  • gate bypasses increase;
  • deployment slows but exposure does not meaningfully drop.

That usually means the program is becoming heavier, not better.

Suggested manager dashboard

A manager-friendly monthly view could include:

  • DORA four-pack for business-critical products
  • percent of tier-1 services with required security practices
  • weighted risk debt trend
  • median age of critical/high exploitable findings
  • gate bypass count and average time to resolution
  • secret exposure rate per 1,000 commits
  • top five repeat defect classes

Suggested director talking points

Directors usually need business translation more than scanner granularity.

Good statements:

  • “Security evidence is now attached to 90% of tier-1 releases.”
  • “Median age of exploitable high-risk findings in customer-facing services decreased from 28 days to 11 days.”
  • “Coverage growth this quarter came from onboarding tier-1 APIs and regulated repos, not from low-risk estate inflation.”
  • “We improved remediation flow without reducing deployment frequency.”

Common mistakes

  • using raw finding counts as the main KPI;
  • mixing all services into one unweighted average;
  • reporting “coverage” without defining coverage of what;
  • measuring only scanner activity instead of control adoption and debt trend;
  • hiding security-caused delivery friction inside generic pipeline metrics.