Internal PKI for Microservices: mTLS, Certificate Automation, and Trust Distribution
Intro: Once a platform has many services, containers, and clusters, "copy a self-signed certificate into each service" stops being a real security strategy. The operational problem becomes certificate issuance, trust distribution, rotation, and revocation at scale. This page gives a practical path for building an internal PKI for service-to-service TLS and mTLS without turning the KB into a full PKI textbook.
What this page includes
- when an internal PKI is worth the operational cost;
- the most practical open-source and commercial options;
- a recommended CA hierarchy for microservices;
- step-by-step implementation guidance for Kubernetes-heavy and mixed environments;
- starter snippets for cert-manager, trust-manager, and Vault PKI.
What problem this solves
You need an internal PKI when services must:
- encrypt service-to-service traffic;
- mutually authenticate peers, not just encrypt a channel;
- rotate certificates automatically without manual redeploy work;
- revoke or replace compromised credentials quickly;
- keep trust roots and leaf certificates under central control.
Typical drivers:
- many east-west calls between services;
- zero-trust or mTLS programs;
- multi-cluster Kubernetes platforms;
- regulated environments where certificate ownership and rotation evidence matter;
- service mesh or workload-identity programs.
Do not start with leaf certificates: start with the operating model
Before choosing tools, decide:
- identity model: DNS names, service names, workload identity, or SPIFFE IDs;
- topology: one cluster, many clusters, mixed VM + Kubernetes, or hybrid cloud;
- certificate lifetime: long-lived leaf certs make operations easier but security worse;
- trust distribution method: how every service gets the CA bundle;
- enforcement point: application code, sidecar proxy, ingress / gateway, or service mesh.
If these decisions are vague, the PKI will become fragile quickly.
Recommended hierarchy for an internal PKI
For most organizations, the practical model is:
- offline root CA;
- online intermediate CA(s) for issuance;
- short-lived leaf certificates for workloads;
- separate trust bundle distribution for roots / intermediates.
Why this hierarchy works better than "one self-signed cert per service"
Because it separates:
- trust anchor lifetime from workload certificate lifetime;
- emergency replacement from normal renewal;
- CA key protection from day-to-day service deployment.
Practical options: open source and commercial
Option 1: cert-manager + trust-manager
Best fit when:
- workloads are mainly on Kubernetes;
- teams already manage secrets and ingress in cluster-native ways;
- you want Kubernetes-native renewal and trust-bundle distribution.
Why teams choose it:
- clean Kubernetes API model;
- automatic renewal with `Certificate` resources;
- easy bootstrap path from self-signed root to CA issuer;
- trust-manager distributes CA bundles across namespaces and workloads.
Trade-offs:
- strongest in Kubernetes, weaker as a general-purpose PKI for mixed estates;
- you still need to think through root/intermediate protection and disaster recovery;
- service-to-service identity semantics remain your responsibility unless combined with a mesh or SPIFFE-based model.
Option 2: Smallstep step-ca
Best fit when:
- you want an internal CA beyond Kubernetes only;
- you want ACME-friendly automation for servers, gateways, and services;
- you want a lightweight private CA with relatively low operational complexity.
Why teams choose it:
- purpose-built for automated private X.509 and SSH issuance;
- good support for short-lived certificates;
- useful when services or edge proxies can enroll via ACME or other supported provisioners;
- good stepping stone from "manual certs" to automated certificate lifecycle.
Trade-offs:
- still a CA you must operate and protect;
- trust distribution remains a platform task;
- less Kubernetes-native than cert-manager for in-cluster object workflows.
Option 3: HashiCorp Vault PKI
Best fit when:
- you already run Vault for secrets or strong authn/authz workflows;
- you want dynamic issuance, short-lived certs, and policy-driven roles;
- you need more enterprise-grade control over issuance, revocation, and multi-issuer rotation.
Why teams choose it:
- dynamic X.509 issuance through the PKI engine;
- short TTLs work well for service certificates;
- good fit when services already authenticate to Vault;
- can centralize PKI and secret workflows in one control plane.
Trade-offs:
- heavier to operate than a narrow CA-only solution;
- application or platform enrollment patterns must be designed carefully;
- Kubernetes distribution is good, but not as "native object first" as cert-manager.
Option 4: SPIRE / SPIFFE
Best fit when:
- the real requirement is workload identity, not just certificates;
- the environment is dynamic and heterogeneous;
- you want workload-attested identities and automated mTLS without manually reasoning about each private key and CSR flow.
Why teams choose it:
- identities are issued to workloads based on attestation;
- short-lived SVIDs fit service-to-service auth very well;
- good choice for platform teams building zero-trust service identity.
Trade-offs:
- more architectural than "just run a CA";
- stronger fit for platform engineering than for quick certificate file distribution;
- application teams must understand SPIFFE / Workload API or rely on a service mesh or proxy integration.
Commercial examples worth knowing
| Product | Where it fits |
|---|---|
| Smallstep Certificate Manager | managed / hosted version of the Smallstep model for teams that want less CA operations overhead |
| Venafi Control Plane | enterprise machine identity management, policy, lifecycle, discovery, and governance across many environments |
| DigiCert Trust Lifecycle Manager | CA-agnostic certificate inventory, lifecycle, workflow, and private PKI / trust management at enterprise scale |
| HCP Vault / Vault Enterprise | good when Vault is already strategic and you want PKI plus broader secrets / identity workflows |
What to choose in practice
If you are mostly on Kubernetes
Start with cert-manager + trust-manager.
If you need a general internal CA across VMs, containers, and gateways
Start with step-ca or Vault PKI depending on whether you need a focused CA or a broader secrets platform.
If you need first-class workload identity for dynamic service fleets
Evaluate SPIRE / SPIFFE, often together with a mesh or proxy layer.
Service mesh note
If you already run a mesh such as Istio or Linkerd, the easiest way to encrypt east-west traffic is often to let the mesh manage workload certificates and mTLS.
That does not remove the PKI problem. It shifts it to:
- who signs workload certificates;
- how the trust anchor rotates;
- whether the mesh uses self-signed roots or plugs into your own CA.
A common mistake is to assume "we enabled the mesh, therefore PKI is solved forever." It is not.
Step-by-step implementation model
Step 1: define service identity format
Decide what identities look like.
Typical choices:
- DNS SANs like `service-a.namespace.svc.cluster.local`;
- external/internal FQDNs for gateway or VM services;
- SPIFFE IDs like `spiffe://company.internal/ns/payments/sa/api`.
Do this before automation, otherwise you will bake inconsistent identity into every certificate.
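One way to keep the identity format consistent is to validate it in one place before any issuance automation uses it. The sketch below parses the SPIFFE-style example above with the Python standard library. The `/ns/<namespace>/sa/<name>` path layout is this page's example convention rather than a SPIFFE requirement, and `WorkloadIdentity` and `parse_spiffe_id` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Parse an ID shaped like spiffe://<trust-domain>/ns/<namespace>/sa/<name>."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed in an identity")
    # "/ns/payments/sa/api" splits into ["", "ns", "payments", "sa", "api"].
    segments = parts.path.split("/")[1:]
    # Assumed layout: exactly ns/<namespace>/sa/<service account>.
    if len(segments) != 4 or segments[0] != "ns" or segments[2] != "sa":
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    if not segments[1] or not segments[3]:
        raise ValueError("namespace and service account must be non-empty")
    return WorkloadIdentity(parts.netloc, segments[1], segments[3])
```

Rejecting anything that does not match the agreed layout at issuance time is cheaper than discovering inconsistent identities after certificates are already deployed.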
Step 2: create an offline root and an online intermediate
Baseline guidance:
- root key offline or otherwise strongly protected;
- intermediate used for routine issuance;
- do not let applications or normal deployment automation talk to the root.
This lets you:
- rotate intermediates without replacing the entire trust model;
- issue short-lived workload certs at scale;
- reduce blast radius if the online issuance tier is compromised.
Step 3: automate enrollment, do not hand-copy certificates
Use one of these patterns:
- Kubernetes `Certificate` objects via cert-manager;
- ACME enrollment against step-ca or another CA;
- Vault PKI roles and API / agent-based retrieval;
- SPIRE agent workload attestation and SVID issuance.
Manual copy-and-paste of PEM files does not scale and makes rotation brittle.
Step 4: distribute trust separately from leaf certificates
Every workload needs the CA bundle that validates peers.
Common patterns:
- a ConfigMap / Secret mounted to workloads;
- OS trust store update in VM images;
- trust-manager bundles in Kubernetes;
- mesh / sidecar distribution.
Do not hide the trust bundle inside one application image and forget it. Trust updates must be operable.
Step 5: prefer short-lived leaf certificates
For service identities, short-lived certificates are usually better than long-lived ones.
Why:
- less revocation dependence;
- lower value if a private key leaks;
- easier to reason about automatic renewal than about annual emergency replacements.
Practical bias:
- leaf certs short-lived and auto-renewed;
- intermediates medium-lived with planned rotation;
- root long-lived but rarely touched.
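Short lifetimes only work if renewal starts well before expiry. The sketch below shows one possible rule: renew once a fixed share of the lifetime has elapsed. The two-thirds default and the function names are illustrative assumptions, not values taken from this page or from any specific tool.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 2 / 3) -> datetime:
    """Return the moment renewal should begin.

    fraction is the share of the certificate lifetime allowed to elapse
    before renewing; 2/3 is an illustrative default.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    return not_before + (not_after - not_before) * fraction


def needs_renewal(now: datetime, not_before: datetime, not_after: datetime,
                  fraction: float = 2 / 3) -> bool:
    """True once the renewal point has been reached."""
    return now >= renewal_time(not_before, not_after, fraction)


# Example: a 24-hour leaf certificate issued at midnight UTC.
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
due = needs_renewal(issued + timedelta(hours=20), issued, expires)
```

Renewing with a margin leaves room for retries if the issuer is briefly unavailable, which matters more as lifetimes get shorter.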
Step 6: plan revocation and replacement
Even if you prefer short TTLs, you still need a plan for:
- compromised node or pod credentials;
- stolen private keys for issued leaf certificates;
- intermediate replacement;
- trust bundle overlap during rotation.
If you do not know how to revoke, replace, and redistribute trust under stress, the PKI is not operationally ready.
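Trust bundle overlap during rotation can be illustrated with a small helper: publish a bundle containing both the outgoing and the incoming CA first, switch issuance afterwards, and drop the old CA only once no valid leaf chains to it. This is a minimal sketch under those assumptions; `merge_bundles` is a hypothetical helper and the PEM bodies in the usage example are placeholders, not real certificates.

```python
import re

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle: str) -> list[str]:
    """Return each certificate block with surrounding whitespace normalized."""
    blocks = []
    for match in PEM_BLOCK.finditer(bundle):
        lines = [line.strip() for line in match.group(0).splitlines()]
        blocks.append("\n".join(line for line in lines if line))
    return blocks


def merge_bundles(*bundles: str) -> str:
    """Concatenate bundles, keeping first-seen order and dropping duplicates."""
    seen: set[str] = set()
    ordered: list[str] = []
    for bundle in bundles:
        for block in split_pem(bundle):
            if block not in seen:
                seen.add(block)
                ordered.append(block)
    return "\n".join(ordered) + "\n" if ordered else ""


# Placeholder PEM bodies for illustration only.
OLD_CA = "-----BEGIN CERTIFICATE-----\nT0xEUk9PVA==\n-----END CERTIFICATE-----\n"
NEW_CA = "-----BEGIN CERTIFICATE-----\nTkVXUk9PVA==\n-----END CERTIFICATE-----\n"

overlap_bundle = merge_bundles(OLD_CA, NEW_CA)  # distribute this before switching issuers
```

The ordering of the phases is the important part: workloads must trust the new CA before any peer presents a certificate issued by it.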
Step 7: enforce TLS and mTLS at the right layer
Choices:
- in application runtime;
- in reverse proxy or sidecar;
- in service mesh;
- at ingress / gateway only.
For many microservice environments, the cleanest model is:
- mTLS for east-west service traffic;
- separate ingress TLS for north-south traffic;
- application authorization still done above transport identity.
TLS protects the channel and proves peer identity. It does not replace authorization.
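To make the split between transport identity and authorization concrete, the sketch below assumes the mTLS layer has already verified the peer and handed the application a peer identity string. The policy table, the action names, and the SPIFFE IDs are all hypothetical.

```python
# Hypothetical policy: which verified peer identities may perform which action.
ALLOWED_CALLERS: dict[str, set[str]] = {
    "charge": {"spiffe://company.internal/ns/checkout/sa/web"},
    "refund": {"spiffe://company.internal/ns/support/sa/backoffice"},
}


def is_authorized(peer_id: str, action: str) -> bool:
    """Default-deny check applied after the TLS layer has verified peer_id."""
    return peer_id in ALLOWED_CALLERS.get(action, set())
```

A peer with a perfectly valid certificate from the internal CA is still refused here unless the policy names it for the requested action.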
Kubernetes-first practical path
A. Bootstrap a root with a self-signed issuer
Use a self-signed issuer only to create the initial root.
See: cert-manager root / CA bootstrap starter
This starter shows:
- a bootstrap self-signed `ClusterIssuer`;
- a root CA `Certificate`;
- a CA-backed issuer for normal leaf issuance;
- an example leaf certificate for an internal service.
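The linked starter is not reproduced here. As a rough illustration of its last item, the sketch below builds a cert-manager `Certificate` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The issuer name `internal-ca`, the `payments` namespace, the `<name>-tls` secret naming, and the 24h / 8h durations are assumptions chosen for the example.

```python
import json


def leaf_certificate_manifest(name: str, namespace: str, issuer: str,
                              dns_names: list[str], duration: str = "24h",
                              renew_before: str = "8h") -> dict:
    """Build a cert-manager Certificate for a short-lived workload certificate."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": f"{name}-tls",   # Secret that will hold tls.crt and tls.key
            "duration": duration,          # requested certificate lifetime
            "renewBefore": renew_before,   # how long before expiry to renew
            "dnsNames": list(dns_names),
            "usages": ["server auth", "client auth"],  # both sides of mTLS
            "privateKey": {"algorithm": "ECDSA", "size": 256,
                           "rotationPolicy": "Always"},
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer",
                          "group": "cert-manager.io"},
        },
    }


manifest = leaf_certificate_manifest(
    "service-a", "payments", "internal-ca",
    ["service-a.payments.svc.cluster.local"],
)
print(json.dumps(manifest, indent=2))
```

Requesting both `server auth` and `client auth` lets the same certificate be presented when the service accepts connections and when it calls peers.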
B. Distribute trust with trust-manager
See: trust-manager private CA bundle starter
This is the practical missing piece many teams forget. Issuing leaf certs is only half of the problem; services also need the right trust bundle.
C. Mount certificates to workloads or terminate via sidecars / ingress
Patterns:
- mount `tls.crt`, `tls.key`, and CA bundle into the pod;
- configure the service runtime to require and verify client certificates for mTLS;
- or let a sidecar / mesh terminate and present workload identity.
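For a runtime that terminates mTLS itself, the first two patterns might look like the following sketch using Python's standard `ssl` module. The mount paths in the usage comment are hypothetical, and the TLS 1.2 floor is an assumption rather than a requirement stated on this page.

```python
import ssl


def new_mtls_server_context() -> ssl.SSLContext:
    """Server-side context that rejects peers without a valid client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed floor for this sketch
    ctx.verify_mode = ssl.CERT_REQUIRED           # this is what makes it mutual TLS
    return ctx


def load_mounted_material(ctx: ssl.SSLContext, cert_file: str,
                          key_file: str, ca_file: str) -> None:
    """Load the service's own certificate and the CA bundle used to verify peers."""
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)


# Usage with hypothetical mount paths:
# ctx = new_mtls_server_context()
# load_mounted_material(ctx, "/etc/tls/tls.crt", "/etc/tls/tls.key", "/etc/tls/ca.crt")
```

Keeping context creation separate from file loading makes it straightforward to build a fresh context when the mounted files are renewed.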
Vault PKI practical path
See: Vault PKI bootstrap and issuance starter
This starter demonstrates the flow, not a full HA production deployment:
- generate root CA material;
- create an intermediate CSR;
- sign it with the root;
- configure a role for service issuance;
- issue a workload certificate with short TTL.
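The final step of that flow, issuing a short-TTL workload certificate, maps to a single call against the PKI engine's `issue` endpoint. The sketch below only builds the HTTP request with the standard library and does not send it. The Vault address, the `pki_int` mount, the `service` role name, and the token value are assumptions for illustration.

```python
import json
import urllib.request


def build_issue_request(vault_addr: str, token: str, mount: str, role: str,
                        common_name: str, ttl: str = "24h") -> urllib.request.Request:
    """Build (but do not send) a request to <mount>/issue/<role> on Vault's HTTP API."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/issue/{role}"
    body = json.dumps({"common_name": common_name, "ttl": ttl}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


# Hypothetical values; sending would be urllib.request.urlopen(req) against a real Vault.
req = build_issue_request(
    "https://vault.example.internal:8200", "s.placeholder-token",
    "pki_int", "service", "service-a.payments.svc.cluster.local", ttl="24h",
)
```

The role, not the caller, should bound what can be requested: allowed domains and maximum TTL are configured on the role so that a workload cannot ask for a broader or longer-lived certificate than intended.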
This is a strong fit when applications can authenticate to Vault or when platform automation can fetch and rotate certificates centrally.
Smallstep / step-ca practical path
See: step-ca containerized starter
Use this when you want:
- a lighter-weight private CA than a full secrets platform;
- ACME-driven issuance for internal gateways, proxies, and services;
- a cleaner path from "hand-managed certs" to automated private PKI.
Example runtime configuration ideas
Service runtime pattern
Every service that terminates mTLS needs:
- a server certificate and private key;
- a trust bundle for peer validation;
- hostname / identity verification rules;
- safe reload or restart strategy when certificates renew.
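The reload item is the one most often left out. A minimal sketch of one approach, assuming certificates are renewed in place on a mounted path: poll the file, compare a content digest, and rebuild the TLS configuration when it changes. `CertWatcher` is a hypothetical helper, not part of any library.

```python
import hashlib
from pathlib import Path


class CertWatcher:
    """Detect when a mounted certificate file has been replaced."""

    def __init__(self, cert_path: str) -> None:
        self.cert_path = Path(cert_path)
        self._digest = self._read_digest()

    def _read_digest(self) -> str:
        # Certificate files are small, so hashing the whole file is cheap.
        return hashlib.sha256(self.cert_path.read_bytes()).hexdigest()

    def changed(self) -> bool:
        """Return True once for each observed change in the file contents."""
        current = self._read_digest()
        if current != self._digest:
            self._digest = current
            return True
        return False
```

A service would call `changed()` on a timer and, when it returns True, build a new TLS context for subsequent connections while existing connections finish on the old one.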
Gateway / proxy pattern
For many teams, it is easier to terminate and verify mTLS in:
- Envoy;
- NGINX;
- HAProxy;
- service mesh sidecars.
This reduces the amount of application code that directly handles certificate files and trust stores.
Common mistakes
- using one long-lived self-signed cert everywhere;
- keeping the same root and same intermediate forever;
- distributing leaf certs but forgetting trust-bundle automation;
- storing private keys in images or source control;
- relying on revocation only, with very long certificate lifetimes;
- enabling mTLS transport but keeping authorization weak or implicit;
- assuming the mesh's default self-signed setup is production-ready forever.
Fast decision checklist
Choose cert-manager + trust-manager when:
- you are mostly on Kubernetes;
- you want Kubernetes-native certificates and trust bundles.
Choose step-ca when:
- you want a general private CA with relatively low complexity;
- ACME-based automation is attractive.
Choose Vault PKI when:
- Vault already exists or policy-driven issuance matters more than K8s-native UX.
Choose SPIRE when:
- workload identity and attestation are the real requirement.
Read next
- Service-to-Service Auth, Webhooks, and Event-Driven Security
- Zero-Trust Egress and Private Connectivity Patterns
- Workload Federation and Non-Human Identities
- Vault Installation, HA, and Automation Pack
- Container and Kubernetes Security
Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.