
๐Ÿ” Internal PKI for Microservices โ€” mTLS, Certificate Automation, and Trust Distribution

Intro: Once a platform has many services, containers, and clusters, “copy a self-signed certificate into each service” stops being a real security strategy. The operational problem becomes certificate issuance, trust distribution, rotation, and revocation at scale. This page gives a practical path for building an internal PKI for service-to-service TLS and mTLS without turning the KB into a full PKI textbook.

What this page includes

  • when an internal PKI is worth the operational cost;
  • the most practical open-source and commercial options;
  • a recommended CA hierarchy for microservices;
  • step-by-step implementation guidance for Kubernetes-heavy and mixed environments;
  • starter snippets for cert-manager, trust-manager, and Vault PKI.

What problem this solves

You need an internal PKI when services must:

  • encrypt service-to-service traffic;
  • mutually authenticate peers, not just encrypt a channel;
  • rotate certificates automatically without manual redeploy work;
  • revoke or replace compromised credentials quickly;
  • keep trust roots and leaf certificates under central control.

Typical drivers:

  • many east-west calls between services;
  • zero-trust or mTLS programs;
  • multi-cluster Kubernetes platforms;
  • regulated environments where certificate ownership and rotation evidence matter;
  • service mesh or workload-identity programs.

Do not start with leaf certificates: start with the operating model

Before choosing tools, decide:

  1. identity model: DNS names, service names, workload identity, or SPIFFE IDs;
  2. topology: one cluster, many clusters, mixed VM + Kubernetes, or hybrid cloud;
  3. certificate lifetime: long-lived leaf certs make operations easier but weaken security;
  4. trust distribution method: how every service gets the CA bundle;
  5. enforcement point: application code, sidecar proxy, ingress / gateway, or service mesh.

If these decisions are vague, the PKI will become fragile quickly.

For most organizations, the practical model is:

  • offline root CA;
  • online intermediate CA(s) for issuance;
  • short-lived leaf certificates for workloads;
  • separate trust bundle distribution for roots / intermediates.

flowchart TD
  ROOT[Offline Root CA\nrarely used]
  INT1[Online Intermediate CA\ncluster / environment / region]
  INT2[Optional second Intermediate\nfor rotation or another environment]
  TRUST[Trust bundle distribution\nConfigMap / secret / image / system trust]
  SVC1[Service A]
  SVC2[Service B]
  SVC3[Service C]
  ROOT --> INT1
  ROOT --> INT2
  INT1 --> SVC1
  INT1 --> SVC2
  INT1 --> SVC3
  ROOT --> TRUST
  INT1 --> TRUST
  TRUST --> SVC1
  TRUST --> SVC2
  TRUST --> SVC3

Why this hierarchy works better than “one self-signed cert per service”

Because it separates:

  • trust anchor lifetime from workload certificate lifetime;
  • emergency replacement from normal renewal;
  • CA key protection from day-to-day service deployment.

Practical options: open source and commercial

Option 1: cert-manager + trust-manager

Best fit when:

  • workloads are mainly on Kubernetes;
  • teams already manage secrets and ingress in cluster-native ways;
  • you want Kubernetes-native renewal and trust-bundle distribution.

Why teams choose it:

  • clean Kubernetes API model;
  • automatic renewal with Certificate resources;
  • easy bootstrap path from self-signed root to CA issuer;
  • trust-manager distributes CA bundles across namespaces and workloads.

Trade-offs:

  • strongest in Kubernetes, weaker as a general-purpose PKI for mixed estates;
  • you still need to think through root/intermediate protection and disaster recovery;
  • service-to-service identity semantics remain your responsibility unless combined with a mesh or SPIFFE-based model.

Option 2: Smallstep step-ca

Best fit when:

  • you want an internal CA beyond Kubernetes only;
  • you want ACME-friendly automation for servers, gateways, and services;
  • you want a lightweight private CA with relatively low operational complexity.

Why teams choose it:

  • purpose-built for automated private X.509 and SSH issuance;
  • good support for short-lived certificates;
  • useful when services or edge proxies can enroll via ACME or other supported provisioners;
  • good stepping stone from “manual certs” to automated certificate lifecycle.

Trade-offs:

  • still a CA you must operate and protect;
  • trust distribution remains a platform task;
  • less Kubernetes-native than cert-manager for in-cluster object workflows.

Option 3: HashiCorp Vault PKI

Best fit when:

  • you already run Vault for secrets or strong authn/authz workflows;
  • you want dynamic issuance, short-lived certs, and policy-driven roles;
  • you need more enterprise-grade control over issuance, revocation, and multi-issuer rotation.

Why teams choose it:

  • dynamic X.509 issuance through the PKI engine;
  • short TTLs work well for service certificates;
  • good fit when services already authenticate to Vault;
  • can centralize PKI and secret workflows in one control plane.

Trade-offs:

  • heavier to operate than a narrow CA-only solution;
  • application or platform enrollment patterns must be designed carefully;
  • Kubernetes distribution is good, but not as “native object first” as cert-manager.

Option 4: SPIRE / SPIFFE

Best fit when:

  • the real requirement is workload identity, not just certificates;
  • the environment is dynamic and heterogeneous;
  • you want workload-attested identities and automated mTLS without manually reasoning about each private key and CSR flow.

Why teams choose it:

  • identities are issued to workloads based on attestation;
  • short-lived SVIDs fit service-to-service auth very well;
  • good choice for platform teams building zero-trust service identity.

Trade-offs:

  • more architectural than “just run a CA”;
  • stronger fit for platform engineering than for quick certificate file distribution;
  • application teams must understand SPIFFE / Workload API or rely on a service mesh or proxy integration.

Commercial examples worth knowing

Product | Where it fits
Smallstep Certificate Manager | managed / hosted version of the Smallstep model for teams that want less CA operations overhead
Venafi Control Plane | enterprise machine identity management, policy, lifecycle, discovery, and governance across many environments
DigiCert Trust Lifecycle Manager | CA-agnostic certificate inventory, lifecycle, workflow, and private PKI / trust management at enterprise scale
HCP Vault / Vault Enterprise | good when Vault is already strategic and you want PKI plus broader secrets / identity workflows

What to choose in practice

If you are mostly on Kubernetes

Start with cert-manager + trust-manager.

If you need a general internal CA across VMs, containers, and gateways

Start with step-ca or Vault PKI depending on whether you need a focused CA or a broader secrets platform.

If you need first-class workload identity for dynamic service fleets

Evaluate SPIRE / SPIFFE, often together with a mesh or proxy layer.

Service mesh note

If you already run a mesh such as Istio or Linkerd, the easiest way to encrypt east-west traffic is often to let the mesh manage workload certificates and mTLS.

That does not remove the PKI problem. It shifts it to:

  • who signs workload certificates;
  • how the trust anchor rotates;
  • whether the mesh uses self-signed roots or plugs into your own CA.

A common mistake is to assume “we enabled the mesh, therefore PKI is solved forever.” It is not.

Step-by-step implementation model

Step 1: define service identity format

Decide what identities look like.

Typical choices:

  • DNS SANs like service-a.namespace.svc.cluster.local;
  • external/internal FQDNs for gateway or VM services;
  • SPIFFE IDs like spiffe://company.internal/ns/payments/sa/api.

Do this before automation, otherwise you will bake inconsistent identity into every certificate.
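One way to keep identities consistent is to build and check them in a single helper that every issuance path reuses. The sketch below is illustrative only: the function names, the `company.internal` default (taken from the example above), and the `/ns/<namespace>/sa/<service-account>` path layout are assumptions, and the label check is deliberately simplified.

```python
import re
from urllib.parse import urlsplit

# RFC 1123 DNS label (1 to 63 chars), the shape Kubernetes uses for many names.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


def _check_label(value: str) -> str:
    """Return the value unchanged if it is a valid DNS label, else raise."""
    if not _DNS_LABEL.fullmatch(value):
        raise ValueError(f"not a valid DNS label: {value!r}")
    return value


def dns_san(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS SAN for a Kubernetes Service."""
    return f"{_check_label(service)}.{_check_label(namespace)}.svc.{cluster_domain}"


def spiffe_id(namespace: str, service_account: str,
              trust_domain: str = "company.internal") -> str:
    """Build a SPIFFE ID using the assumed /ns/<ns>/sa/<sa> layout."""
    return (f"spiffe://{trust_domain}"
            f"/ns/{_check_label(namespace)}/sa/{_check_label(service_account)}")


def parse_spiffe_id(uri: str) -> dict:
    """Split a SPIFFE ID into its parts, rejecting anything off-convention."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 4 or segments[0] != "ns" or segments[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path layout: {uri!r}")
    return {
        "trust_domain": parts.netloc,
        "namespace": segments[1],
        "service_account": segments[3],
    }
```

Rejecting off-convention names at build time is cheaper than discovering mismatched SANs after certificates are already deployed.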

Step 2: create an offline root and an online intermediate

Baseline guidance:

  • root key offline or otherwise strongly protected;
  • intermediate used for routine issuance;
  • do not let applications or normal deployment automation talk to the root.

This lets you:

  • rotate intermediates without replacing the entire trust model;
  • issue short-lived workload certs at scale;
  • reduce blast radius if the online issuance tier is compromised.

Step 3: automate enrollment, do not hand-copy certificates

Use one of these patterns:

  • Kubernetes Certificate objects via cert-manager;
  • ACME enrollment against step-ca or another CA;
  • Vault PKI roles and API / agent-based retrieval;
  • SPIRE agent workload attestation and SVID issuance.

Manual copy-and-paste of PEM files does not scale and makes rotation brittle.

Step 4: distribute trust separately from leaf certificates

Every workload needs the CA bundle that validates peers.

Common patterns:

  • a ConfigMap / Secret mounted to workloads;
  • OS trust store update in VM images;
  • trust-manager bundles in Kubernetes;
  • mesh / sidecar distribution.

Do not hide the trust bundle inside one application image and forget it. Trust updates must be operable.

Step 5: prefer short-lived leaf certificates

For service identities, short-lived certificates are usually better than long-lived ones.

Why:

  • less revocation dependence;
  • lower value if a private key leaks;
  • easier to reason about automatic renewal than about annual emergency replacements.

Practical bias:

  • leaf certs short-lived and auto-renewed;
  • intermediates medium-lived with planned rotation;
  • root long-lived but rarely touched.
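Short lifetimes only work if renewal starts well before expiry. A common rule of thumb is to renew once a fixed fraction of the lifetime has elapsed, which leaves the remainder as a retry window. The sketch below assumes a two-thirds threshold; the function names and the threshold are illustrative, not a requirement of any particular tool.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 2 / 3) -> datetime:
    """Return the instant at which renewal should start."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    lifetime = not_after - not_before
    if lifetime <= timedelta(0):
        raise ValueError("not_after must be later than not_before")
    return not_before + lifetime * fraction


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime,
                  fraction: float = 2 / 3) -> bool:
    """True once the renewal threshold has been reached."""
    return now >= renewal_time(not_before, not_after, fraction)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=24)
    # A 24-hour certificate leaves an 8-hour window for retries.
    print("renew at:", renewal_time(issued, expires))
```

The shorter the lifetime, the more the retry window matters: an issuer outage longer than that window turns directly into expired certificates.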

Step 6: plan revocation and replacement

Even if you prefer short TTLs, you still need a plan for:

  • compromised node or pod credentials;
  • stolen private keys for CA-issued leaf certificates;
  • intermediate replacement;
  • trust bundle overlap during rotation.

If you do not know how to revoke, replace, and redistribute trust under stress, the PKI is not operationally ready.
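Trust bundle overlap during rotation usually means publishing a bundle that contains both the outgoing and the incoming CA until every workload has picked up certificates from the new chain. A minimal sketch of building such a bundle follows. It treats each PEM block as opaque bytes and does not check that the blocks are valid X.509 certificates; all function names are hypothetical.

```python
import base64
import hashlib
import re
from typing import List

_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def split_bundle(bundle: str) -> List[bytes]:
    """Return the decoded bytes of every CERTIFICATE block in a PEM bundle."""
    return [base64.b64decode("".join(body.split()))
            for body in _PEM_CERT.findall(bundle)]


def to_pem(der: bytes) -> str:
    """Encode bytes as a PEM CERTIFICATE block with 64-character lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return ("-----BEGIN CERTIFICATE-----\n" + "\n".join(lines)
            + "\n-----END CERTIFICATE-----\n")


def merge_bundles(*bundles: str) -> str:
    """Concatenate bundles, keeping the first copy of each certificate."""
    seen = set()
    merged = []
    for bundle in bundles:
        for der in split_bundle(bundle):
            digest = hashlib.sha256(der).hexdigest()
            if digest not in seen:
                seen.add(digest)
                merged.append(der)
    return "".join(to_pem(der) for der in merged)
```

In a rotation, the merged bundle is distributed first, the new CA starts issuing second, and the old CA is dropped from the bundle only after the last certificate it issued has expired or been replaced.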

Step 7: enforce TLS and mTLS at the right layer

Choices:

  • in application runtime;
  • in reverse proxy or sidecar;
  • in service mesh;
  • at ingress / gateway only.

For many microservice environments, the cleanest model is:

  • mTLS for east-west service traffic;
  • separate ingress TLS for north-south traffic;
  • application authorization still done above transport identity.

TLS proves channel and peer identity. It does not replace authorization.
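To make the split between transport identity and authorization concrete, the sketch below checks a verified peer certificate against an explicit allow-list of callers. It assumes the dictionary shape returned by Python's `ssl.SSLSocket.getpeercert()` after a successful handshake with client verification; the function names and the allow-list are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def peer_identities(peercert: Dict) -> Set[Tuple[str, str]]:
    """Extract (type, value) SAN entries from a getpeercert()-style dict."""
    return {(kind, value) for kind, value in peercert.get("subjectAltName", ())}


def is_authorized(peercert: Dict, allowed_callers: Iterable[str]) -> bool:
    """Allow the call only if a DNS or URI SAN is on the allow-list.

    The TLS layer has already proven that the certificate chains to a
    trusted CA. This function decides whether that identity may call us.
    """
    allowed = set(allowed_callers)
    return any(
        kind in ("DNS", "URI") and value in allowed
        for kind, value in peer_identities(peercert)
    )


if __name__ == "__main__":
    cert = {"subjectAltName": (("URI", "spiffe://company.internal/ns/payments/sa/api"),)}
    print(is_authorized(cert, ["spiffe://company.internal/ns/payments/sa/api"]))
```

A certificate from the right CA but with an identity outside the allow-list is rejected, which is the behaviour that "mTLS only" setups typically lack.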

Kubernetes-first practical path

A. Bootstrap a root with a self-signed issuer

Use a self-signed issuer only to create the initial root.

See: cert-manager root / CA bootstrap starter

This starter shows:

  • a bootstrap self-signed ClusterIssuer;
  • a root CA Certificate;
  • a CA-backed issuer for normal leaf issuance;
  • an example leaf certificate for an internal service.
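As a rough illustration of those four objects, the sketch below builds them as Python dictionaries and prints a JSON `List` that `kubectl apply -f -` can read. It is not the linked starter. All resource names, the `payments` namespace, the 24h / 8h lifetimes, and the use of the `cert-manager` namespace for the root secret are assumptions to check against your cert-manager version and installation.

```python
import json

CERT_MANAGER_NS = "cert-manager"  # assumed namespace where ClusterIssuers read secrets


def issuer_ref(name: str) -> dict:
    """Reference to a ClusterIssuer by name."""
    return {"name": name, "kind": "ClusterIssuer", "group": "cert-manager.io"}


def bootstrap_manifests(service: str = "service-a", namespace: str = "payments") -> list:
    """Return the bootstrap issuer, root CA, CA issuer, and one example leaf."""
    api = "cert-manager.io/v1"
    selfsigned = {
        "apiVersion": api, "kind": "ClusterIssuer",
        "metadata": {"name": "bootstrap-selfsigned"},
        "spec": {"selfSigned": {}},
    }
    root = {
        "apiVersion": api, "kind": "Certificate",
        "metadata": {"name": "internal-root-ca", "namespace": CERT_MANAGER_NS},
        "spec": {
            "isCA": True,
            "commonName": "internal-root-ca",
            "secretName": "internal-root-ca",
            "privateKey": {"algorithm": "ECDSA", "size": 256},
            "issuerRef": issuer_ref("bootstrap-selfsigned"),
        },
    }
    ca_issuer = {
        "apiVersion": api, "kind": "ClusterIssuer",
        "metadata": {"name": "internal-ca"},
        "spec": {"ca": {"secretName": "internal-root-ca"}},
    }
    leaf = {
        "apiVersion": api, "kind": "Certificate",
        "metadata": {"name": f"{service}-tls", "namespace": namespace},
        "spec": {
            "secretName": f"{service}-tls",
            "duration": "24h",      # short-lived leaf (assumed value)
            "renewBefore": "8h",    # renew with a third of the lifetime left
            "dnsNames": [f"{service}.{namespace}.svc.cluster.local"],
            "usages": ["server auth", "client auth"],
            "issuerRef": issuer_ref("internal-ca"),
        },
    }
    return [selfsigned, root, ca_issuer, leaf]


if __name__ == "__main__":
    doc = {"apiVersion": "v1", "kind": "List", "items": bootstrap_manifests()}
    print(json.dumps(doc, indent=2))
```

Note that this keeps the root key inside the cluster, which is convenient for a bootstrap or lab setup but does not match the offline-root guidance earlier on this page.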

B. Distribute trust with trust-manager

See: trust-manager private CA bundle starter

This is the practical missing piece many teams forget. Issuing leaf certs is only half of the problem; services also need the right trust bundle.
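For orientation, a trust-manager `Bundle` that copies a CA certificate from a Secret into a ConfigMap in selected namespaces could look roughly like the object built below. This is a sketch and not the linked starter: the API version reflects trust-manager's `v1alpha1` API and should be checked against the installed release, and the bundle name, secret name, ConfigMap key, and namespace label are hypothetical.

```python
import json


def trust_bundle_manifest(name: str = "internal-ca-bundle",
                          ca_secret: str = "internal-root-ca",
                          label: str = "trust.example.internal/inject") -> dict:
    """Build a Bundle that publishes one CA certificate as a ConfigMap."""
    return {
        "apiVersion": "trust.cert-manager.io/v1alpha1",
        "kind": "Bundle",
        "metadata": {"name": name},  # Bundles are cluster-scoped
        "spec": {
            # Source: the CA certificate stored in a Secret in the trust namespace.
            "sources": [{"secret": {"name": ca_secret, "key": "tls.crt"}}],
            # Target: a ConfigMap named after the Bundle, written only to
            # namespaces that carry the (hypothetical) opt-in label.
            "target": {
                "configMap": {"key": "ca-bundle.pem"},
                "namespaceSelector": {"matchLabels": {label: "enabled"}},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(trust_bundle_manifest(), indent=2))
```

Only the public certificate is distributed this way. The CA private key stays in the source Secret and is never part of the bundle.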

C. Mount certificates to workloads or terminate via sidecars / ingress

Patterns:

  • mount tls.crt, tls.key, and CA bundle into the pod;
  • configure the service runtime to require and verify client certificates for mTLS;
  • or let a sidecar / mesh terminate and present workload identity.
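When certificates are mounted as files, the workload has to notice renewals itself. The sketch below detects changes by comparing content hashes of the mounted files, which a service could poll before rebuilding its TLS configuration. The class name and polling approach are assumptions; many runtimes and proxies offer their own reload mechanisms.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


class CertWatcher:
    """Detect content changes in mounted certificate files."""

    def __init__(self, paths: Iterable[str]):
        self._paths = [Path(p) for p in paths]
        self._digests = self._snapshot()

    def _snapshot(self) -> Dict[Path, str]:
        # Hash file contents rather than trusting mtime, since mounted
        # volumes may swap files through symlinks.
        return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in self._paths}

    def changed(self) -> bool:
        """Return True once after each change, then False until the next one."""
        current = self._snapshot()
        if current != self._digests:
            self._digests = current
            return True
        return False
```

A typical use is a background loop that calls `changed()` every few seconds and, on `True`, reloads the certificate, key, and CA bundle together so the service never runs with a mismatched pair.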

Vault PKI practical path

See: Vault PKI bootstrap and issuance starter

This starter demonstrates the flow, not a full HA production deployment:

  • generate root CA material;
  • create an intermediate CSR;
  • sign it with the root;
  • configure a role for service issuance;
  • issue a workload certificate with short TTL.

This is a strong fit when applications can authenticate to Vault or when platform automation can fetch and rotate certificates centrally.
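The last step of that flow, issuing a short-TTL workload certificate, maps to a single call against the PKI engine's `issue` endpoint. The sketch below builds that request with the standard library so it can be inspected before anything is sent. The mount path `pki_int`, the role name, the Vault address, and the token handling are assumptions; real deployments would normally authenticate through an auth method or Vault Agent instead of passing a raw token.

```python
import json
import urllib.request


def build_issue_request(vault_addr: str, token: str, role: str, common_name: str,
                        ttl: str = "24h", mount: str = "pki_int") -> urllib.request.Request:
    """Build a POST to <mount>/issue/<role> without sending it."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/issue/{role}"
    body = json.dumps({"common_name": common_name, "ttl": ttl}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def issue_certificate(request: urllib.request.Request) -> dict:
    """Send the request and return the certificate material from the response."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        data = json.load(resp)["data"]
    return {
        "certificate": data["certificate"],
        "private_key": data["private_key"],
        "issuing_ca": data["issuing_ca"],
    }
```

The role, not the caller, should constrain which names and TTLs are allowed, so that a compromised workload cannot request certificates for other services.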

Smallstep / step-ca practical path

See: step-ca containerized starter

Use this when you want:

  • a lighter-weight private CA than a full secrets platform;
  • ACME-driven issuance for internal gateways, proxies, and services;
  • a cleaner path from “hand-managed certs” to automated private PKI.

Example runtime configuration ideas

Service runtime pattern

Every service that terminates mTLS needs:

  • a server certificate and private key;
  • a trust bundle for peer validation;
  • hostname / identity verification rules;
  • safe reload or restart strategy when certificates renew.

Gateway / proxy pattern

For many teams, it is easier to terminate and verify mTLS in:

  • Envoy;
  • NGINX;
  • HAProxy;
  • service mesh sidecars.

This reduces the amount of application code that directly handles certificate files and trust stores.

Common mistakes

  • using one long-lived self-signed cert everywhere;
  • keeping the same root and same intermediate forever;
  • distributing leaf certs but forgetting trust-bundle automation;
  • storing private keys in images or source control;
  • relying on revocation only, with very long certificate lifetimes;
  • enabling mTLS transport but keeping authorization weak or implicit;
  • assuming the mesh’s default self-signed setup is production-ready forever.

Fast decision checklist

Choose cert-manager + trust-manager when:

  • you are mostly on Kubernetes;
  • you want Kubernetes-native certificates and trust bundles.

Choose step-ca when:

  • you want a general private CA with relatively low complexity;
  • ACME-based automation is attractive.

Choose Vault PKI when:

  • Vault already exists or policy-driven issuance matters more than K8s-native UX.

Choose SPIRE when:

  • workload identity and attestation are the real requirement.

Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.