AWS IAM and Role Design

Intro: This page treats IAM as one of the main architectural control planes in AWS. Good role design reduces blast radius, makes review easier, and prevents cloud access from turning into a hard-to-audit maze of standing privilege.

What this page includes

a role model for humans, workloads, automation, and break-glass access

practical design choices for trust policies, permissions boundaries, ABAC, and session controls

review questions that expose identity sprawl early

Working assumptions

long-lived credentials and role sprawl are signs of weak operating design, not unavoidable cloud reality

AWS IAM design is strongest when it starts with federation and temporary credentials, then narrows into purpose-built roles, policy guardrails, and auditable exceptions.

Design principle

Treat every role as a statement of three things:

who or what may assume it;
under which trust conditions;
what maximum action set is actually needed.

If any of those answers is vague, the role is usually too broad.

Role families worth separating

Role family	Typical caller	Security goal
Human access roles	engineers, platform operators, responders	short-lived interactive access with clear accountability
CI/CD automation roles	trusted delivery pipelines	narrowly scoped deployment and artifact actions
Workload runtime roles	applications and controllers	least-privilege service access without embedded keys
Admin platform roles	cloud platform owners	privileged configuration changes with tighter review
Break-glass roles	emergency responders	exceptional access with extra approval and logging

Recommended baseline

Federate humans first

Prefer identity-provider federation for people and issue temporary credentials through roles. Standing IAM users should be rare and justified.

Separate human, machine, and workload access

Do not reuse the same role family across interactive engineers, pipelines, and runtime workloads. Their trust conditions, audit expectations, and blast radius are different.

Design for small, named trust boundaries

Good examples include:

one runtime role per workload or tightly related workload set;
one deployment role per environment tier or platform function;
read-only discovery roles distinct from mutation-capable roles;
a separate break-glass path with tighter scrutiny.

Trust policy patterns

A trust policy is not boilerplate. It is the gate that decides which principal can even begin to request the permissions of a role.

Human access patterns

For human roles, strong patterns include:

federation from the corporate identity provider;
session duration aligned to the task, not the whole day by default;
clear role naming by environment and privilege level;
MFA and contextual restrictions where applicable;
source identity or session tagging to preserve attribution.
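The human access patterns above can be sketched as a trust policy built in Python. This is a minimal illustration, not a definitive template: the account ID, the SAML provider name `corp-idp`, and the one-hour session cap are assumptions chosen for the example, and the role name is borrowed from the example role catalog on this page.

```python
import json

# Hypothetical values for illustration only.
ACCOUNT_ID = "111122223333"
SAML_PROVIDER = "corp-idp"


def human_role_trust_policy(account_id: str, provider_name: str) -> dict:
    """Trust policy that admits only sessions federated through one SAML provider."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
                },
                # TagSession and SetSourceIdentity allow the identity provider
                # to pass session tags and a source identity for attribution.
                "Action": [
                    "sts:AssumeRoleWithSAML",
                    "sts:TagSession",
                    "sts:SetSourceIdentity",
                ],
                "Condition": {
                    "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
                },
            }
        ],
    }


# Arguments shaped like those of the IAM CreateRole API.
role_request = {
    "RoleName": "eng-readonly-prod",
    "AssumeRolePolicyDocument": json.dumps(
        human_role_trust_policy(ACCOUNT_ID, SAML_PROVIDER)
    ),
    "MaxSessionDuration": 3600,  # seconds; task-sized rather than a full working day
}

print(json.dumps(json.loads(role_request["AssumeRolePolicyDocument"]), indent=2))
```

For federated users, MFA is normally enforced at the identity provider rather than in this trust policy, so the sketch deliberately leaves it out.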

Workload access patterns

For workloads, make the caller identity explicit:

EKS workloads should prefer workload identity patterns such as IRSA rather than borrowing a node role;
serverless or service-native workloads should use the service’s native role attachment model;
CI/CD deploy roles should trust only the pipeline identity path that actually needs them.
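As one possible illustration of trusting only the pipeline identity path that needs the role, the sketch below builds a trust policy for a pipeline that authenticates through OIDC. The issuer host, project path, and branch are hypothetical, and the subject format is assumed to follow GitLab's ID token convention; confirm it against the tokens your pipeline actually presents.

```python
import json


def pipeline_trust_policy(
    account_id: str, issuer_host: str, project_path: str, branch: str
) -> dict:
    """Trust policy that admits one project and one branch from an OIDC issuer."""
    # Assumed subject layout; verify against the claims in a real pipeline token.
    subject = f"project_path:{project_path}:ref_type:branch:ref:{branch}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Exact match on the subject: no wildcard over projects or refs.
                "Condition": {"StringEquals": {f"{issuer_host}:sub": subject}},
            }
        ],
    }


# Hypothetical account, issuer, project, and branch.
policy = pipeline_trust_policy(
    "111122223333", "gitlab.example.com", "platform/service-a", "main"
)
print(json.dumps(policy, indent=2))
```

Using `StringEquals` on the full subject, rather than `StringLike` with a wildcard, keeps the deploy role tied to a single protected lane.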

Cross-account patterns

Cross-account access should make the trust boundary obvious:

specify exactly which principal or role path may assume the role;
use conditions when they materially narrow trust;
review the necessity of every wildcard in the trust relationship;
make external access auditable at the account and organization layers.
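A cross-account trust policy that follows these points might look like the sketch below. The caller role ARN and organization ID are hypothetical, and the `aws:PrincipalOrgID` condition is included on the assumption that both accounts belong to the same AWS Organization.

```python
import json


def cross_account_trust_policy(caller_role_arn: str, org_id: str) -> dict:
    """Trust exactly one role in another account, and only from inside the organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # A specific role ARN, not the account root and not a wildcard.
                "Principal": {"AWS": caller_role_arn},
                "Action": "sts:AssumeRole",
                # Narrows trust further: the caller must belong to this organization.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


# Hypothetical caller role and organization ID.
policy = cross_account_trust_policy(
    "arn:aws:iam::444455556666:role/gitlab-deploy-prod-service-a",
    "o-exampleorgid",
)
print(json.dumps(policy, indent=2))
```

Naming the exact caller role makes the trust boundary readable in review: the policy itself states which principal in which account is expected to assume the role.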

Permissions design patterns

Prefer role-per-boundary over “one giant platform role”

Broad reusable roles save setup time in the short term but make long-term review unreliable, because no reviewer can say what a given caller actually needs. Instead, design by boundary:

repository or pipeline trust level;
workload identity;
environment tier;
business domain;
admin versus runtime action set.

Use permissions boundaries when delegating role creation

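If teams can create or modify roles, permissions boundaries cap the permissions those identities can ever exercise. Effective access is limited to what both the boundary and the identity-based policy allow, so a role stays inside the boundary even if its identity-based policy is broader than intended.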
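The sketch below illustrates both halves of that idea under stated assumptions. The boundary ARN, account ID, and the `team-*` role prefix are hypothetical. The first function builds a delegation policy that only allows role creation when a specific boundary is attached; the second is a deliberately simplified model of how a boundary caps identity-based permissions, ignoring explicit denies, resource policies, and SCPs.

```python
import json

# Hypothetical identifiers for illustration only.
ACCOUNT_ID = "111122223333"
BOUNDARY_ARN = f"arn:aws:iam::{ACCOUNT_ID}:policy/team-boundary"


def delegated_role_admin_policy(account_id: str, boundary_arn: str) -> dict:
    """Allow a team to manage its own roles only when the boundary is attached."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageRolesOnlyWithBoundary",
                "Effect": "Allow",
                "Action": [
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                ],
                "Resource": f"arn:aws:iam::{account_id}:role/team-*",
                "Condition": {
                    "StringEquals": {"iam:PermissionsBoundary": boundary_arn}
                },
            }
        ],
    }


def effective_actions(identity_allowed: set, boundary_allowed: set) -> set:
    """Simplified model: the boundary and the identity policy must both allow an action."""
    return identity_allowed & boundary_allowed


print(json.dumps(delegated_role_admin_policy(ACCOUNT_ID, BOUNDARY_ARN), indent=2))

# An over-broad identity policy is still capped by the boundary.
print(effective_actions(
    {"s3:GetObject", "iam:CreateUser"},
    {"s3:GetObject", "s3:PutObject"},
))
```

A real delegation setup would also need to stop the team from removing or replacing the boundary, for example by denying `iam:DeleteRolePermissionsBoundary`; that part is left out to keep the sketch short.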

Use ABAC where it simplifies scale, not where it hides complexity

ABAC can reduce policy sprawl when your tagging model is disciplined. It works best when:

principal tags come from a trusted identity source or controlled role design;
resource tags are required and reviewed;
service coverage for tag-based authorization is understood;
broad admin policies do not silently bypass the model.
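As a minimal sketch of tag-based authorization, the policy below allows starting and stopping EC2 instances only when the instance carries the same tag value as the calling principal. The tag key `team` is an assumption for the example; the pattern relies on the principal tag coming from a trusted source, as the list above requires.

```python
import json


def abac_instance_policy(tag_key: str) -> dict:
    """Allow instance start/stop only where resource and principal tags match."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {
                        # The policy variable resolves to the caller's own tag value.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


print(json.dumps(abac_instance_policy("team"), indent=2))
```

One policy like this can serve many teams, which is where ABAC reduces sprawl. It also means that anyone who can change the `team` tag on a principal or an instance can change who has access, so tag-editing permissions need the same scrutiny as the policy itself.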

Validate policies before attachment

Use policy validation and access analysis, for example with IAM Access Analyzer, before production use. The goal is not only syntactic correctness but also catching accidental broad access, weak conditions, and public or cross-account exposure paths early.
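To make the idea concrete, here is a small local pre-check that flags obvious wildcards in a policy document before it reaches review. It is a hypothetical helper written for this page, not an AWS tool, and it is far less thorough than a real policy validation service: it only looks at `Allow` statements and literal wildcards.

```python
def _as_list(value):
    """IAM accepts a single value or a list in most policy fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list:
    """Return human-readable findings for wildcard actions, resources, and principals."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: broad action {action!r}")
        for resource in _as_list(stmt.get("Resource")):
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource")
        principal = stmt.get("Principal")
        if isinstance(principal, dict):
            principals = [p for v in principal.values() for p in _as_list(v)]
        else:
            principals = _as_list(principal)
        if "*" in principals:
            findings.append(f"statement {index}: wildcard principal")
    return findings


# A deliberately over-broad policy used as input for the check.
risky_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

for finding in find_wildcards(risky_policy):
    print(finding)
```

A check like this fits well as a fast gate in a pipeline, with the fuller validation and access analysis run before attachment.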

EKS and workload identity

If Kubernetes workloads need AWS access, a common target state is:

bind a dedicated Kubernetes service account to the workload;
map that service account to a dedicated IAM role;
scope the role to the workload’s actual AWS calls;
keep node roles smaller because they no longer need to carry application permissions.

This keeps runtime identity aligned with workload ownership and makes each role easier to review, because its permissions map to one workload rather than to everything scheduled on a node.
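The target state above can be sketched as an IRSA-style trust policy that admits exactly one Kubernetes service account. The account ID, OIDC issuer path, namespace, and service account name are hypothetical values for illustration.

```python
import json


def irsa_trust_policy(
    account_id: str, oidc_issuer: str, namespace: str, service_account: str
) -> dict:
    """Trust policy admitting one service account through the cluster's OIDC provider."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Exactly one namespace and service account may assume the role.
                        f"{oidc_issuer}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Hypothetical cluster issuer, namespace, and service account.
policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "payments",
    "payments-writer",
)
print(json.dumps(policy, indent=2))
```

On the Kubernetes side, the service account is linked to the role through the `eks.amazonaws.com/role-arn` annotation. Pinning the `sub` claim to a single namespace and service account is what keeps other pods in the cluster from assuming the role.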

Example review checklist

Ask these questions in every IAM design review:

Are humans using federation and temporary credentials by default?
Which roles still rely on long-lived credentials or standing users?
Does each role have a single clear purpose and owner?
Is each trust policy narrow enough for the permissions it gates, with the most privileged roles trusting the fewest principals?
Can a pipeline or workload assume a role that is broader than its business function?
Where are permissions boundaries used, and where should they be?
Which roles use ABAC or session tags, and who controls the tag source?
Are workload identities separated from node or host identities?
Is there a distinct break-glass path with logging and review?
Has policy validation or access analysis been performed before rollout?

Common anti-patterns

keeping permanent IAM users because migration to federation feels inconvenient;
using the same admin-like role for people, pipelines, and workloads;
attaching broad managed policies first and never narrowing them later;
letting node roles carry application permissions in EKS when workload identity is available;
writing trust policies with overly broad principals or unreviewed wildcards;
treating tags as ABAC truth when the tag assignment process itself is untrusted.

Example role catalog

Example role	Intended caller	Typical scope
`eng-readonly-prod`	humans	read-only production inspection
`platform-admin-nonprod`	cloud platform owners	controlled admin changes outside production
`gitlab-deploy-prod-service-a`	protected pipeline lane	deploy one service to one environment
`eks-sa-payments-writer`	payments workload service account	scoped access to required AWS services only
`breakglass-security-admin`	emergency response only	time-bound exceptional admin access