Serverless Doesn't Mean Low Blast Radius

The security case for serverless is usually framed as an absence. No persistent instances, no OS to patch, no host for an attacker to establish a foothold on. The shorthand version is that there's nothing to exploit. That framing is close enough to be convincing and wrong enough to get you breached.

There's plenty to exploit. You just moved it.

Capital One in 2019 wasn't a compute problem. It was an identity problem. The server-side request forgery (SSRF) vulnerability gave the attacker access to the EC2 metadata service, which gave them credentials for an overpermissioned IAM role. That role could read S3 buckets broadly across the account. Over 100 million customer records were exfiltrated. The EC2 instance was incidental. The IAM role was the blast radius. You can eliminate every EC2 instance in your environment and rebuild the same blast radius with Lambda in an afternoon.

Serverless removes the compute layer you were patching and segmenting. It does not remove the layer that determines how much damage a compromised workload can do. It concentrates that risk entirely on identity.

What Serverless Actually Changes

The traditional server threat model has three pillars. Patch the OS, segment the network, monitor the host. Serverless invalidates all three and replaces them with a single question. What can this execution identity do?

Lambda functions don't run persistent OS instances you patch. They have no network position to segment. A function inside a VPC and a function outside one are equally capable of exfiltrating credentials to an attacker-controlled endpoint. There's no host to monitor; the function runs for milliseconds and is gone. Every traditional control you built your serverless pitch around either doesn't apply or doesn't help.

What does apply is the IAM role or managed identity your function runs as. That identity determines everything.

How Lambda Credentials Actually Work

Lambda has never used the EC2 instance metadata service (IMDS), which makes IMDS hardening a common source of false confidence after an EC2 migration. Instead, credentials are injected directly as environment variables at function startup.

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN

These are temporary STS tokens, valid for up to one hour by default. Once exfiltrated, they are usable from anywhere with internet access. The function that produced them doesn't need to be running.

The attack chain for a vulnerable Lambda function is three steps. First, an attacker finds an SSRF vulnerability: a URL parameter the function fetches without validation, a webhook handler that follows redirects, or a template renderer that makes outbound requests. Second, they cause the function to read /proc/self/environ (for example through a file:// URL) or hit the Lambda Runtime API at http://localhost:9001/2018-06-01/runtime/invocation/next. The first exposes the process environment, which includes the credentials. The second returns invocation event data rather than the environment, so it can leak request contents but not the role credentials themselves. Third, the attacker exfiltrates the output to a server they control. They now have working AWS credentials tied to whatever IAM role the function runs as.

Figure: Lambda credential theft chain. SSRF found → Env vars read → Tokens sent out → Used from anywhere.
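
The first link in that chain is an outbound fetch that nobody validated. As a minimal sketch of what validation could look like, the Python below rejects non-HTTP schemes (which covers file:///proc/self/environ) and any host that resolves to a loopback, link-local, or private address (which covers localhost:9001 and 169.254.169.254). The function name is_fetch_allowed and the resolve parameter are hypothetical, introduced here only so the check can be exercised without DNS. This is not a complete SSRF defense.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def _dns_resolve(host):
    """Default resolver: every address the hostname maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_fetch_allowed(url, resolve=None):
    """Return True only for http(s) URLs whose host resolves to public addresses.

    `resolve` is a hypothetical injection point (hostname -> list of IP strings).
    """
    if resolve is None:
        resolve = _dns_resolve
    try:
        parts = urlsplit(url)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
            return False  # blocks file://, gopher://, and URLs with no host
        addresses = resolve(parts.hostname)
        # Loopback, link-local (169.254.0.0/16), and RFC 1918 ranges are not global.
        return bool(addresses) and all(
            ipaddress.ip_address(ip).is_global for ip in addresses
        )
    except (OSError, ValueError):
        return False  # unresolvable or malformed input is rejected
```

A check like this only helps if it runs again on every redirect hop and the fetch connects to the address that was validated. Otherwise a redirect or a DNS change between check and fetch walks straight around it.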

IMDSv2, which is now default on new EC2 instances, is irrelevant here. Lambda has never used IMDS, so hardening IMDS doesn't change the Lambda attack surface at all.

Capital One belongs in this conversation not because Lambda and EC2 work the same way (they don't) but because the blast radius calculation is identical. Different runtime, different credential delivery mechanism, same consequence: overpermissioned role credentials in the hands of an attacker.

This Isn't a Lambda Problem

Lambda is the clearest example because the credential delivery is most visible, but the identity-as-blast-radius pattern applies across your entire serverless and container estate.

On AWS, Elastic Kubernetes Service (EKS) surfaces the same problem through a different path. EKS worker nodes are EC2 instances, and any pod running on a node can query the EC2 IMDS endpoint to retrieve the node's IAM role credentials. All pods on the node share that role. Node roles accumulate permissions incrementally: ECR pull, CloudWatch, EBS attach, and anything else that came up during setup, with nothing driving a cleanup pass. IRSA (IAM Roles for Service Accounts) solves this by binding a role to a specific Kubernetes service account via OpenID Connect (OIDC), scoping credentials at the pod level. It requires explicit configuration per workload, which is where the scope creep re-enters.

The Lambda-to-RDS pattern creates a different version of the same exposure. Putting DB_HOST, DB_USER, and DB_PASSWORD directly in function environment variables means the attacker doesn't get a scoped STS token with a one-hour TTL. They get long-lived database credentials that remain valid until someone rotates them. RDS IAM database authentication solves this: the function generates a short-lived auth token at connection time rather than reading a static credential. It requires application-side implementation, because the token request has to live in the function's connection code.
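
Finding the functions that still carry static credentials is a reasonable first pass. The sketch below assumes you have already fetched each function's configuration as JSON in the shape `aws lambda get-function-configuration` returns, with variables under Environment.Variables. The name find_static_secrets and the key-name pattern are hypothetical heuristics, not an AWS feature, and a name match is a prompt to go look, not proof.

```python
from __future__ import annotations

import re

# Hypothetical naming heuristic: keys that usually hold long-lived secrets.
SECRET_KEY_PATTERN = re.compile(
    r"(PASSWORD|PASSWD|SECRET|API_KEY|PRIVATE_KEY)", re.IGNORECASE
)


def find_static_secrets(function_config: dict) -> list[str]:
    """Return names of environment variables that look like static credentials.

    Only variable names are inspected; values are never read or returned.
    """
    variables = function_config.get("Environment", {}).get("Variables", {})
    return sorted(name for name in variables if SECRET_KEY_PATTERN.search(name))


# Usage with a made-up configuration:
example = {
    "FunctionName": "orders",
    "Environment": {"Variables": {"DB_HOST": "db.internal", "DB_PASSWORD": "x"}},
}
print(find_static_secrets(example))
```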

On Azure, managed identity tokens on virtual machines are served via Azure IMDS at http://169.254.169.254/metadata/identity/oauth2/token with a Metadata: true header. Azure Functions and App Service expose the same kind of token through a local endpoint published in the IDENTITY_ENDPOINT environment variable and guarded by an X-IDENTITY-HEADER value held in IDENTITY_HEADER, so a workload that leaks its environment leaks everything needed to request a token. An SSRF-vulnerable function can be pointed at an attacker-controlled server, which returns a redirect to the token endpoint. If the function follows the redirect with the required header attached, it retrieves a standard OAuth bearer token and includes it in the response. That token is usable from anywhere. The blast radius is determined by the managed identity's role assignment. The role Azure's quickstart documentation demonstrates is Contributor at subscription scope, meaning read and write on every resource in the subscription. If your team followed the quickstart, that's the role your function is running as.

On GCP, service account credentials are served via the metadata server at http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token, with a required Metadata-Flavor: Google header. That header is a partial mitigation. A naive SSRF can't retrieve the token because the function won't add the header automatically. It does not help when the attacker has code execution, when the function proxies requests and forwards headers, or when an open redirect exists on the metadata server. The default GCP compute service account, which Cloud Functions v2 uses unless explicitly overridden, frequently carries cloud-platform OAuth scope, meaning it can authenticate to every GCP API the project has enabled.

Three platforms, different credential delivery mechanisms, same root problem.

Why Least Privilege Drifts in Serverless

The standard answer to an overpermissioned role is to write a least-privilege policy. That's correct. The problem is that serverless makes it harder than it looks.

A long-running EC2 instance has a process you can profile. CloudWatch or a network tap shows you every API call the application makes. You build the policy from observed behavior. A Lambda function runs for 50 milliseconds and disappears. Its execution profile is scattered across CloudTrail logs that capture the data but require active querying to be useful. When a developer is setting up a new function, the fast path is to attach a managed policy that looks close enough (AmazonS3FullAccess, AmazonDynamoDBFullAccess) and move on. Nobody comes back to tighten it.

The solution pattern is the same regardless of platform. Enable audit logging, let the function run in production long enough to cover all its code paths, then write the policy from what the logs actually show. Use 30 to 90 days as your lookback window. Shorter windows miss infrequent paths like monthly batch jobs, error handlers, and on-call runbooks.

On AWS, IAM Access Analyzer automates exactly this. It reads CloudTrail and generates a least-privilege policy from observed API calls, turning a manual audit into a CLI command.

# Replace ACCOUNT, ROLE_NAME, REGION, and TRAIL_NAME; startTime sets the lookback window
aws accessanalyzer start-policy-generation \
  --policy-generation-details '{"principalArn":"arn:aws:iam::ACCOUNT:role/ROLE_NAME"}' \
  --cloud-trail-details '{
    "trails":[{"cloudTrailArn":"arn:aws:cloudtrail:REGION:ACCOUNT:trail/TRAIL_NAME","allRegions":true}],
    "accessRole":"arn:aws:iam::ACCOUNT:role/AccessAnalyzerRole",
    "startTime":"2026-01-01T00:00:00Z"
  }'

The process is the same on all three platforms: collect audit logs for the identity, enumerate the operations it actually called, write a role that covers exactly that set. On AWS, IAM Access Analyzer runs this automatically. On GCP, IAM Recommender does the same using 90 days of Cloud Audit Logs. On Azure, the step is still manual: export Activity Log and Diagnostic Logs filtered by the managed identity's principal ID and derive the role from what you find. Defender for Cloud's CIEM capabilities can reduce that work if it is already licensed, but there is no native equivalent to what AWS and GCP offer out of the box.
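
For the manual version of that process, here is a minimal sketch of the "enumerate what it called, allow exactly that" step. It assumes trimmed, CloudTrail-like records with a flat resource field, which is a simplification: real records nest resource ARNs, and an eventName does not always map one-to-one to an IAM action. The function name policy_from_events is hypothetical. Treat the output as a starting draft to review, not a finished policy.

```python
from __future__ import annotations

from collections import defaultdict


def policy_from_events(events: list[dict]) -> dict:
    """Build an IAM-style policy allowing only the observed (action, resource) pairs.

    Each event is assumed to look like:
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject", "resource": "arn:..."}
    """
    actions_by_resource = defaultdict(set)
    for event in events:
        service = event["eventSource"].split(".")[0]  # "s3.amazonaws.com" -> "s3"
        action = f"{service}:{event['eventName']}"  # assumed 1:1 mapping, verify it
        actions_by_resource[event["resource"]].add(action)

    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(actions_by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}
```

Duplicate calls collapse into one action per resource, so a function that read the same bucket ten thousand times still produces a single statement.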

The Ceiling: What Least Privilege Alone Can't Protect

Least-privilege policy generation answers one question. What should this execution identity be allowed to do? It produces a well-scoped policy attached to a specific role, but that policy can drift. A developer in a hurry attaches AmazonS3FullAccess and the careful policy work done last quarter is undone in a deploy.

SCPs and permission boundaries answer a different question. What is the maximum any execution identity in this account or role can do, regardless of what's attached?

Figure: How the ceiling holds when least privilege drifts. Role drifts → SCP blocks escalation → Boundary caps movement.

These controls are not a substitute for least privilege. A permission boundary set to s3:* on a role that only needs s3:GetObject on one bucket is still overpermissioned. But a permission boundary that excludes iam:CreateUser, iam:CreateAccessKey, and sts:AssumeRole to external accounts caps the lateral movement potential of any compromised function, including one whose role drifted after the last security review.

On AWS, Service Control Policies (SCPs) at the organizational unit (OU) or account level define the hard ceiling for any principal in scope, and permission boundaries define the ceiling for individual roles. These solve different problems. The SCP is your account-wide guardrail: deny destructive IAM actions and cross-account role assumptions for every principal in scope, no exceptions. The permission boundary is a standardized ceiling applied uniformly across Lambda execution roles, blocking the worst-case actions (IAM writes, external STS assumptions, services that class of function has no business touching) regardless of what policies get attached later. The boundary is not a per-function artifact tuned to each Lambda's exact needs; that work lives in the role's actual policy. The boundary's job is to ensure that even a drifted or misconfigured role cannot exceed a defined worst-case scope.
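
The way these layers combine can be sketched in a few lines. The model below is deliberately simplified and is not AWS's full policy evaluation logic: it looks only at action names, ignores resources and conditions, and treats the SCP purely as a deny list. The function name is_allowed and the sample policies are hypothetical. What it shows is that an action has to survive every layer, so a drifted role is still capped by the boundary.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def _matches(action: str, patterns: list[str]) -> bool:
    """IAM-style wildcard match on action names, case-insensitive."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_allowed(action: str, role_allow: list[str],
               boundary_allow: list[str], scp_deny: list[str]) -> bool:
    """An SCP deny always wins; otherwise the action must be allowed by
    BOTH the role's attached policy and the permission boundary."""
    if _matches(action, scp_deny):
        return False
    return _matches(action, role_allow) and _matches(action, boundary_allow)


# A role that drifted to s3:* is still held to what the boundary permits.
drifted_role = ["s3:*"]
boundary = ["s3:GetObject", "s3:PutObject", "logs:*"]
scp_deny = ["iam:*", "organizations:*"]
print(is_allowed("s3:GetObject", drifted_role, boundary, scp_deny))
print(is_allowed("s3:DeleteBucket", drifted_role, boundary, scp_deny))
```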

On Azure, Azure Policy with a Deny effect at the management group or subscription scope is the practical equivalent. Apply it to block managed identity role assignments above resource-group scope. This directly prevents the Contributor-at-subscription pattern that quickstart documentation promotes without flagging the security implications.

On GCP, Organization Policy constraints enforce ceilings at the org, folder, or project level. constraints/iam.disableServiceAccountKeyCreation prevents a compromised service account from generating persistent long-lived keys for itself. constraints/iam.allowedPolicyMemberDomains restricts which external identities can be granted IAM bindings in your project, closing the path where a compromised function grants permissions to an attacker-controlled account.

If you've read the agent authorization post, this ceiling pattern is the same principle as Cedar's forbid override in Amazon Verified Permissions (AVP). The mechanism is different but the architecture is identical: a control that holds regardless of what any downstream policy permits.

Source Binding: A Control Gap Worth Knowing

There is a class of control worth naming separately, because it's narrower than a permission boundary and more precise. It binds a credential so that it can only be used from within the originating service.

On AWS, this is lambda:SourceFunctionArn. When a Lambda function makes an AWS API call using its execution role, the Lambda service injects this context key into the request. It is set by the platform, not the caller. An attacker who exfiltrates the STS credentials and calls the same API from their own machine doesn't have Lambda making the call. The context key is absent, the condition fails, and the request is denied. The credential is valid. The call is blocked.

{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::my-bucket/*",
  "Condition": {
    "ArnEquals": {
      "lambda:SourceFunctionArn": "arn:aws:lambda:us-east-1:ACCOUNT:function:my-function"
    }
  }
}

Figure: Source binding: why exfiltrated credentials don't work. Credentials stolen → API call made → Context key absent → Request denied.
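
To make the "credential is valid, call is blocked" outcome concrete, here is a toy evaluation of a statement like the one above. It is a sketch under simplifying assumptions, not how IAM is implemented: it checks only the action and the ArnEquals condition, compares ARNs exactly, ignores the resource, and the function name statement_allows and the account ID are hypothetical. The one behavior it is meant to show is that a context key missing from the request makes the condition fail.

```python
def statement_allows(statement: dict, action: str, request_context: dict) -> bool:
    """Evaluate one Allow statement with an optional ArnEquals condition block."""
    if statement.get("Effect") != "Allow":
        return False
    if action not in statement.get("Action", []):
        return False
    conditions = statement.get("Condition", {}).get("ArnEquals", {})
    for key, expected in conditions.items():
        # A key the platform did not set is simply absent, so this fails.
        if request_context.get(key) != expected:
            return False
    return True


FUNCTION_ARN = "arn:aws:lambda:us-east-1:111122223333:function:my-function"
statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": {"ArnEquals": {"lambda:SourceFunctionArn": FUNCTION_ARN}},
}

# Call made from inside the function: Lambda sets the context key.
print(statement_allows(statement, "s3:GetObject", {"lambda:SourceFunctionArn": FUNCTION_ARN}))
# Same credentials replayed from an attacker's machine: no context key.
print(statement_allows(statement, "s3:GetObject", {}))
```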

For functions handling sensitive data, the lambda:SourceFunctionArn condition should be on every permission statement. It can also be enforced via a permission boundary, so you can require it across all Lambda execution roles without updating each policy individually.

Azure and GCP don't have a direct equivalent. Microsoft Entra Conditional Access for workload identities is generally available for single-tenant service principals but does not cover managed identities, and requires a Workload ID Premium license. On GCP, VPC Service Controls can achieve the same practical outcome if the service runs in RFC 1918 space with a properly configured service perimeter. An exfiltrated token has no routable path to the GCP APIs from outside that boundary. The difference is the dependency: that guarantee lives in the network layer, not the identity layer, and breaks down if the perimeter is misconfigured or an attacker has a foothold inside the VPC. Compensating controls (tight permission scope, short TTL, network egress restrictions) reduce the window and the damage. They do not provide the same guarantee.

If your workloads run on Azure or GCP, the compensating controls above are the practical path. None of them require a platform change.

The Audit You Can Run Today

Start here. These commands enumerate your functions and their execution identities. Run them, look at what comes back, and ask yourself whether the attached policies reflect what those functions actually need.

AWS: list Lambda execution roles and flag FullAccess policies:

# List all functions and their execution roles
aws lambda list-functions \
  --query 'Functions[*].[FunctionName,Role]' \
  --output table

# Find roles with FullAccess managed policies attached, printing the role next to each match
aws lambda list-functions --output json \
  | jq -r '.Functions[].Role | split("/") | last' \
  | sort -u \
  | while read -r role; do
      policies=$(aws iam list-attached-role-policies --role-name "$role" \
        --query 'AttachedPolicies[?contains(PolicyName,`FullAccess`)].PolicyName' \
        --output text)
      if [ -n "$policies" ]; then echo "$role: $policies"; fi
    done

Azure: list Function Apps and their managed identity role assignments:

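# PrincipalId is the value to pass as --assignee in the role assignment query
az functionapp list \
  --query '[*].{Name:name,Identity:identity.type,PrincipalId:identity.principalId,RG:resourceGroup}' \
  -o table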

az role assignment list --assignee <principal-id> --output table

GCP: list Cloud Functions and their service accounts:

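# 1st gen functions report serviceAccountEmail; 2nd gen functions report it under serviceConfig
gcloud functions list \
  --format='table(name,serviceAccountEmail,serviceConfig.serviceAccountEmail)'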

gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:SA_EMAIL" \
  --format="table(bindings.role)"

Four things to do with what you find:

1. Any role with a FullAccess managed policy or Action: "*" in a custom policy is a priority. Start with those.
2. Enable CloudTrail data events (AWS), Azure Diagnostic Logs, or GCP Data Access audit logs if they aren't already on. You need 30 to 90 days of logs before you can generate an accurate least-privilege policy.
3. Apply a ceiling. On AWS, define one standardized permission boundary, applied uniformly across Lambda execution roles, that blocks worst-case actions (IAM writes, external STS assumptions) regardless of what policies get attached later, and pair it with an SCP that denies destructive IAM actions at the account level. On Azure, add an Azure Policy that blocks Contributor-or-above role assignments for managed identities. On GCP, enable org policy constraints that restrict key creation and external IAM grants.
4. For functions handling sensitive data, add a source binding condition to every permission statement. On AWS, use lambda:SourceFunctionArn on each policy statement. It makes the execution role credentials useless if exfiltrated outside the originating function. On Azure, managed identities are not eligible for Conditional Access policies; apply tight permission scope, short token TTL, and network egress restrictions to the function's VNet as your compensating controls. On GCP, configure a VPC Service Controls perimeter around the project that hosts the function if the service runs in RFC 1918 space, so that an exfiltrated token has no routable path to GCP APIs from outside that boundary.
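
The first item on that list is easy to automate once you have policy documents in hand. The sketch below assumes each policy is available as a name plus a parsed JSON document in the standard IAM shape. The function name is_priority is hypothetical, and the check is intentionally narrow: it flags the two patterns named above and nothing subtler, so service-wide wildcards such as s3:* in a custom policy pass unflagged.

```python
def is_priority(policy_name: str, document: dict) -> bool:
    """Flag a policy whose name contains 'FullAccess' or whose document has an
    Allow statement granting Action '*'."""
    if "FullAccess" in policy_name:
        return True
    statements = document.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a single object
        statements = [statements]
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        if statement.get("Effect") == "Allow" and "*" in actions:
            return True
    return False


# Usage with made-up policies:
wide_open = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
scoped = {"Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"],
                         "Resource": "arn:aws:s3:::my-bucket/*"}]}
print(is_priority("custom-admin", wide_open), is_priority("orders-read", scoped))
```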

The point isn't to have done all of this by tomorrow. The point is to know what your execution identities can reach, because that's what an attacker is looking at when they find a vulnerability in your function. You removed the servers. The blast radius is still there.

The same identity-as-blast-radius pattern applies to AI agents and MCP servers running on developer workstations. Those tools inherit ambient credentials the same way a Lambda function does. That's the subject of the next series, starting June 4.