Project Zero dropped a great bug in Vault which I think would have been prevented by one of the lessons learned in cryptography engineering: when you can, prefer reconstructing a value over parsing and validating it.

You should read the blog post to understand the attack first, because my tl;dr will not do it justice, but here's an overview.

Vault is a thing that manages your secrets, like database credentials, and makes them accessible to the applications that need them through its various APIs. Of course, these APIs need some sort of authentication, which can be a bit of a chicken-and-egg situation. If you run on a cloud platform like AWS, the natural way to identify an application is through the IAM role it runs as, and Vault has a way to authenticate API calls through IAM roles.

How does Vault do that? Is there an AWS API for that? Nope!

AWS API calls are authenticated with an HMAC signature over a canonicalized[1] version of the HTTP request. This means you can hand a signed request to someone else and they can send it for you. (This is used, for example, to let clients download specific objects from S3 without sharing credentials.)
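To make the "hand a signed request to someone else" property concrete, here is a minimal Go sketch loosely modeled on AWS Signature Version 4: a signing key is derived from the secret through a chain of HMACs, then used to MAC a hash of the canonical request. It is deliberately simplified (only the host and x-amz-date headers are signed, the region and service are fixed, and the secret is made up), so treat it as an illustration, not a SigV4 client.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const region, service = "us-east-1", "sts"

// hmacSHA256 returns HMAC-SHA256(key, data).
func hmacSHA256(key []byte, data string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(data))
	return m.Sum(nil)
}

// canonicalRequest builds a simplified canonical form of a POST to "/"
// that covers only the host and x-amz-date headers plus the body hash.
func canonicalRequest(host, amzDate, body string) string {
	payload := sha256.Sum256([]byte(body))
	return "POST\n/\n\n" +
		"host:" + host + "\nx-amz-date:" + amzDate + "\n\n" +
		"host;x-amz-date\n" + hex.EncodeToString(payload[:])
}

// sign derives a date/region/service-scoped key from the secret and
// MACs the hash of the canonical request, in the style of SigV4.
func sign(secret, amzDate, canonical string) string {
	day := amzDate[:8]
	k := hmacSHA256([]byte("AWS4"+secret), day)
	k = hmacSHA256(k, region)
	k = hmacSHA256(k, service)
	k = hmacSHA256(k, "aws4_request")
	h := sha256.Sum256([]byte(canonical))
	stringToSign := "AWS4-HMAC-SHA256\n" + amzDate + "\n" +
		day + "/" + region + "/" + service + "/aws4_request\n" +
		hex.EncodeToString(h[:])
	return hex.EncodeToString(hmacSHA256(k, stringToSign))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, host, amzDate, body, sig string) bool {
	want := sign(secret, amzDate, canonicalRequest(host, amzDate, body))
	return hmac.Equal([]byte(want), []byte(sig))
}

func main() {
	const secret = "hypothetical-secret-key" // made-up credentials
	const date = "20160126T215751Z"
	body := "Action=GetCallerIdentity&Version=2011-06-15"
	sig := sign(secret, date, canonicalRequest("sts.amazonaws.com", date, body))

	// Whoever holds sig can have this exact request executed...
	fmt.Println(verify(secret, "sts.amazonaws.com", date, body, sig))
	// ...but changing any signed part, like the body, breaks it.
	fmt.Println(verify(secret, "sts.amazonaws.com", date,
		"Action=GetSessionToken&Version=2011-06-15", sig))
}
```

Running it prints true, then false. Only the verifier (AWS, in the real system) needs the secret; anyone merely holding the signature can replay the exact request, which is the property Vault builds on.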

Moreover, AWS has an innocuous API called sts:GetCallerIdentity which just returns the IAM role of the caller. You might see where this is going.

Vault IAM authentication works like this: you prepare a signed sts:GetCallerIdentity API request and give it to Vault, which sends it to AWS and reads the IAM role name from the response.

Felix Wilhelm from Project Zero found that an attacker can change the API request so it hits a different AWS API, one that reflects attacker-controlled data (wrapped in JSON) back without requiring proper authentication, and then put a fake sts:GetCallerIdentity response inside that attacker-controlled data.

Again, if this doesn't make sense, read the blog post.

Multiple things went wrong here:

  1. The scheme is too clever by three quarters, using an AWS API which is not meant for authentication, and is actually considered so innocuous it can't be blocked. As the AWS documentation for sts:GetCallerIdentity puts it:

     "No permissions are required to perform this operation. If an administrator adds a policy to your IAM user or role that explicitly denies access to the sts:GetCallerIdentity action, you can still perform this operation. Permissions are not required because the same information is returned when an IAM user or role is denied access."

  2. The request that Vault forwards to AWS is malleable and Vault failed to validate it sufficiently, allowing the attacker to redirect it to a different endpoint.

  3. The XML parser is too lenient in allowing text before and after the element it's unmarshaling, letting the attacker embed a fake GetCallerIdentityResponse in a JSON blob.

(1) is where things actually started taking a turn for the worse, but it's arguably AWS's fault for not providing a proper delegated auth scheme. It's also unexpected that delegating a sts:GetCallerIdentity request would result in delegating Vault access, but Vault apparently has a mitigation for that: you can require a custom X-Vault-AWS-IAM-Server-ID header in the request, which is covered by the signature and so verified by AWS along with the rest of the request. This is also clever[2], but for some reason optional.

(3) is a little too much about my day job to be fun to discuss here, since the XML parser in question is Go's encoding/xml.[3]
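For the curious, a minimal Go sketch of the leniency in (3): xml.Unmarshal skips any text before the first element and never looks at what follows it. The struct, the ARN, and the JSON wrapper are made up for illustration and do not reproduce Project Zero's actual payload.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// callerIdentity mirrors the one field of an sts:GetCallerIdentity
// response that this sketch reads.
type callerIdentity struct {
	XMLName xml.Name `xml:"GetCallerIdentityResponse"`
	Arn     string   `xml:"GetCallerIdentityResult>Arn"`
}

// parseIdentity extracts the ARN the way a naive client might.
func parseIdentity(body []byte) (string, error) {
	var id callerIdentity
	if err := xml.Unmarshal(body, &id); err != nil {
		return "", err
	}
	return id.Arn, nil
}

func main() {
	// A made-up JSON body that echoes attacker-controlled text.
	body := []byte(`{"Message":"<GetCallerIdentityResponse><GetCallerIdentityResult>` +
		`<Arn>arn:aws:iam::123456789012:role/admin</Arn>` +
		`</GetCallerIdentityResult></GetCallerIdentityResponse>"}`)
	arn, err := parseIdentity(body)
	fmt.Println(arn, err)
}
```

It prints the fake ARN with a nil error: the JSON around the XML element is silently ignored.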

What I want to focus on is (2), because it's a lesson we learned the hard way in cryptography and didn't transfer effectively to the rest of security engineering.

One of my favorite cryptographic attacks is the Bleichenbacher'06 signature forgery. I wrote up how it works when I found it in python-rsa, so again go read that, but here's a tl;dr. When you verify an RSA PKCS#1 v1.5 signature, you get an ASN.1 DER structure wrapping the message hash that you need to check. If you don't parse it strictly, for example by allowing extra fields or trailing bytes, an attacker can fake the signature. This was exploited countless times.

The lesson we learned was that instead of parsing the ASN.1 DER to extract the message hash, we should reconstruct the ASN.1 DER we'd expect to see, and then simply compare it byte-by-byte.

The same technique would have saved Vault.

This is what a sts:GetCallerIdentity request looks like:

POST / HTTP/1.1
Host: sts.amazonaws.com
Accept-Encoding: identity
Content-Length: 43
Content-Type: application/x-www-form-urlencoded
Authorization: AWS4-HMAC-SHA256 Credential=AKIAI44QH8DHBEXAMPLE/20160126/us-east-1/sts/aws4_request,SignedHeaders=host;user-agent;x-amz-date,Signature=1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
X-Amz-Date: 20160126T215751Z
User-Agent: aws-cli/1.10.0 Python/2.7.3 Linux/3.13.0-76-generic botocore/1.3.22

Action=GetCallerIdentity&Version=2011-06-15

The Vault API takes basically the entire thing, split over four parameters (all but the method base64-encoded): iam_http_request_method, iam_request_url, iam_request_headers, and iam_request_body. The attacker modified the body and headers to redirect the request and get a spoofed response. There is no reason to give an attacker all this flexibility!

If you look at the sts:GetCallerIdentity request above there are really only two fields that should ever change: X-Amz-Date and Authorization. A much better API for Vault would only take two parameters: iam_request_date and iam_request_authorization. Everything else (including the clever X-Vault-AWS-IAM-Server-ID header) would be hardcoded both on the Vault client and on the server.

POST / HTTP/1.1
Host: sts.amazonaws.com
Accept-Encoding: identity
Content-Length: 43
Content-Type: application/x-www-form-urlencoded
Authorization: $iam_request_authorization
X-Amz-Date: $iam_request_date
X-Vault-AWS-IAM-Server-ID: vault.example.com
User-Agent: Vault-IAM-Authentication/1.0.0

Action=GetCallerIdentity&Version=2011-06-15

The client would reconstruct the request to generate the Authorization signature, and the server would reconstruct it to send it on to AWS. The server would still have to validate, or at least escape, the X-Amz-Date and Authorization header values, but that's way easier than validating a whole request. If an attacker changed anything else in the request, the Authorization signature would simply fail to verify. Since the attacker would have had no control over the body or the other headers, the attack would never have worked. I bet it would have even been less code!
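A minimal Go sketch of the server side of that proposal, using net/http: the method, URL, body, and every header except the two client-supplied values come from constants. The validation rules (date layout, printable ASCII, the AWS4-HMAC-SHA256 prefix) and the vault.example.com server ID are assumptions for illustration; the client would have to sign the same fixed template, with x-vault-aws-iam-server-id among its signed headers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
	"time"
)

const (
	amzDateFormat = "20060102T150405Z"
	stsBody       = "Action=GetCallerIdentity&Version=2011-06-15"
)

// buildSTSRequest reconstructs the full sts:GetCallerIdentity request
// from hardcoded values, taking only the two client-supplied headers.
func buildSTSRequest(date, authorization string) (*http.Request, error) {
	// Parse the date, then re-serialize it rather than forwarding the input.
	t, err := time.Parse(amzDateFormat, date)
	if err != nil {
		return nil, fmt.Errorf("invalid iam_request_date: %w", err)
	}
	// Expected scheme and printable ASCII only: no CR/LF header smuggling.
	if !strings.HasPrefix(authorization, "AWS4-HMAC-SHA256 ") {
		return nil, errors.New("invalid iam_request_authorization")
	}
	for i := 0; i < len(authorization); i++ {
		if c := authorization[i]; c < 0x20 || c > 0x7e {
			return nil, errors.New("invalid iam_request_authorization")
		}
	}
	req, err := http.NewRequest(http.MethodPost, "https://sts.amazonaws.com/",
		strings.NewReader(stsBody))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept-Encoding", "identity")
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("Authorization", authorization)
	req.Header.Set("X-Amz-Date", t.Format(amzDateFormat))
	req.Header.Set("X-Vault-AWS-IAM-Server-ID", "vault.example.com")
	req.Header.Set("User-Agent", "Vault-IAM-Authentication/1.0.0")
	return req, nil
}

func main() {
	// The signature value is a placeholder, not a real SigV4 signature.
	req, err := buildSTSRequest("20160126T215751Z",
		"AWS4-HMAC-SHA256 Credential=AKIAI44QH8DHBEXAMPLE/20160126/us-east-1/sts/aws4_request, "+
			"SignedHeaders=host;x-amz-date;x-vault-aws-iam-server-id, Signature=1234567890abcdef")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(req.Method, req.URL, req.ContentLength)

	// A header-injection attempt fails validation instead of reaching AWS.
	_, err = buildSTSRequest("20160126T215751Z", "AWS4-HMAC-SHA256 x\r\nHost: evil.example")
	fmt.Println(err)
}
```

Note that even the date is parsed and re-serialized, so the value forwarded to AWS is itself reconstructed; anything malformed is rejected before a request exists.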

The best way to validate something that's not supposed to change is to not accept it from the attacker at all, and reconstruct it from hardcoded values. The parameters that you do accept should be as tightly scoped as possible, to make them easier to validate without risky parsing.
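Accepting only tightly scoped parameters could look like this minimal Go sketch of the hypothetical two-parameter API, using encoding/json's DisallowUnknownFields so that an iam_request_body, or any other extra field, is an error rather than something to validate. The JSON shape is an assumption; Vault's real login endpoint is different.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// loginRequest is the hypothetical two-parameter login API.
type loginRequest struct {
	Date          string `json:"iam_request_date"`
	Authorization string `json:"iam_request_authorization"`
}

// decodeLogin accepts exactly one JSON object with only the two known
// fields; any other field, or trailing data, is an error.
func decodeLogin(body []byte) (*loginRequest, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	var req loginRequest
	if err := dec.Decode(&req); err != nil {
		return nil, err
	}
	if _, err := dec.Token(); err != io.EOF {
		return nil, errors.New("unexpected data after login request")
	}
	if req.Date == "" || req.Authorization == "" {
		return nil, errors.New("missing iam_request_date or iam_request_authorization")
	}
	return &req, nil
}

func main() {
	_, err := decodeLogin([]byte(`{"iam_request_date":"20160126T215751Z",` +
		`"iam_request_authorization":"AWS4-HMAC-SHA256 ...",` +
		`"iam_request_body":"QWN0aW9uPS4uLg=="}`))
	fmt.Println(err)
}
```

The extra iam_request_body field makes decoding fail outright, so there is no attacker-supplied body left to parse.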

One might ask what happens when the AWS API changes or grows a parameter, but that's an argument for this technique, not against. Any changes might invalidate the security of this scheme, especially since it's not what the sts:GetCallerIdentity API is for, and you don't want the client to be able to opt in unilaterally. Indeed, the patch adds a header allowlist, which neuters the flexibility of taking the whole request from the client anyway, leaving behind just the complexity of doing unnecessary and dangerous parsing and validation.

A picture

This newsletter is brought to you by Rome being pretty as hell. Bonus points to anyone who can recognize the garden. (I stripped EXIF metadata, sorry. 😉)

The sun shining through the trees in a garden.


  1. Relatedly but orthogonally, canonicalization is just a way to introduce a dangerous parse and serialize cycle in the danger path, and as much as possible signatures should be over what's on the wire. But that's an opinion for another day.

  2. As my team can tell you, I don't consider "clever" a compliment in my line of work.

  3. Look, I'm not paid to write these. (I am paid to learn XML and despair about the backwards compatibility promise, though, so I assure you I am doing that.)