6 Jul 2023

I want XAES-256-GCM/11

In 2023, the way to use AES is AES-GCM. Anything else is very unlikely to make sense. We might not like that, we might wish OCB hadn’t been patented, but with hardware support in most processors these days GCM is both faster than the alternatives, ubiquitous, and just tolerable to implement.

Still, I don’t want to use AES-GCM, I want to use XAES-256-GCM/11, which has a number of nice properties and only the annoying defect of not existing.

The problem with AES-GCM is that its nonce is a little too small to comfortably select at random. At 96 bit, the probability of a birthday bound collision becomes uncomfortable (2^-32^) after a few billion messages (2^32^), and the consequences of a collision are catastrophic.¹ Some applications know they will never encrypt that many messages under a single key, but those numbers are a little too low and the consequences too dire to make that assumption at the API level, so we’re forced to delegate nonce management to the user.⁶

ChaCha20Poly1305 has the same issue, which is why the extended-nonce construction XChaCha20Poly1305 exists. It resolves the issue by taking a 192-bit nonce², and “hashing” it along with the key into a fresh key.⁵ The key is 256 bits, so there is no need to ever worry about collisions there. (More on that later.) You should always use the X variant unless the nonce is fully implicit, like when it’s always zero (like in age), a record counter (like in TLS), or a chunk number (like in STREAM). You can also make nice APIs for the X variant that generate the nonce at random from the system CSPRNG (and we are thinking about how to best do that in the Go standard library).

XChaCha20Poly1305 uses ChaCha20 itself to do the hashing to avoid introducing a new primitive, but this would work just as well using SHA-256 or HKDF to derive a regular key and nonce for ChaCha20Poly1305 from a key and a larger nonce. This means you could do the same for AES-GCM, and some schemes do in fact do just that. Regrettably, we don’t have a name for it, so you can’t just say to people “you should always use the X variant” or make a nice interoperable API for it.

Anyway, that’s the X in my ~~pony~~ XAES-256-GCM/11.

I don’t care if it uses HKDF, or SHA-256, or the AES function³: I want a well-defined scheme that takes a key and 192 bits of nonce and hashes them into a derived key and 96 bits of nonce for use with AES-GCM.

Even with AES-128, the combined 224 bits of space for key and nonce are juuuust enough not to worry about collisions in the derived values. Still, this hints to the second part of the ~~pony~~ problem with AES-GCM: while actually enough for post-quantum cryptography, 128-bit keys are still too tight for multi-user security. If for example an application encrypts 2^48^ messages under different 128-bit keys, and all messages start with the same few bytes, an attacker can build a lookup table, try and lookup the ciphertext of 2^64^ keys, and have a 2^-16^ chance to decrypt one message. Not good.

Multi-user attacks can be mitigated by many things—they require fixed nonces and partially known plaintext—but the only way to avoid that complexity escaping the AEAD abstraction and leaking into the rest of the protocol is to use 256-bit keys. For AES-GCM, that means using AES-256. Too bad that AES-256 is slower than AES-128, not because of its bigger key size, but because it was specified with more rounds, presumably under the assumption that users of bigger keys would also want more security margin against cryptanalysis. AES is an iterated cipher, where four core operations are performed multiple times in sequence: AES-128 performs them ten times, while AES-256 performs them fourteen times. AES-256 doesn’t have to be slower than AES-128, it was just defined to be slower. That’s regrettable, because it applies an artificial performance tax on the use of longer keys, which are desirable for reasons that have nothing to do with the risk of AES cryptanalysis.

Picking the “right” number of rounds is hard, and it’s kind of an open secret in the community that there is no rigorous and scientific way to do it. “Too much crypto” by Aumasson is all about this, and suggests eleven rounds for AES-256. I started this aiming for ten, which would have been exactly as fast as AES-128, but AES-256 does need more margin than AES-128 in some rare cases for reasons⁴ and I guess it’d be silly to introduce a different abstraction-leaking edge case while trying to remove two, so AES-256/11 it is.

Summing up, I would like to provide to my users the extended-nonce 256-bit reduced-rounds XAES-256-GCM/11 (or XAES-256/11-GCM?) AEAD. It has infinitely randomizable nonces, a comfortable margin of multi-user security, and nearly the same performance as AES-128-GCM. It would also be a true drop-in replacement for XChaCha20Poly1305. Only issue is that it doesn’t exist.

(I guess it would also not be FIPS 140 compliant. We could make a FIPS version that’s slower but equivalent by using HKDF to derive the key, and the full-rounds AES-256.)

Edit 2023-07-06: Soatok has previously suggested an extended-nonce AES-GCM construction. His AES-XGCM uses CBC-MAC for key derivation, which might be a good pick as both primitive-parsimonious and potentially FIPS 140 compliant. @NohatCoder@mastodon.gamedev.place on Mastodon suggests that since we’re KDF’ing, we might as well generate all the subkeys directly, solving the AES-256-specific issues that require the extra rounds (see ⁴). I’m a bit wary to diverge too much from the well-established primitives, since at that point might as well use something novel (such as AEGIS, like Soatok says). The point here is that I’d like a thin construction that’s easy to get confidence for if you already trust AES-GCM.

If for some reason you’re curious about what other ponies I want, you might want to follow me on Bluesky or on Mastodon.

The picture

Cats! Amongst Roman ruins! What else do you need in life? (Better AEAD modes I guess.)

A calico cat sleeps in the foreground, another cat is sitting down a step behind it, and behind them and a rail is a archaeological site with trees and columns. Further back, buildings.

My awesome clients—Sigsum, Protocol Labs, Latacora, Interchain, Smallstep, and Tailscale—are funding all this work and through our retainer contracts they get face time about its direction, as well as unlimited access to advice.

Here are a few words from some of them!

Protocol Labs — Cryptonet is hosting Proof of Space days in Paris on July 20-21, a gathering of cryptographers, Web3 researchers and engineers to share knowledge on Proof of Space. We’ll have talks and workshops to collaborate, share ideas and onboard new researchers into this exciting field. You’ll also have a chance to meet us (we’re currently looking for a senior cryptography engineer), register and join us!

Latacora — Latacora bootstraps security practices for startups. Instead of wasting your time trying to hire a security person who is good at everything from Android security to AWS IAM strategies to SOC2 and apparently has the time to answer all your security questionnaires plus never gets sick or takes a day off, you hire us. We provide a crack team of professionals prepped with processes and power tools, coupling individual security capabilities with strategic program management and tactical project management.

You’ll sometimes hear that any single-key AES mode can only produce 2^48^ blocks before the advantage of a distinguishing attacker gets uncomfortable. That would seem contradictory with a limit of 2^32^ messages for AES-GCM, since each message can be up to 2^32^ blocks. The thing is that the only thing that an attacker can do by observing more than 2^48^ blocks is confirm that they’re looking at AES rather than at complete randomness. A completely random stream would be expected to show some output block collisions after that many blocks, while AES, being an invertible permutation applied on a counter and unique nonce, will never repeat. The lack of collisions is the distinguisher. NIST doesn’t care, and neither do I. [Edited 2024-07-15] Turns out I was wrong about this! There is indeed a 2^48^ blocks bound with practical attacks beyond a weak distinguisher. It’s actually a fascinating attack. NIST is aware of this mistake in the standards. ↩
Technically, 128 bits of nonce are hashed with 256 bits of key into 256 bits of key, and 64 bits of nonce are used as nonce, which leaves 32 bits of nonce space that can be reassigned to counter, if you need to encrypt more than 256 GB. In practice, we don’t support that. What counts as key, nonce, and counter in ChaCha20 is really just arbitrary: 384 bits of variable state are mixed with 128 bits of constants for each block, and you need to make sure you never reuse the same state. The original ChaCha20 had 64 bits of nonce and 64 bits of counter, the IETF moved 32 bits from nonce to counter, making random nonces juuust tempting but not quite safe. Anyway. ↩
Although if we’re doing this, we might as well do it cleanly and use AES like XChaCha20Poly1305 or GHASH like AES-GCM-SIV. ↩
Interestingly enough, AES-256 turned out to actually have a legitimate (if maybe unexpected) need for more rounds, because the 256-bit key schedule (the way the key is expanded into subkeys for the various rounds) is simpler and more vulnerable to related-key attacks. In 2009/2010 we got some impractical related-key attacks against AES-256/14, and some “practical” related-key attacks against AES-256/9. The paper does a good job of explaining the background in the first few sections. The related-key attack setting is very contrived: an attacker can recover a key K if it’s able to get an encryption oracle for a K and for K⊕C. There is apparently one protocol that somehow managed that, but these days I’d hope all keys are the output of a CSPRNG or a KDF, not some haphazard XOR operation. (There was also an attack against AES-256/11 which required even more implausible relations between the keys.) I also suspect that the key derivation step of the extended-nonce construction we discussed above would rule out these related-key attacks. Anyway. ↩↩
In a sense XChaCha20Poly1305 is a SIV construction in the same way AES-GCM-SIV is: they both compute a synthetic (key, nonce) rather than a synthetic nonce alone. 🤔 However, the message is not one of the inputs in the XChaCha20Poly1305 derivation, so it’s not misuse resistant. There are some interesting parallels here with signature nonce generation, where the best design turns out to be hashing message and randomness into it. ↩
Technically, GCM accepts longer nonces. However, nonces longer than 96 bits are hashed into a starting 128-bit counter value, leaving no dedicated counter space. (96-bit nonces are instead concatenated with a 32-bit counter.) The birthday bound of 128 bits is better, allowing us to encrypt in theory 2^48^ messages (the approximate formula is 2^(N-32)/2^ messages for a 2^-32^ advantage, which you can derive from this rule of thumb), but that’s only if every message was precisely one block long. If messages are longer, there’s a risk the counter will overlap across messages, which is not as catastrophic and easy to detect as nonce reuse, but leads to loss of confidentiality. In general, it’s very inconvenient for bounds to depend on the message size, because that’s yet another assessment we have to delegate to the application. AES-GCM-SIV has the same issue: it has better bounds for shorter messages, but worse for messages longer than 8 GiB, and the answer to “how many messages can I encrypt” is a double entry table instead of a number. [Edited 2024-07-15] The last part about AES-GCM-SIV also didn’t take into account the practical birthday attacks on AES-CTR not reflected in the standard bounds. ↩