29 Jul 2022

KEMs and Post-Quantum age

They’re here! NIST selected a first batch of post-quantum cryptographic key exchange and signature algorithms. The report is a nice read that explains a lot of the goals, candidates, selections, and rationales. I recommend Sections 2, 3.3, and 4.1.

For key exchange, NIST selected only CRYSTALS-Kyber, a KEM with IND-CCA2 security based on structured lattices, a successor of NewHope, with 800 bytes keys and ciphertexts (although the authors recommend using the category 3 parameters, with 1184 bytes public keys and 1088 bytes ciphertexts). Four other KEMs based on codes and isogenies are continuing to a fourth round that will select a key exchange fallback in case lattices turn out to be a bad idea.¹

KEMs

What’s a KEM and what does it mean for it to have IND-CCA2 security? A Key Encapsulation Method is an implementation of the following API.

KeyGen() -> public key, secret key
Encapsulate(public key) -> ciphertext, shared key
Decapsulate(secret key, ciphertext) -> shared key

One way to implement a KEM that you might already be familiar with is with Elliptic Curve Diffie-Hellman. In ECDH, the ciphertext is more commonly called the peer or ephemeral share.

KeyGen():
    secret key = random_scalar()
    public key = scalar_mult(secret key, generator)

Encapsulate(public key):
    ephemeral = random_scalar()
    ciphertext = scalar_mult(ephemeral, generator)
    s = scalar_mult(ephemeral, public key)
    shared key = KDF(s || public key || ciphertext)

Decapsulate(secret key, ciphertext):
    s = scalar_mult(secret key, ciphertext)
    shared key = KDF(s || public key || ciphertext)

One can also make a KEM from RSA. It’s actually a much simpler, safer, and easier-to-prove way to do RSA encryption, compared with RSA-OAEP and especially with with RSA PKCS #1 v1.5 encryption. I regret using RSA-OAEP instead of RSA-KEM for ssh-rsa support in age.

KeyGen():
    p, q = random_prime(), random_prime()
    public key = n = p * q
    e = 65537
    secret key = d = e⁻¹ mod λ(n)

Encapsulate(n):
    m = random(2, n)
    ciphertext = m ^ e mod n
    shared key = KDF(m || n || ciphertext)

Decapsulate(d, ciphertext):
    m = ciphertext ^ d mod n
    shared key = KDF(m || n || ciphertext)

(In the wild, you’ll find RSA-KEM with configurable public exponent e and without n and c hashed into the shared key. Good cryptographic engineering hygiene requires fixing parameters, and hashing transcripts. The former saved my homebrew youtube-dl update verification function, and the latter makes the KEM contributory.)

So that’s what a KEM is.² What about IND-CCA2 security? It means that it provides “ciphertext indistinguishability against adaptive chosen ciphertext attacks”. More concretely, it means you can reuse a public key instead of having to generate a new one for every decapsulation. The alternative is for it to be secure only against chosen plaintext attacks (CPA), where the attacker doesn’t have access to a decryption oracle. In practice, that would require using each public key only once, restricting its use mostly to interactive protocols.

A CCA-secure KEM is a very versatile tool, that lets us do many (although not all) things we’re used to doing in the pre-quantum world. The main difference from (EC)DH is that KEMs are asymmetric: you can’t use the public key as a ciphertext and vice-versa.

Here’s what can be done with a CCA-secure KEM:

It can be combined with symmetric encryption in an offline KEM-DEM public key encryption scheme, like ECDHE turns into ECIES (which is what age’s X25519 recipients implement): the sender does an encapsulation to the recipient’s public key, uses the shared key to encrypt a message with a symmetric AEAD, and sends the KEM ciphertext and the encrypted message.⁵
It can be used to build authenticated key exchanges by running the KEM three times (or two, to authenticate only one side) in parallel: once with an ephemeral public key for forward secrecy, once with one peer’s long term public key, and once with the other peer’s long term public key. The three shared keys are hashed together, and using the resulting key authenticates the two peers.⁴
It allows caching and reusing public keys in ephemeral key exchanges like ECDHE in TLS (which you might or might not want to do—with ECDH we pretty much collectively gave up on it, because many many many implementation errors become unexploitable if the “public key” is used only once³).

Kyber

Kyber is designed as a CPA-secure PKE core, turned into a CCA-secure KEM with a standard construction, but only the CCA-secure KEM is specified as an exposed primitive. This is cheap—it just adds a handful of hash invocations⁶ and a PKE encapsulation to the KEM decapsulation step—and a very good idea: we don’t need two very similarly named primitives with very different security properties floating around, when we could just have the safer one everywhere. I hope implementations respect the spirit of the design and don’t expose the CPA-secure PKE.

Another robustness detail I appreciate of Kyber is that it hashes the transcript (ciphertext and public key) into the key by design.⁸ When making a KEM out of ECDH, you need to feed shared secret, public key, and peer share/ciphertext into the KDF to get a contributory KEM. Kyber does that for you. While not required for CCA-security, contributory behavior can prevent some neat real-world attacks and should even allow using the public key as an authentication key.⁷ Enshrining this property into the design and test vectors is the right way to ensure implementations don’t forget it when they need it.

Post-quantum age

age is a simple, modern, and secure file encryption format, tool, and library. Its native recipient type is based on X25519, but it supports custom recipient types through Go interfaces or an stdin/stdout plugin system. All a recipient needs to do is wrap and unwrap a file key given a recipient and identity string, respectively. Can we make a post-quantum age recipient based on Kyber?

128 bits are enough

The main concern is the size of the file key: 128 bits.¹⁰ My previous rough understanding of Grover’s attack was that it allowed searching an n-bit key space for a black box function in n/2 “time” on a quantum computer. This understanding is fairly popular: it’s sort of a meme that you need to support AES-256 for “post-quantum reasons”. If that’s right, then no matter what we use for the recipient implementation, age v1 can be attacked with a quantum computer by bruteforcing the file key (and checking it correctness against the header HMAC) in 2⁶⁴ time, which sounds unacceptable.

That understanding is practically wrong.

The NIST security evaluation criteria for post-quantum cryptography define five security categories of increasing strength. Each proposed post-quantum scheme provides a set of parameters to match these categories, which you can think about like key sizes in classical cryptography. To meet category 1 requirements, attacking a set of parameters “must require computational resources comparable to or greater than those required for key search on a block cipher with a 128-bit key (e.g. AES128)”.

Not only is a 128-bit key not a deal-breaker, then, but it’s the benchmark against which category 1 post-quantum parameters are measured against. How to reconcile this with our understanding of Grover? NIST explains:

[…] NIST suggests an approach where quantum attacks are restricted to a fixed running time, or circuit depth. Call this parameter MAXDEPTH. This restriction is motivated by the difficulty of running extremely long serial computations. Plausible values for MAXDEPTH range from 2⁴⁰ logical gates (the approximate number of gates that presently envisioned quantum computing architectures are expected to serially perform in a year) through 2⁶⁴ logical gates (the approximate number of gates that current classical computing architectures can perform serially in a decade), to no more than 2⁹⁶ logical gates (the approximate number of gates that atomic scale qubits with speed of light propagation times could perform in a millennium).

The complexity of quantum attacks can then be measured in terms of circuit size. These numbers can be compared to the resources required to break AES and SHA3. At the present time, NIST would give the following estimates for the classical and quantum gate counts for the optimal key recovery and collision attacks on AES and SHA3, respectively, where circuit depth is limited to MAXDEPTH:

AES-128: 2¹⁷⁰ / MAXDEPTH quantum gates […]

It is worth noting that the security categories based on these reference primitives provide substantially more quantum security than a naïve analysis might suggest. For example, categories 1, 3 and 5 are defined in terms of block ciphers, which can be broken using Grover’s algorithm, with a quadratic quantum speedup. But Grover’s algorithm requires a long-running serial computation, which is difficult to implement in practice. In a realistic attack, one has to run many smaller instances of the algorithm in parallel, which makes the quantum speedup less dramatic.

The summary is that Grover requires an unrealistically long-running serial computation⁹, and can’t be parallelized better than naively running multiple instances. When taking into account a conservative maximum running time, such as 2⁶⁴ gates, running Grover on a 128-bit key space would require a circuit size of 2¹⁰⁶.

Assuming that there aren’t any better-than-Grover attacks against the age primitives—HKDF-SHA-256 followed by HMAC-SHA-256 or ChaCha20Poly1305—it’s fully appropriate to match category 1 post-quantum KEMs with 128-bit symmetric keys, like in age.

Yes, it's fine to use Kyber512 like this. But I recommend defaulting to Kyber768 unless you have a hard performance constraint that forces you to use 512.
— John Schanck (@susurrusus) July 6, 2022

Indeed If 128-bit brute force is the best way of attacking the protocol, then it meets Level 1 requirements; being equivalent to AES-128.
( Most L5 PQC constructions actually have exactly 256-bit secrets inside them -- for XOF seed expanders, KDFs etc. There's no margin. )
— mjos\dwez (@mjos_crypto) July 6, 2022

The age Kyber768+X25519 plugin

Having established that the overall age protocol can by definition meet NIST’s category 1 post-quantum security requirements, we can start thinking about how a post-quantum age plugin would look like.

Kyber’s category 1 parameters are Kyber512, but the authors recommend using Kyber768 unless prohibitive due to performance reasons. Both are plenty fast, and both have large public keys (800 vs 1184 bytes), so let’s pick the conservative option. As str4d—rage’s author—points out, this would also let us decide later that we do instead want to increase the file key size, while letting users keep their keys with a matching comfortable margin.

Any deployment of experimental post-quantum cryptography should be hybrid: paired with classical cryptography such that even if lattice cryptography turns out to be a bad idea, the combined system will be at least as secure as the classical algorithm. For our plugin, we’ll just combine the Kyber768 KEM with X25519.

Native X25519 recipients wrap the file key using ChaCha20Poly1305 with a key derived as HKDF-SHA-256(ikm = shared secret, salt = ephemeral share || recipient, info = "age-encryption.org/v1/X25519"). We could just add the Kyber key to the input key material, but we can take the occasion to fix a small regret in the age design: the salt is supposed to be random or fixed for domain separation, while ephemeral share and recipient should have been part of the input key material. This is not a security issue, but it makes applying some proofs more convoluted.

The wrapping key for the Kyber768+X25519 plugin can then be HKDF-SHA-256(ikm = shared secret || ephemeral share || recipient || kyber key, salt = "age-encryption.org/Kyber768+X25519", info = nil).

There are a few open questions that relate to optional properties that are not part of the core age guarantees but some recipients provide:

Do a decryption or encryption oracle leak information about the public key? As we talked about above,⁷ the public key is hashed into the key (the KEM is contributory), so an attacker can’t generate a valid ciphertext for an unknown public key. If the attacker also can’t learn the public key from an oracle, it might be possible to build authentication semantics around this.
Are ciphertexts for the same public key efficiently linkable? X25519 and Scrypt ciphertexts are unlikable, so given two age files you can’t tell if they are encrypted to the same recipient. Does the same hold for Kyber768+X25519? (This does not hold for the SSH recipients, which intentionally include a public key hash, so we don’t ask for the passphrase for keys that wouldn’t work.)
Is it possible to craft ciphertexts that are valid (and lead to known keys) under multiple public keys, enabling multi-key attacks like against X25519 and ChaCha20Poly1305? Although the impact of this is limited—allows efficient searches of the accepted public keys of a decryption oracle—it might be a good reason to use a robust AEAD instead of ChaCha20Poly1305.

Finally, there’s the elephant in the room: the UX issue of the large public keys.

One of the key strengths (pun not intended) of age is that it has small, copy-pastable recipients like age1ydzqkt4vfuumwgk5hxus22gakwpqk9knua62qnzcqw5nzta30d9q3va9x0. A Kyber768+X25519 recipient would be… quite a bit longer. I am not going to paste one here because I care about the delivery rate of this email, but it’d be 1960 characters, more than 30 times longer than the X25519 one.

A suggestion I like was to have the plugin store keys with a -learn flag, and then allow using them with a short identifier, like a hash. Or, maybe we bring back the aliases file that was in the original age design document but never made it to v1.0.0. Or, we just accept that Kyber768+X25519 recipients will be mostly used with --recipients-file. It is still shorter than an armored OpenPGP public key. I welcome opinions and ideas.

Does age currently use any local state? You could do like a mapping from H(public key) -> public key locally, then just require registering a public key once.

But maybe you have a better idea.
— Lúcás Meier (@cronokirby) July 6, 2022

One more bonus UX issue: how do we stop people from mixing Kyber768+X25519 and plain X25519 recipients, defeating the post-quantum security of the file? Ideally without a special case.

If you’ve made it this far, you might enjoy subscribing to Cryptography Dispatches or following me on Twitter.

Bonus picture

I’m back on the lake for the Italian math team training (where the promising kids train and I mostly play board games)!

A picture of a lake taken from the shore. The sky is full of clouds colored orange from a recent storm. They reflect on the water, where a few boats sit anchored. Two wooden piers extend from the shore from right to left. In the distance mountains on the other side of the lake, and a small sliver of blue sky.

For signatures, NIST selected CRYSTALS-Dilithium, FALCON, and SPHINCS+. CRYSTALS-Dilithium is based on structured lattices and has 1312 bytes keys and 2420 bytes signatures (at category 2, see below). FALCON is also based on structured lattices, has significantly more complex implementation (requiring constant time floating point operations, which we collectively don’t have experience with), but provides 897 bytes public keys and 666 bytes signatures (at category 1). SPHINCS+ is a stateless hash-based signature scheme, with compact 32 bytes public keys but very large signatures starting at 7856 bytes. CRYSTALS-Dilithium is the main recommendation, FALCON is there for when you don’t have 2KiB for a signature (which, regrettably, probably means we’ll have to use the complex option in X.509), and SPHINCS+ is the fallback in case lattices turn out to be a bad idea (or you don’t care at all about signature size, like apparently firmware updates don’t, since the signature is discarded after verifying it and it’s only the public key taking up precious flash memory). This leaves a gap for non-lattice small signatures, for which NIST is planning to run a new Call for Proposal over the coming years. No other current candidates are being considered anymore. ↩
If this explanation of what a KEM is didn’t work for you, I liked this Stack Exchange answer which maps out the relationship between KEMs, public key encryption, and key exchanges. ↩
It’s interesting to think about ECDH implementation vulnerabilities, such as our old attack against a carry bug in Go and many others, as degrading security of the scheme from CCA to CPA. In that sense, making a CPA-secure ECDH implementing is actually not that hard. I can’t think of any implementation vulnerabilities, including side channel attacks, that violated CPA security. This makes intuitive sense, since in a CPA/ephemeral scenario the attacker only gets one shot at extracting information about the key, which usually comes a few bits at a time. I wonder if this will be true of post-quantum KEMs, too. Maybe we should generally try to design protocols that only require CPA security even when a CCA-secure tool is available, as a mitigation for implementation issues. ↩
See Section 5 of the original Kyber paper for a fully worked out sketch of this key exchange protocol. ↩
See Appendix A of the Kyber paper for a sketch of a KEM-DEM public key encryption scheme. ↩
Kyber is so fast that the hashes take almost ¾ of the Encaps and Decaps CPU cycles. Kyber also made some slightly bizarre hash choices, using different SHA-3 variants for domain separation. This is a small thing that I hope will change in the final standard, as I’m not a fan of using four different hash variants in one scheme, I’d rather see SHAKE used everywhere and the fixed size SHA-3 variants forgotten, and domain separation is best understood in the context of a whole protocol, where prefixes (or HMAC keys, or context-aware hashes) are the standard. You run out even of SHA-3 variants, eventually. ↩
The idea being that without the public key it might be impossible for the attacker to forge a ciphertext for that public key that will decrypt to a known key, making the public key a valid authentication capability key. It’s easy to show that knowledge of the public key is necessary to produce a ciphertext that decrypts to a known key, since the public key is hashed into the key; what’s harder is showing that a CCA decryption oracle doesn’t leak information about the public key. We’ll return to this in the context of adding authentication to age. ↩↩
See Section 4 of the Kyber paper, subheading “Hashing pk into ˆK”. ↩
This nuance comes up in evaluating classical attacks, too, by the way. There was some commotion about attacks against static Diffie-Hellman oracles a few years ago, claiming it reduced the security of Privacy Pass over Curve25519 or ristretto255, but a careful analysis showed that to get a worrying improvement it required almost 2⁶⁴ sequential oracle queries, which even at one query per nanosecond would take 400 years. For a parallelizable attack, 2⁶⁴ is almost trivial, but as sequential operations where the input of the next requires the previous one, it’s a prohibitive scale. ↩
Why 128 bits? Because the file key is wrapped in each recipient stanza, so adding 16 more bytes to the file key would add 16 bytes to the file size per recipient. Instead, we have a 128-bit per-file nonce to provide a comfortable margin against multi-user attacks, where a shared search space of only 128 bits would be too tight. ↩