29 Sep 2022

age and Authenticated Encryption

age is a file encryption format, tool, and library. It was made to replace one of the last remaining GnuPG use cases, but it was not made to replace GnuPG because in the last 20 years we learned that cryptographic tools work best when they are specialized and opinionated instead of flexible Swiss Army knives. How it went is that you’d read The PGP Problem on the Latacora blog, get convinced that you should use the right tool for the job, scroll to the “Encrypting Files” section, and it’d say:

This really is a problem. If you’re not making a backup, and you’re not archiving something offline for long-term storage, and you’re not encrypting in order to securely send the file to someone else, and you’re not encrypting virtual drives that you mount/unmount as needed to get work done, then there’s no one good tool that does this now.

age was made to fill that gap. It’s an annoying gap to fill, because the job is much more underspecified than, say, virtual drives or even backups. People mean different things by “files”, and even by “encryption”. They want to encrypt with a passphrase, with a symmetric key, with an asymmetric key, with KMS, with a YubiKey… We had a whole session at the High-Assurance Cryptography Workshop titled something along the lines of “Encrypting files: what does it mean???”.

Eventually what we targeted with age was whatever people were using gpg -c (password encryption) or gpg -e (public key encryption) for. We assumed by the time you dropped down to age you knew what problem you were trying to solve, and wanted a tool that would spare you the pain and risk of dealing with a myriad of legacy cryptographic options you don’t care about. We made it a good UNIX tool, working on pipes, and put you in charge of key management, simplified down to copy-pastable short strings. We introduced plugins to support encrypting to whatever you want (YubiKeys, KMS, …).

One thing we decided is that we’d not include signing support. Signing introduces a whole dimension of complexity to the UX, to key management, and to the cryptographic design of a streamable seekable format. age is in the business of integrity and confidentiality, not authentication. This was hotly debated early on, and still is periodically.

In short, while no one can modify an encrypted age file, anyone can replace it with a completely new encrypted file. (This is not actually strictly true, as we’ll see later, but it’s what age says on the tin to be conservative.)

Still, I kept thinking about it, and I am now considering a small backwards-compatible tweak that would introduce explicit support for authentication (not signing!) in age, based on the unadvertised authentication properties that age already has. I’m looking for feedback, especially on the use cases (do you need it? do you not? what for? would this work for you?) so click the reply button or open a discussion on GitHub.

First, some background about what authentication is for, and what it is.

What doesn’t need authentication

There are a number of age use cases that don’t need age to take care of authentication. They mostly match what age is used for today, which I find fairly reassuring.

Storing secrets next to code

If you use something like SOPS or just check age secrets into a git repository next to source code, you need an authentication story for the whole repository. Having authentication for the secrets will do nothing if the attacker can change the source code that decrypts and uses them.

That story can simply be “we trust GitHub” like most projects. Encrypting secrets with age will keep them confidential even if the project is Open Source, and anyone wanting to replace them will have to make a PR even if they can generate a new valid age file.

Local password-store

I use passage, a password-store fork, with age-plugin-yubikey as my password manager. I don’t need it to authenticate the secrets it decrypts: again, if an attacker can tamper with the encrypted secrets they can change the code that my laptop will run, and do much worse.

(Actually, I also obviously have full-disk encryption enabled, so the only reason I use passage instead of a passwords.txt file¹ is that with age-plugin-yubikey each decryption requires a YubiKey physical touch. This way even if you compromise my laptop you can’t exfiltrate all my secrets at once.)

I honestly didn’t really get why you’d use passage or password-store at all without a hardware key, which is part of why age doesn’t have an agent yet, but I’ve sort of come around to the “secrets are synced to the cloud” scenario. That use case does need authentication if you’re worried about the attacker feeding you fake passwords.

Self-authenticating data

If a file is self-authenticating, for example because it starts with a secret token, the attacker can’t produce a new age file that starts with that secret without knowing it. This is because age uses AEADs and never produces unauthenticated output. “Wait, did you just say age is authenticated?” Yes, the term is overloaded and it’s confusing. We’ll talk more about the difference between authenticated-as-in-AEAD and authenticated-as-in-this-article later.

I know of companies in the payment processing space that use age like this. In fact, they probably got a security upgrade by switching from unsigned PGP to age, because the mechanism that provides this kind of integrity in PGP, the “Modification Detection Code”, is/was optional, predates AEADs, and is sketchy at best.²

Encrypt-to-delete

Deleting bits from storage media is hard, especially in the era of SSDs and wear leveling. Instead, I often encrypt something with age, and just make sure the key never hits permanent storage. Key gone, data as good as deleted.

Here I am only worried about passive attacks, like someone who finds the drive in a dumpster or attached to a cloud VM, not about attackers with the ability to replace the file while I’m working on it.

What does need authentication

Does this mean that no one could benefit from authentication? Nope, that’s why we’re here. I can think of at least one broad use case where lack of authentication is uncomfortable.

Cloud backups

If you make a backup with age, and then store it in the cloud, age will prevent the cloud provider from inspecting the backups. However, the provider can replace the whole backup with something else. Maybe you’ll notice while recovering it because your files are not in there, maybe you’ll not and run some code from it that gives the cloud provider access that shouldn’t have been available. Not great.

This generally extends to storing anything in the cloud encrypted with age (or with GnuPG without signing it).

Borderline misuses

Aside from untrusted remote storage, I could only come up with some contrived scenarios. If you were to use age as a way to protect a public API, like “POST an age-encrypted file to this endpoint with your instructions”, you’d probably want it to provide authentication.

age is for file encryption, so one could argue that this is misuse (there are better ways to build something like this than using age), but we’re in the business of making safe tools, not another generation of footguns.

(There’s a good chance I’m overlooking some other use cases! I’d like to know about them as I try to design a solution that works broadly. Hit the reply button, you know the drill.)

Authentication vs AEADs

You might have heard that a selling point of age is that it uses AEADs, which expands to Authenticated Encryption with Associated Data. Doesn’t that mean age is authenticated!?

No, that’s about symmetric encryption. I know it’s confusing. We’re bad at naming things.

What using AEADs means is that if you encrypt this file

The Magic Words are Squeamish Ossifrage.

We attack at dawn.

and the attacker knows the We attack at dawn part but not the first part, they can’t tamper with it, flipping some bits, turning dawn into dusk. AEADs have Message Authentication Codes that ensure that whoever authored the whole message knew the symmetric key that allows decrypting it.

This is table stakes and in 2022 you should not use anything that doesn’t do authenticated symmetric encryption.

What’s tricky about AEADs is that they authenticate the whole message at once at the end. This is fine if your message is small enough. If your message is too big, like a multi-GB file, you will want to stream it rather than hold it all in memory, and might be tempted to output (“release”) plaintext as you decrypt it, before you get to and verify the MAC at the end. That’s how you get attacks like EFail. What age does is split the file into 64KiB chunks, encrypt each of them with an AEAD, and verify each of them before releasing its plaintext. The scheme it uses is called STREAM, but intuitively it’s just about putting a counter in the AEAD nonce.
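As a rough illustration of the chunking described above, here is a minimal Python sketch. The nonce layout (an 11-byte big-endian chunk counter followed by a one-byte final-chunk flag) is how I understand the age format; everything else is an assumption made to keep the example standard-library-only. In particular, toy_seal and toy_open are a throwaway HMAC-based stand-in for the ChaCha20-Poly1305 AEAD that age actually uses, and all function names are hypothetical. Do not use the toy AEAD for anything real.

```python
import hashlib
import hmac

CHUNK_SIZE = 64 * 1024  # 64 KiB of plaintext per chunk
TAG_SIZE = 32           # size of the toy HMAC-SHA-256 tag


def stream_nonce(counter: int, last: bool) -> bytes:
    """12-byte nonce: 11-byte big-endian chunk counter, then a final-chunk flag."""
    return counter.to_bytes(11, "big") + (b"\x01" if last else b"\x00")


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (HMAC-SHA-256 in counter mode). Illustration only.
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hmac.new(key, b"enc" + nonce + block.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        block += 1
    return bytes(out[:length])


def toy_seal(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Stand-in AEAD seal: XOR with the keystream, then append a MAC."""
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    return ct + tag


def toy_open(key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    """Stand-in AEAD open: verify the MAC first, only then decrypt."""
    if len(sealed) < TAG_SIZE:
        raise ValueError("chunk too short")
    ct, tag = sealed[:-TAG_SIZE], sealed[-TAG_SIZE:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("chunk failed authentication")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))


def encrypt_stream(key: bytes, plaintext: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split the plaintext into chunks and seal each one under its own nonce."""
    chunks = [plaintext[i:i + chunk_size]
              for i in range(0, len(plaintext), chunk_size)] or [b""]
    return [toy_seal(key, stream_nonce(i, i == len(chunks) - 1), chunk)
            for i, chunk in enumerate(chunks)]


def decrypt_stream(key: bytes, sealed_chunks: list):
    """Yield plaintext one chunk at a time, each only after it has been verified."""
    for i, sealed in enumerate(sealed_chunks):
        last = i == len(sealed_chunks) - 1
        yield toy_open(key, stream_nonce(i, last), sealed)
```

Because the counter and the final-chunk flag are part of each nonce, swapping two chunks or cutting the file short makes a chunk fail verification instead of silently yielding rearranged or truncated plaintext. A real decryptor would read from a stream and detect the last chunk by hitting end of file, rather than taking a list.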

Anyway, AEADs operate at a lower level than asymmetric encryption. What this article is about is authenticated asymmetric encryption, where you get guarantees about who sent the whole message, not just about the message not being partially tampered with.

Authentication vs signing

Wait, “guarantees about who sent the whole message”… isn’t that just signing? Well, no. Existing tools (cough gpg cough) trained us to consider them one and the same because the only way they provide sender authentication is via signing, but authentication is a more limited and easily achieved property. Signing is publicly verifiable: the recipient can go to a third party and prove that the sender sent the message by showing the signature and the sender’s public key. Sender authentication proves to the recipient that the message was generated by the sender. There isn’t necessarily a sender public key, and the recipient might be able to forge messages looking like they are from the sender (called “key compromise impersonation” if you don’t like it, and “deniability” if you do).
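To make the difference concrete, here is a small Python sketch of authentication without signing. It assumes the two parties already share a symmetric key (here a placeholder value standing in for the output of a key exchange) and uses a plain HMAC as the authenticator; the names are hypothetical and this is not how any particular protocol frames its messages.

```python
import hashlib
import hmac


def authenticate(shared_key: bytes, message: bytes) -> bytes:
    """Compute a tag that only holders of shared_key can produce."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(shared_key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(tag, authenticate(shared_key, message))


# Placeholder for a key both sides derived from a key exchange (assumption).
shared_key = hashlib.sha256(b"placeholder shared secret").digest()

# The sender authenticates a message, and the recipient can verify it.
message = b"We attack at dawn."
sender_tag = authenticate(shared_key, message)
recipient_accepts = verify(shared_key, message, sender_tag)

# The recipient holds the same key, so it can produce an equally valid tag
# for a message the sender never wrote.
forged_message = b"We attack at dusk."
forged_tag = authenticate(shared_key, forged_message)
forgery_accepted = verify(shared_key, forged_message, forged_tag)
```

Both checks succeed, which is the whole point: the recipient learns that the message came from someone holding the shared key, but a third party shown the tag can't tell whether the sender or the recipient produced it. A signature, by contrast, can only be produced by the holder of the signing key.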

Examples of schemes with sender authentication but without signing are the Noise patterns K and X, Signal’s X3DH, NaCl’s box, or libsodium’s key exchange APIs.

Another difference is that while authentication can happen at the key exchange level, and the derived shared symmetric key can be used with STREAM as age does, signatures necessarily need to be computed over the whole message. This sets us back on making the format seekable and streamable: either we make an expensive asymmetric signature for every chunk, or we get fancy with signed Merkle trees, which anyway get us a streamable format only in either the encryption or the decryption direction. (Or, as discussed above, we just stick a signature at the end and release unverified plaintext at decryption time, causing countless vulnerabilities.)

On combining signing and encryption

The common advice for people who need encryption and authentication is to combine age with a signing tool like signify or OpenSSH (which can sign files now!). Part of why I’m thinking about age and authentication is that I am not very satisfied with that. Combining encryption and signing is actually kinda tricky and can’t be done perfectly. This is also why we did not document recommended ways to do this.

If you encrypt and then sign, an attacker can strip your signature, replace it with their own, and make it look like they encrypted the file even if they don’t actually know the contents.

If you sign and then encrypt, the recipient can decrypt the file, keep your signature, and encrypt it to a different recipient, making it look like you intended to send the file to them.

What you need to avoid both issues at the same time is a different cryptographic primitive called signcryption. The good (???) news is that no popular tool does signcryption, so you’re probably not worse off by using signify and age.³

age is already authenticated!

Here’s the big reveal: age is already authenticated, sort of. You can’t produce an age file that will decrypt with a given identity if you don’t know its recipient.⁴ (Read on for an important gotcha though, which is why this is not yet advertised.)

This means that if you need to make sure an attacker can’t forge age encrypted files for you, you just need to keep the recipient string (age1...) secret from the attacker. For example, if you upload backups to cloud storage, simply make sure you don’t upload the recipient string along with them.

It will be cryptographically impossible for the attacker to generate a file that age will successfully decrypt with the corresponding identity. This is easily shown with an information-theoretical argument: the age file key is encrypted with an AEAD; the key for that is derived with HKDF from the shared secret, the ephemeral key, and the recipient.⁵ Without knowledge of the recipient, the attacker can’t produce a key that will cause the AEAD decryption to succeed. age is very simple, so it’s also easy to show we don’t have baked-in default private keys or NULL cipher modes, so the only identities that are accepted are the ones you specify on the command line.
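The derivation described above can be sketched in a few lines of Python. This follows my reading of the age spec for X25519 recipients (HKDF-SHA-256 with the shared secret as input key material, the ephemeral share concatenated with the recipient as salt, and a fixed label as info), with HKDF written out by hand from RFC 5869 to stay within the standard library. The byte strings at the bottom are placeholders, not real keys, and the X25519 exchange itself is not modeled.

```python
import hashlib
import hmac

X25519_LABEL = b"age-encryption.org/v1/X25519"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def wrap_key(shared_secret: bytes, ephemeral_share: bytes, recipient: bytes) -> bytes:
    """Derive the key that encrypts the file key for one X25519 recipient.

    `recipient` is the raw 32-byte public key, not the age1... string.
    """
    return hkdf_sha256(ikm=shared_secret,
                       salt=ephemeral_share + recipient,
                       info=X25519_LABEL)


# Placeholder values (assumptions for illustration, not real curve points).
ephemeral_share = bytes(range(32))
real_recipient = hashlib.sha256(b"the recipient kept secret").digest()
guessed_recipient = hashlib.sha256(b"the attacker's guess").digest()

# Even with a degenerate all-zero shared secret, a wrong recipient in the
# salt yields an unrelated wrap key, so the AEAD over the file key won't open.
zero_secret = b"\x00" * 32
good_key = wrap_key(zero_secret, ephemeral_share, real_recipient)
bad_key = wrap_key(zero_secret, ephemeral_share, guessed_recipient)
```

The all-zero shared secret is deliberate: it shows that binding the recipient into the salt does the work on its own, independent of any properties of the key exchange output.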

It’s not as straightforward to show that the recipient can’t be extracted from an encrypted file or from an online decryption oracle, but there’s literature about this that approaches it from the angle of anonymity: if it were possible to extract the recipient, it would be possible to deanonymize it, and they show that’s not possible.

Whether this all holds depends on the recipient type. I am confident the property holds for the X25519 recipients, and that it would hold for a hypothetical Kyber768+X25519 one, but it’s important not to advertise it as an age-wide property.

Note that unlike Noise/X3DH/NaCl⁶ this doesn’t provide for multiple sender identities: the only authentication performed here is “the sender knew the recipient for this identity”. If you need to tell apart files encrypted by Server A and by Server B, you’ll need to generate two separate keypairs, and give each of them a different one. Thankfully, age keypairs are cheap! This lets us avoid adding an extra knob in the UI for the sender’s public/private key: when you use -r you’re proving you know the recipient, and when you use -i you’re saying you want the file to come from someone who knows the corresponding recipient. This might not be enough to, say, encrypt emails, but it’s enough for all the use cases we discussed above.

The UX issue

The main issue with making this guarantee usable is that the recipient is supposed to be public, and now we’re telling people to keep it secret.

I actually had a very hard time getting a cryptographer⁷ to confirm my understanding of the authentication properties, because they kept saying “but the public key is public” and I kept saying “ok but imagine we didn’t give it to the attacker” and we kept going in a loop like that until I said “alright there is no public key anywhere and let’s call this curve point the super-secret sender key”. It went ok from there.

The best idea I have so far is making a plugin called authkey for this, and then the recipients will look like age1authkey1... which might be enough? The alternative, which I am not a fan of, is introducing a new recipient format like AGE-AUTH-KEY1.... I’m looking for feedback on this!

The multiple recipients issue

It’s not this simple though. There is also a gotcha that we need to fix with a technical tweak.

If you encrypt a file to Alice’s recipient and to Bob’s recipient (age supports multiple recipients), and send it to both, you just gave a blank check to Alice and Bob to encrypt files to each other even if they don’t know each other’s recipients.

How that works is that Alice can take the file, derive the file key using her identity, and then use the file key to change the contents of the file, keeping the same header. That new file will still decrypt successfully with Bob’s identity, but it was produced by Alice, who doesn’t know Bob’s recipient. Alice can even drop her stanza from the header and recompute the HMAC, and Bob wouldn’t even know this happened.
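Here is a hedged Python sketch of why the header MAC doesn't stop this. As I understand the age format, the header MAC key is derived from the file key alone (HKDF-SHA-256 with an empty salt and the info string "header"), and the MAC covers the header up to and including the "---" marker. The stanza strings below are placeholders rather than real wrapped keys, the unwrapping step is not modeled, and the helper names are hypothetical.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def header_mac(file_key: bytes, header_without_mac: str) -> str:
    """MAC the header with a key derived only from the file key."""
    mac_key = hkdf_sha256(ikm=file_key, salt=b"", info=b"header")
    digest = hmac.new(mac_key, header_without_mac.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).rstrip(b"=").decode()


def build_header(file_key: bytes, stanzas: list) -> str:
    """Assemble version line, recipient stanzas, and the MAC line."""
    body = "age-encryption.org/v1\n" + "".join(s + "\n" for s in stanzas) + "---"
    return body + " " + header_mac(file_key, body) + "\n"


def verify_header(file_key: bytes, header: str) -> bool:
    """What a decryptor does after unwrapping the file key from its stanza."""
    body, _, mac = header.rstrip("\n").rpartition(" ")
    return hmac.compare_digest(mac, header_mac(file_key, body))


# Placeholder stanzas: in a real file these carry the wrapped file key.
alice_stanza = "-> X25519 ALICE-EPHEMERAL-SHARE\nALICE-WRAPPED-FILE-KEY"
bob_stanza = "-> X25519 BOB-EPHEMERAL-SHARE\nBOB-WRAPPED-FILE-KEY"
file_key = bytes(range(16))  # placeholder 128-bit file key

# The honest sender encrypts to both Alice and Bob.
original = build_header(file_key, [alice_stanza, bob_stanza])

# Alice unwraps the file key with her identity (not modeled), drops her own
# stanza, and recomputes the MAC. She never needs Bob's recipient.
forged = build_header(file_key, [bob_stanza])

# Bob unwraps the same file key from his untouched stanza and the MAC checks out.
bob_accepts_forgery = verify_header(file_key, forged)
```

With the header accepted, Alice is free to encrypt any payload she likes under the same file key, and Bob has no way to tell.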

This is too sharp an edge to leave lying around, and it’s why age doesn’t say “it’s authenticated, just keep the recipient secret” on the tin.

We encountered this before while designing age: the symmetric passphrase-encryption scrypt recipient type can’t be used with multiple recipients, precisely for this reason. We figured that if you decrypt a file with a passphrase you have an expectation that whoever produced it knew the passphrase. That is, there is a sender authentication expectation embedded in the passphrase encryption UX. To stop an attack like the one above, we special-cased it.

We can do the same thing with our special authenticated recipient type: enforce⁸ that when it’s used, it’s the only recipient of the file. Rather than adding a special case, I am thinking of extending the plugin protocol and Recipient interface to let a recipient specify that it wants to provide authentication. This will also let us remove the scrypt special case and enable symmetric encryption plugins that are as good as the built-in scrypt recipient type 🙌

age-authkey-plugin

Summing up, bringing authentication to age should be as simple as:

adding a way for recipient implementations and plugins to signal they should only ever be used as the only recipient for any given file; and
making a plugin/built-in recipient type that flips that flag and has a clear name but otherwise looks exactly like the X25519 recipient type.
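The first of the two steps above might look something like the following Python sketch. Nothing here is implemented in age: the Recipient class, the exclusive flag, and check_recipients are all hypothetical names, chosen only to show the shape of the check that would replace the scrypt special case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    """Hypothetical stand-in for a recipient implementation or plugin."""
    name: str
    exclusive: bool = False  # hypothetical flag: "I must be the only recipient"


def check_recipients(recipients: list) -> None:
    """Refuse to encrypt if an exclusive recipient is mixed with any other."""
    if not recipients:
        raise ValueError("no recipients specified")
    exclusive = [r for r in recipients if r.exclusive]
    if exclusive and len(recipients) > 1:
        raise ValueError(
            f"{exclusive[0].name} recipients can't be combined with other recipients"
        )


# Regular X25519 recipients can be combined freely.
alice = Recipient("X25519")
bob = Recipient("X25519")
check_recipients([alice, bob])

# A passphrase recipient and the proposed authenticated type both set the flag.
passphrase = Recipient("scrypt", exclusive=True)
authkey = Recipient("authkey", exclusive=True)
check_recipients([passphrase])
check_recipients([authkey])
```

Calling check_recipients([authkey, alice]) raises ValueError, which is the accidental misuse this flag is meant to catch on the sending side.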

The plugin might not even need to implement the identity/decryption side:⁹ it can just produce files that decrypt with regular native age identities, as those already provide authentication as designed. This prevents the receiving side from checking for misuse, but as we said an attacker can just strip the other recipients from the file, and once a file is encrypted to multiple recipients the damage is done.

None of this is implemented yet, so please reach out with feedback on use cases, UX, and cryptography design, whether positive or negative!

The picture

Took this at 100ft with no filter, no flash, no custom white balancing, and no post-production. It was this eerie spot of bright red at a depth where only blue light can get. Apparently it’s symbiotic bioluminescent bacteria. Anyway, it had no right to be that red that deep under the sea.

A red anemone with two clownfish, surrounded by deep blue reef.

If you want to stay up to date with my work on age, consider following me on Twitter or subscribing to this newsletter if you haven’t already!

¹ Unpopular opinion: passwords.txt is fine. If you can extract files from my laptop you can probably get my cookies (depending on how the OS keychain is used), and certainly a number of authentication tokens, as well as pictures and private documents. Relevant XKCD.
² As far as I can tell, MDCs are mandatory now, but were optional for a long time, meaning attackers could just strip them, generating only a warning that every user and application ignored. This is how EFail happened. GnuPG also supports AEADs now, but again when dealing with a fragmented ecosystem full of weaker protocol options, who knows if the other side updated or not, and whether it’s configured correctly or not. See the Authentication vs AEADs section for some more background about why all this matters.
³ It’s especially vexing that GnuPG doesn’t do signcryption because it totally could! Its UX already involves dealing with a sender key pair and a recipient key pair for each encrypt+sign operation, and its signatures span the whole message so safe streaming is already out the window. However, alas.
⁴ In age, an identity is a string commonly containing a private key like AGE-SECRET-KEY-16DDVPME9RAUQYXULTUXG0L0L87W52H2C0M7PTEXSS9JKHSTR525QP7Z3SM and a recipient is a string commonly containing a public key like age1k5flf920mg7lderduqclc74m5aevha47rzk6x4xww48m9xzy93gs37q7c6.
⁵ This doesn’t even rely on X25519 itself having contributory behavior, which requires rejecting the all-zeroes output. The age spec and implementation still do that, but even if an implementation missed that check the HKDF derivation would still provide authentication. Other protocols were not as lucky.
⁶ X3DH does three DH exchanges, while here we do only the one. The second introduces sender identities, which we intentionally reject for UX/UI reasons. The third is about forward secrecy and it involves rotating pre-keys. It would be interesting to make a forward-secrecy plugin for age that uses some support server to store the pre-keys. We discussed this briefly at HACS and it might be worth writing up, but it’s going to be another issue :)
⁷ “Filippo, I thought you were a cryptographer?” When talking with people who care about the difference, I’m more properly a cryptography engineer, and I still occasionally get actual cryptographers to review my stuff if I’m doing something weird.
⁸ This doesn’t stop a malicious sender from bypassing the check and encrypting a file to multiple recipients anyway, but a malicious sender can just share the secret recipient string, so we’re not defending against that, just against accidental “misuse”. (Or, more properly, unexpected behavior.)
⁹ This is something I’m pretty happy about in the plugin protocol: a plugin can implement only the recipient side and produce files that decrypt without the plugin, or just the identity side and decrypt files that were encrypted without the plugin. It enables things like agent forwarding to happen entirely behind a plugin. Or, if you have a hardware token or an API that supports X25519, you can make a plugin for it that doesn’t require any special software on the encryption side.