As most of you are tired of hearing by now, I am a professional, full-time open-source maintainer, and a lot of my time is spent maintaining the Go cryptography standard libraries.
Go’s development follows a fixed calendar with two development windows and two releases every year. I try to write about what I plan to do / be involved in for the next release at the beginning of the cycle, and about what happened at the end of the cycle.
This is the planning overview for the Go 1.21 release. There is some exciting API work going on, as well as some satisfying follow-ups on stuff that landed in Go 1.20.
Now is a very good time to provide feedback (and you can do that by just replying to this if you’re reading it in your inbox)! You can also take a look at my public GitHub Projects planning board.
Go 1.20 was a big release for cryptography, so there are a few items to follow up on, to tie up loose ends while the iron is hot. (That is, while I still have any context paged into my brain.)
The new crypto/ecdh package made its debut in Go 1.20.
This means that we can deprecate almost all of crypto/elliptic in Go 1.21. We leave undeprecated only the Curve singletons returned by
P256() and the like, needed to use the crypto/ecdsa APIs. crypto/ecdsa doesn’t actually use crypto/elliptic code besides switching on these values, so crypto/elliptic is now a glorified enum and a legacy compatibility layer. As always, deprecated doesn’t mean removed: per the Go 1 Compatibility Promise crypto/elliptic keeps working, but staticcheck will yell at you about using it.
There is one meaningful bit of functionality in crypto/elliptic that is not in crypto/ecdh yet: compressed point encodings for NIST curves. I explored the topic and proposed an API in the original discussion but then decided to wait to add those APIs. “No is temporary, yes is forever.” If anyone uses compressed point encodings, now would be the time to reach out!
Oh, also P-224, but that’s on purpose and I think we got away with it.
Finally, we can now rewrite golang.org/x/crypto/curve25519 as a crypto/ecdh.X25519 wrapper, hopefully bringing down the number of -25519 implementations in the Go project to one: crypto/internal/edwards25519 (available externally as filippo.io/edwards25519), used by crypto/ecdh and crypto/ed25519, and transitively by the x/crypto/ed25519 and x/crypto/curve25519 wrappers. We’ll wait a few releases before dropping the pre-Go 1.20 compatibility layer from x/crypto.
The other big change of Go 1.20 has been migrating crypto/rsa away from math/big to crypto/internal/bigmod (available externally as filippo.io/bigmod). Dear reader, I’ll be honest, I did not expect this to land, but it did!
The switch came with a significant performance degradation: between approximately 15% (RSA-2048 on amd64) and 45% (RSA-4096 on arm64), despite dropping in some dedicated amd64 Avo-generated assembly. That’s counter-intuitive: I expected the more-specific crypto/internal/bigmod to be faster than math/big, despite having to waste some cycles on keeping computations constant-time. Turns out that an implementation technique used in crypto/internal/bigmod since its initial submission might be based on outdated wisdom that I also assumed was still true. The idea was that splitting numbers into 63-bit “unsaturated limbs” allows for faster Montgomery multiplication, because it allows a result in the hot loop to fit in 128 bits, so two registers. Well, turns out that might be true in portable C, but if you have access to the add-with-carry instructions of most modern processors, using saturated 64-bit limbs and keeping the carries in flags across loop iterations is faster. That’s what the math/big assembly does.
The good news is that we have pure-Go access to these carry chains thanks to math/bits.Add. The bad news is that Montgomery multiplication needs to keep track of two carry bits across loop iterations, and while that’s possible (it’s what ADX is for; it’s just ADC using a different flag) it’s too much for the compiler to figure out, so we’re stuck with assembly. I have a CL that switches bigmod to using math/big’s assembly core, and it brings crypto/rsa within 5% of its Go 1.19 performance without even using all the assembly. It needs cleaning up, but I expect RSA on Go 1.21 to be at least as fast as on Go 1.19, maybe faster.
Aside from clawing back performance, there’s a laundry list of follow-ups.
- PrivateKey.Equal still uses math/big and needs porting to bigmod or other constant-time code.
- The race detector makes the new code dramatically slower. If that’s still the case after the changes above, we probably need to drop some
- The deprecation of the unused GenerateMultiPrimeKey was reverted at the last minute to go through the proposal process. It’s now ready to land.
- A couple docs fixups.
New API proposals
There are a few ongoing proposal discussions that, thanks to the new release calendar with its shorter freezes, have a good chance of landing in Go 1.21.
crypto/tls support for QUIC
Damien Neil, Marten Seemann, and I have been discussing the APIs that crypto/tls needs to expose to support QUIC and HTTP/3. These higher level protocols use TLS 1.3 for the handshake, but then derive their own keys and have their own record encryption, since they don’t run over simple TCP streams. Marten has a fork of crypto/tls for his quic-go implementation, which is a maintenance and ecosystem pain point. The new APIs are intended to let quic-go drop the fork and depend on behavior covered by the Go 1 Compatibility Promise, as well as to support an HTTP/3 implementation in the standard library.
It’s a large and important change, because we’ll be maintaining these APIs in perpetuity, and we have to decide even fundamental architecture details like what side “drives” the handshake: is the caller injecting data, or is crypto/tls requesting it with callbacks from a blocking handshake function? The current design is interesting: the API works synchronously, with the QUIC implementation calling a crypto/tls function to provide incoming bytes, and effectively getting back any bytes it needs to send back, if any. How it gets those bytes back is somewhat weird: it provides callbacks that are called synchronously by crypto/tls before returning; semantically the parameters of those callbacks are just fancy return values of the crypto/tls methods. I am going to try proposing switching them to proper return values to make the semantics clearer and the implementation simpler. What makes the design interesting is that this simple and natural API is a poor match for the internals of crypto/tls, which are used to issue blocking reads until they have a full message, and so can’t interrupt the handshake to return a “please give me more data” signal. The idea is using a hidden goroutine in crypto/tls to bridge the gap. I like it because the complexity is an implementation detail, not set in stone at the API level, and so can be removed in the future if necessary.
Marten’s employer, Protocol Labs, is a client. This helped with working on the proposal, because by the time Damien asked for my input I had already talked to Marten about what quic-go needs, so I had the context to get started. Yay for reciprocal access.
TLS session overhaul
Closely related to the QUIC APIs are the TLS session resumption APIs. QUIC sometimes needs to store extra data in session tickets, for example. That adds to a list of needs that would be addressed by a well-designed PSK API, now that TLS 1.3 has redefined session resumption in terms of PSKs.
I’m very happy about how the general API I had in mind seems to also fit the QUIC 0-RTT use-case, without spreading any 0-RTT complexity into the rest of crypto/tls, based on a conversation with Marten.
I listed the TLS session resumption issues in the Go 1.20 planning post as a stretch goal (which I didn’t get to). Go 1.21 is the right time to do this.
A “public” HTTP Server mode
I get asked all the time to update my Exposing Go on the Internet post, and indeed a lot changed in the past five years, including finally a way to set per-request timeouts. Things are better now, but there are still a few knobs that need turning to safely expose net/http to the Internet, because changing the defaults would be a significant backwards-compatibility break. I was going to write a 2023 edition to list those knobs, but then Russ Cox suggested that we should have a better answer than “here’s what you need to know”.
That might mean a dedicated “Public” mode, which tunes all defaults to the safest settings, maybe all the way to over-conservative configurations, with the right knobs to explicitly relax them, for example after authenticating a request. I like this in particular because sometimes there isn’t a “right” default, and the same cap can be both unusable for some applications and unsafe for some others. Public mode would set those to zero (or whatever the fail-closed option is), and let the application tune it.
An interesting thought is whether that mode should also come with a relaxed backwards-compatibility commitment, saying we reserve the right to make changes if we identify new things that need to be tightened. The argument in favor is that otherwise this is our one chance to get all things right, and if we miss something we’ll be stuck for the next ten years with a “safe” mode that is safer but not entirely safe by default. The argument against is the same as any relaxing of the compatibility promise: if you upgrade Go and your application stops working, you are sad.
X.509 X.509 X.509
Go 1.20 includes the new SetFallbackRoots API we talked about in the recap. The idea is that it will be used as a hook by the golang.org/x/crypto/x509roots/fallback package, which when imported makes a root CA bundle available in case the system doesn’t have any (like empty containers). The x509roots discussion is wrapping up but just as we finalized the original plan, TrustCor went and got distrusted for new certificates. We could just exclude it from the fallback package, and we plan to start by doing that, but the reality is that trust stores are more than a list of roots: they have arbitrary logic embedded in them (like requiring SCTs, limiting the TLDs a CA can issue for, or restricting the issuance date of certificates, like in this case). Roland and I discussed a solution that allows CertPool entries to have custom code attached to them so we can add arbitrary restrictions. I am pretty happy with it, because it’s tightly scoped, only accessible at root addition time, and pretty future-proof. Who knows, maybe one day even Linux will figure out how to have proper root stores, and we’ll be ready.
Speaking of root stores and platforms, a few releases ago we switched to using the platform verifier everywhere it’s available, instead of trying to pry the roots (and additional logic) out of clunky APIs. That’s been a good choice. However, it also means we are exposed to the requirements of the platform verifier. If the verifier checks SCTs, we need to provide SCTs. If the verifier enforces OCSP Must Staple, we need to provide stapled OCSP responses. Those can come via TLS extensions (although Chrome is considering deprecating SCTs via TLS extension, which ~nobody uses and which makes inspecting the PKI harder) and we have them wired through crypto/tls to the ConnectionState structure, but there’s no way to pass them to crypto/x509 Certificate.Verify, since we never used them there. Hence, a proposal to add SCTs and OCSP staples to VerifyOptions.
As always, there’s a long tail, starting from the list of TLS changes that were stretch goals in Go 1.20.
- crypto/tls: customisable max TLS record size
- crypto/tls: improve default performance of SupportsCertificate
- crypto/tls: add VersionName function to return a string version of the TLS Version
- crypto/tls: make maxHandshake larger or configurable
- crypto/tls: run BoringSSL test suite (BoGo)
- proposal: crypto/tls: implement RFC7627
And some fresh ones.
- crypto: document that GenerateKey functions are not deterministic
- ecdsa.GenerateKey was the last GenerateKey left that wasn’t using MaybeReadByte to stop callers from relying on it being deterministic. Ironically, it was exactly the one that had to change algorithm to work better with the new internals. (It’s now rejection sampling instead of reduction, which for NIST curves is a little simpler/cleaner.) So it’s now non-deterministic, too. We need to document that.
- Relatedly, I am working on a package for deterministic key generation, and a couple articles on how randomness should be used in cryptographic specifications to enable proper testing. Stay tuned :)
- crypto/x509: stop using math/big and crypto/elliptic for parsing
- This is another follow-up on the bigmod work. I noticed we are still using math/big in the key parsing functions in crypto/x509. We shouldn’t!
- While at it, we should land in some way my fancy static analysis test that makes sure math/big is unreachable from crypto functions, and extend it to also cover crypto/x509.
- crypto: use purego tag consistently
- We have a convention to use the purego build tag to disable assembly, but we’ve been applying it only in x/crypto because it’s not like anyone imports the stdlib, right? Wrong! TinyGo doesn’t support a bunch of crypto because it depends on assembly. I noticed this discussed on the Gophers Slack. We can fix it by applying purego everywhere; all assembly has a pure Go fallback, by policy. (Also relevant, our platform-specific code policy.)
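The convention looks roughly like this: a file-layout sketch with invented names, showing two files of an imagined package in one block.

```go
// sum_amd64.go — declares the assembly-backed version, skipped when
// building with -tags purego or on other architectures.
//go:build amd64 && !purego

package sum

//go:noescape
func addMulVVW(z, x []uint64, y uint64) uint64 // implemented in sum_amd64.s

// sum_noasm.go — the pure Go fallback that, by policy, every assembly
// function must have; this is what TinyGo (or -tags purego) builds.
//go:build !amd64 || purego

package sum

func addMulVVW(z, x []uint64, y uint64) uint64 {
	// portable implementation goes here
	return 0
}
```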
Finally, a preview bonus: in Go 1.20 Russ made the global math/rand RNG non-deterministic and deprecated the global Seed function. This is great because it removes the backwards compatibility lock-in to a specific RNG algorithm. It means for example that the global lock can be removed or relaxed or even made per-CPU. Now… wouldn’t it be nice if the default algorithm happened to also be cryptographically safe? Not saying we’d even document that, because you should still use crypto/rand for anything security sensitive, but it’d be nice if we failed safe if anyone does use math/rand, right? Well, the question is how fast I can make ChaCha8. I have a quick prototype that is ~50% slower than the current insecure RNG and uses an order of magnitude less memory. Maybe I can land it in Go 1.21 if I find the time to play with it more.
First motorcycle trip to Tuscolo of the season. There were some folks flying radiocontrolled glider models in the wind. It looked fun. I don't need a new hobby. But it did look fun. Hmmmm...
My awesome clients—Sigsum, Protocol Labs, Latacora, Interchain, Smallstep, and Tailscale—are funding all this work and through our retainer contracts they get face time about its direction, as well as unlimited access to advice.
Here are a few words from some of them!
Latacora — Latacora bootstraps security practices for startups. Instead of wasting your time trying to hire a security person who is good at everything from Android security to AWS IAM strategies to SOC2 and apparently has the time to answer all your security questionnaires plus never gets sick or takes a day off, you hire us. We provide a crack team of professionals prepped with processes and power tools, coupling individual security capabilities with strategic program management and tactical project management.
Protocol Labs — Build Peer to Peer apps easily with libp2p — our cross-platform library that offers useful building blocks such as Kademlia DHT for decentralized state, Gossipsub for resilient pub/sub, and NAT traversal protocols for global connectivity. With libp2p, you can easily connect to anyone, anywhere using multiple network transports like QUIC, WebTransport, WebSockets, WebRTC, or TCP through a single interface.
Smallstep — Open source licensing, sustainable open source, and monetizing open source with open core technology — this all is a lot of work and it's hard within our economic systems. Join Smallstep CEO Mike Malone and Offroad Engineer Carl Tashian as they talk through all of the compromises, triumphs, and love of open source.
The rule used to be that you can deprecate things replaced by a Go 1.N feature in Go 1.N+1, but we concluded that was an off-by-one error. The idea is that once Go 1.21 comes out, every supported Go version (Go 1.20 and Go 1.21) provides the new way of doing things, so projects can migrate to the new stuff without having to support both at the same time. ↩︎
Hmm, basically thinking out loud, but while writing this I was thinking that maybe we could have dedicated 2048-bit code paths that don’t use a loop at all, and maybe the compiler could figure that out. It might even be faster than the loop. Not perfect, but better than assembly. ↩︎
At least on amd64 and arm64, which are the only performance targets I optimize for. We design the assembly interfaces so that anyone interested in other architectures can slot in some code if they wish, but we don’t go out of our way to make them faster ourselves. Our resources are finite. The only other first-class port is 32-bit arm, regrettably still used on a lot of arm64-capable systems, but my personal policy is to keep that working and safe, not fast. If you want fast, install a 64-bit kernel. ↩︎
Technically, we have a blanket exception to the Go 1 Compatibility Promise for security fixes, but we have always been very conservative in invoking it, balancing breakage against security benefit and looking for ways to let applications opt-in to safer behaviors. Maybe with the new GODEBUG framework we can be a bit more aggressive. ↩︎
No, I don’t want to talk about why the parsing functions are in crypto/x509. ↩︎