I’ve seen a little gem pass by in a Go cryptography code review and I want to share it because I think it’s a pattern that can be reused.
Let’s start with a problem statement: crypto/x509 Certificate values take a bunch of memory, and for every open TLS connection you end up with a copy of the leaf and intermediate certificates, and sometimes of the root too. That’s kind of a waste of memory, a big one if you open a lot of connections to the same endpoint or to endpoints that use the same roots.
Ideally, if there was already a parsed copy of a certificate in memory we’d just return a pointer to that. An easy way to do that would be to have a map[string]*x509.Certificate somewhere, mapping the certificate bytes to the parsed structure, and reuse an existing entry if present. This is the concept of interning, usually applied to short strings and other commonly repeated immutable values.
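To make the idea concrete, here is a minimal interning cache in that style. It’s my own sketch, not the crypto/tls code, and the certCache and selfSignedDER names are made up for illustration:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"fmt"
	"math/big"
	"sync"
)

// certCache interns parsed certificates: identical DER bytes map to a
// single shared *x509.Certificate. Note that this naive version never
// evicts entries.
type certCache struct {
	mu    sync.Mutex
	certs map[string]*x509.Certificate
}

func newCertCache() *certCache {
	return &certCache{certs: make(map[string]*x509.Certificate)}
}

func (c *certCache) Get(der []byte) (*x509.Certificate, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cert, ok := c.certs[string(der)]; ok {
		return cert, nil // reuse the already parsed copy
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, err
	}
	c.certs[string(der)] = cert
	return cert, nil
}

// selfSignedDER generates a throwaway self-signed certificate, just to
// have valid DER bytes to exercise the cache with.
func selfSignedDER() []byte {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{SerialNumber: big.NewInt(1)}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	return der
}

func main() {
	cache := newCertCache()
	der := selfSignedDER()
	a, _ := cache.Get(der)
	b, _ := cache.Get(der)
	fmt.Println(a == b) // both callers share one parsed certificate
}
```

The sketch deliberately leaves out eviction, which is exactly the problem discussed next.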
The problem is: how do we evict entries from that cache when they aren’t needed anymore? We can’t have a client that connects to a lot of endpoints one after the other just grow its memory usage endlessly.
The “simplest” solution would be to store a reference counter, remember to decrement it when we don’t need the certificate anymore, and delete the entry from the map when it reaches zero. How do we decrement it, though? The x509.Certificate is needed for as long as the tls.Conn is live, because you can call PeerCertificates on the Conn. Any manual way of doing it is guaranteed to turn out wrong, causing memory leaks.
Some languages have the concept of a “weak reference” for this: a pointer that points to the thing but doesn’t keep it live. They are somewhat complicated to implement safely and intuitively, and Go just doesn’t have them. You could try to replicate them by casting uintptrs, but that’s also guaranteed to go wrong eventually, and this time instead of a memory leak you end up with memory corruption. It also relies on undocumented properties of the GC, such as the fact that the GC currently doesn’t move heap-allocated values.
The solution suggested by Russ Cox and implemented by Roland Shoemaker uses the garbage collector itself to track when the map entry should be dropped.
It’s relatively simple: when you request a certificate from the cache, you get back an x509.Certificate wrapped in an activeCert struct. You’re expected to hold on to the activeCert for as long as you need the x509.Certificate. For example, crypto/tls sticks the activeCert in a private field of tls.Conn. The magic is that activeCert has a finalizer attached to it, so that when it gets garbage collected it also decrements the reference counter of the certificate in the map. When the counter hits zero, the map entry is deleted.
It’s using the GC to keep track of when entries in the cache stop being useful. You can’t put the finalizer on the certificate itself, because the certificate is the very thing kept alive by the map. Instead, you have a distinct activeCert for every place the certificate is used (specifically, for every tls.Conn that needs it), and keep track of when all of them have been garbage collected.
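Stripped of the crypto/tls specifics, the pattern can be sketched like this. The names (internCache, active, entry) and the string stand-in for the parsed certificate are mine, not the real implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// entry is a refcounted cache slot. The string value stands in for the
// expensive-to-build *x509.Certificate.
type entry struct {
	refs  int
	value string
}

// active is the wrapper handed out to callers, playing the role of
// activeCert: hold on to it for as long as you need the value.
type active struct {
	e *entry
}

type internCache struct {
	mu sync.Mutex
	m  map[string]*entry
}

func newInternCache() *internCache {
	return &internCache{m: make(map[string]*entry)}
}

func (c *internCache) Get(key string) *active {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.m[key]
	if !ok {
		e = &entry{value: "parsed:" + key} // the expensive parse happens once
		c.m[key] = e
	}
	e.refs++
	a := &active{e}
	// When the GC collects a wrapper, its finalizer decrements the
	// refcount; the last wrapper to go deletes the map entry.
	runtime.SetFinalizer(a, func(_ *active) {
		c.mu.Lock()
		defer c.mu.Unlock()
		e.refs--
		if e.refs == 0 {
			delete(c.m, key)
		}
	})
	return a
}

func main() {
	c := newInternCache()
	a, b := c.Get("cert"), c.Get("cert")
	fmt.Println(a.e == b.e) // both wrappers share one entry
}
```

The deterministic part is the sharing and the refcount; exactly when the finalizers run, and therefore when the entry disappears, is up to the GC.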
I like it because it’s simple to use (you just store the activeCert next to the certificate) and because it fails gracefully. If you drop the activeCert while something is still using the certificate, for example if the certificate outlives the Conn, nothing bad happens besides potentially making the cache a little less efficient. The certificate itself will still be kept alive by the GC, even if it’s dropped from the cache.
Finalizers are scary, and generally speaking, if you use them like destructors you’re gonna have a bad time. For example, what if the program runs with GOGC=off? However, this is a good use for them, because what we are doing is exclusively related to memory management and value lifecycle. In David’s words, we’re managing “a resource whose use is tied closely to heap use”. A similar use of finalizers that makes sense to me is autopool, which returns values to a sync.Pool automatically. This is the kind of thing that can happen “whenever the GC happens to collect the value”, so it’s a good fit for finalizers.
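As a rough sketch of that idea (my own minimal version under assumed names, not the actual autopool package), a finalizer can return a buffer to a sync.Pool when its last user drops it:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// autoPool hands out byte buffers and uses a finalizer, instead of an
// explicit Put, to return them to the underlying sync.Pool once the
// caller drops all references.
type autoPool struct {
	pool sync.Pool
}

func (p *autoPool) Get() *[]byte {
	buf, ok := p.pool.Get().(*[]byte)
	if !ok {
		b := make([]byte, 0, 4096)
		buf = &b
	}
	// Running a finalizer clears it, so it is safe to re-attach one
	// every time the buffer cycles out of the pool.
	runtime.SetFinalizer(buf, func(buf *[]byte) {
		*buf = (*buf)[:0] // reset length, keep capacity
		p.pool.Put(buf)   // resurrect the buffer into the pool
	})
	return buf
}

func main() {
	p := &autoPool{}
	buf := p.Get()
	*buf = append(*buf, "no explicit Put needed"...)
	fmt.Println(cap(*buf)) // 4096
}
```

Because sync.Pool is itself free to drop items at any GC, losing a buffer this way is harmless, which matches the fail-gracefully property above.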
As a nice side effect, we also save the CPU cycles of parsing certificates that are already in use by other connections, although Roland’s cryptobyte rewrite of the parser had already made that much faster.
This change has now landed in master and will be in Go 1.20 (unless we have to revert it), and it has zero exposed APIs: you just upgrade Go, and every application that opens multiple TLS connections to the same endpoint will automagically be a little lighter and faster. ✨
Like for most Go standard library changes, a lot of the work was actually in a less visible part: we spent quite a bit of time discussing whether this change had backwards-compatibility issues, because the certificates you get from PeerCertificates are now shared between connections, and what happens if you modify one? We concluded this is fine, because it was already the case that some certificates in the chain could be shared between connections, for example if they referred to the roots or intermediates CertPool. We added an explicit note about it to the docs, too.
The root on Linux is actually a shared pointer to the Certificate value in the CertPool. There is some cool stuff going on to keep roots compressed until they are needed. Other platforms don’t load the roots, but instead use the platform API to do the verification and get back a full chain, so the root gets duplicated as well. ↩︎
The Go compiler itself runs with GOGC=off, because the compiler is short-lived enough and gets a little faster that way. ↩︎