You Should Run a Certificate Transparency Log
Hear me out. If you are an organization with some spare storage and bandwidth, or an engineer looking to justify an overprovisioned homelab, you should consider running a Certificate Transparency log. It’s cheaper, easier, and more important than you might think.
Certificate Transparency (CT) is one of the technologies that underpin the security of the whole web. It keeps Certificate Authorities honest, and allows website owners to be notified of unauthorized certificate issuance. It’s a big part of how the WebPKI went from the punchline of “weakest link” jokes to the robust foundation of the security of most of digital life… in less than fifteen years!
CT is an intrinsically distributed system: CAs must submit each certificate to two CT logs operated by third parties and trusted by the browsers. This list is, and has been for a couple of years, uncomfortably short. There just aren’t as many independent log operators as we’d like. Operating a log right now would be an immense contribution to the security of virtually every Internet user.
It also comes with the bragging rights to claim that your public key is on billions of devices.
Where’s the catch? Well, until recently running a log was a pain, and expensive. I am writing this because as of a few months ago, this has changed!
Browsers now accept CT logs that implement the new Static CT API, which I designed and productionized in collaboration with Let’s Encrypt and the rest of the WebPKI community over the past year and a half. The key difference is that it makes it possible to serve the read path of a CT log exclusively through static, S3- and CDN-friendly files.
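To give a feel for how simple the read path is: a log’s entire state commitment is a small signed “checkpoint” text file, in the C2SP signed-note format, that monitors fetch over plain HTTP. A minimal Python sketch of parsing one follows; the checkpoint contents and log origin below are made up for illustration (real checkpoints carry real signatures).

```python
import base64

def parse_checkpoint(text: str) -> tuple[str, int, bytes]:
    """Parse the note body of a checkpoint: an origin line, a decimal
    tree size, and a base64 root hash, followed by a blank line and
    one or more signature lines."""
    body, _, _signatures = text.partition("\n\n")
    origin, size, root_b64 = body.splitlines()[:3]
    return origin, int(size), base64.b64decode(root_b64)

# A made-up checkpoint for illustration only.
example = (
    "example.com/test-log/2025h1\n"
    "12345\n"
    + base64.b64encode(b"\x01" * 32).decode() + "\n"
    "\n"
    "— example.com/test-log/2025h1 c2lnbmF0dXJl\n"
)

origin, size, root = parse_checkpoint(example)
print(origin, size, len(root))
```

Note that this sketch skips signature verification, which a real monitor must perform against the log’s public key before trusting the tree head.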
Moreover, the new Sunlight implementation, sponsored by Let’s Encrypt, implements the write path with minimal dependencies and requirements. It can upload the Static CT assets directly to object storage, or store them on any POSIX filesystem.
If you are curious, you can learn more in Let’s Encrypt’s retrospective, in the original Sunlight design document, or in the summarized public announcement.
Geomys, my open source maintenance firm, operates a pro-bono Sunlight-backed trusted Static CT log for $10k/year, including hardware amortization, colocation, and bandwidth. I’m sure it can be done for cheaper.
The shopping list
Ok, so what does it take to run a CT log in 2025[6]?
- Servers: one. No need to make the log a distributed system; CT itself is a distributed system.
  - If you want to offer redundancy, you can run multiple logs.
  - The uptime target is 99%[5] over three months, which allows for nearly 22h of downtime. That’s more than three motherboard failures per month.
- CPU and memory: whatever, as long as it’s ECC memory. Four cores and 2 GB will do.
- Bandwidth: 2 Gbps outbound peak capacity[2] (which you can offload to a CDN).
- Storage: you have two options.
  - 3–5 TB[1] of usable redundant filesystem space on SSDs[3].
  - 3–5 TB[1] of S3-compatible object storage, and 200 GB of cache on SSD.

  Static CT logs are just flat static files, which you can serve with any HTTP server[4] from disk, or expose as a public object storage bucket.
- People: Google policy requires the email addresses of two representatives. The uptime target is forgiving enough that it can probably be met by a single person working during business hours.
That’s pretty much it!
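To underline how boring the read path can be: any static file server works. The only caching subtlety is that tiles and entries are immutable once written, while the checkpoint changes constantly. As a toy illustration (not production advice — Skylight or a CDN-fronted bucket is the realistic choice, and the cache lifetimes and data path here are my own illustrative assumptions, not Skylight’s actual values), Python’s standard library is enough:

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def cache_control(path: str) -> str:
    """The checkpoint is updated every sequencing batch, so it must be
    revalidated; everything else in a Static CT log never changes."""
    if path.endswith("/checkpoint"):
        return "no-cache"
    return "public, max-age=604800, immutable"

class TileHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Cache-Control", cache_control(self.path))
        super().end_headers()

def serve(directory: str, port: int = 8080) -> ThreadingHTTPServer:
    # directory would hold the static assets written by the log.
    handler = partial(TileHandler, directory=directory)
    return ThreadingHTTPServer(("", port), handler)

# serve("/srv/log-data").serve_forever()  # hypothetical data path
```

A CDN in front of something like this absorbs nearly all monitor traffic, which is what makes the 2 Gbps peak figure above easy to offload.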
Durability is the first priority: it’s really important that you never lose data once it’s fsync’ed to disk or PUT to object storage, since your log will have signed and returned SCTs, which are promises to serve the certificates it received. This means, for example, that backups are useless: restoring one would roll back the log’s state.
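Sunlight takes care of this on the write path; for intuition, the standard POSIX recipe for “it’s durable before we acknowledge it” is: write to a temporary file, fsync it, rename it into place, and fsync the directory. A sketch of that pattern, purely as illustration:

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Atomically and durably write data to path using the POSIX
    pattern: temp file -> fsync -> rename -> fsync directory."""
    directory = os.path.dirname(path) or "."
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # the bytes are on disk before we expose the file
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic replacement on POSIX filesystems
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)  # the rename itself survives a crash
    finally:
        os.close(dfd)
```

Only after a write like this returns (or the object storage PUT is acknowledged) may a log safely sign and return an SCT for the entry.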
In terms of ongoing effort, a log operator is expected to read the Google and Apple CT Log policies, monitor the ct-policy@chromium.org mailing list, update the log implementation from time to time, and rotate log temporal shards every year. (For example, we just stood up 2027 shards of our log.)
Given the logs’ lifecycle, you should plan to stick around for at least three years.
Sign me up!
If you want to become a CT log operator, first of all… thank you!
The Sunlight README was rewritten recently to get you up and running easily. Sunlight is highly specialized for Certificate Transparency and the WebPKI, and it’s designed to help you operate a healthy, useful CT log with minimal configuration.
The community is eager to welcome new log operators. You can post questions, reports, and updates on the transparency.dev Slack, ct-policy mailing list, or Sunlight issue tracker. I encourage you to reach out even just to share your plans, or to ask any questions you might have before committing to running a log.
You might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @filippo@abyssdomain.expert.
The picture
I systematically make the mistake of reaching a beautiful spot with my motorcycle, watching the sunset, and then realizing “oh, shoot, now it’s dark!” This time, the motorcycle didn’t start either, and it was the first ride of the season in January. Got to read A Tour of WebAuthn by Adam Langley, though, so who can say if it was good or bad.
Geomys, my Go open source maintenance organization, is funded by Smallstep, Ava Labs, Teleport, Tailscale, and Sentry. Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement.)
Here are a few words from some of them!
Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews.
Ava Labs — We at Ava Labs, maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team.
1. If a six-month shard is assumed to grow up to 2B entries (the biggest so far has been 1.93B), and old shards are deleted one month after they expire, Sunlight on ZFS configured like Tuscolo will need at most 2.75 TB. However, the WebPKI is always growing, and shorter-lived certificates will increase the issuance rate, but will also make rotation more efficient. Provisioning 3 TB and having a plan to get to 5 TB if necessary over the next couple of years would be prudent.
2. This is a conservative estimate of potentially necessary peak capacity. Right now the Tuscolo log produces ~50 Mbps average / ~250 Mbps peak, but there are relatively few monitors. RFC 6962 logs reported numbers around 1–2 Gbps. Static CT reduces bandwidth by almost 80%, but also makes it easier to monitor a log, which might increase demand. YMMV. Verifiable Indexes will hopefully reduce the number of full monitors in the future.
3. It might be possible to run the object storage part on HDDs. The write path would probably be fine, but the read path serves a lot of files with random accesses. Maybe with a large SSD cache layer.
4. Or with Sunlight’s specialized HTTP read path, called Skylight, which has a bunch of nice metrics and health checks.
5. Yep, two nines. Availability of the write path in particular is not a big deal at all: CAs will just fall back to other logs. Availability of the read path is important to ensure timely monitoring of new entries, but it’s just a simple static HTTP server. Note that Google is planning to split the requirements between read and write endpoints, and to require higher availability on the read path.
6. It’s possible the requirements will grow in the future because of short-lived certificates and/or post-quantum signatures, but the ecosystem is very aware of the potential burden on CT log operators, and there are a number of proposals to mitigate it, such as Merkle Tree Certificates and Verifiable Indexes. I am optimistic this will be solved, but even if it isn’t, you can always turn your log read-only without disrupting the ecosystem, should it grow too large.