27 May 2020

Is X25519 Associative? Sometimes!

X25519 is a simple Elliptic Curve Diffie-Hellman (ECDH) function: it takes a scalar (a fancy name for an integer¹) and an elliptic curve point, and it multiplies the elliptic curve point by the scalar.

Point additions and multiplications work modulo the order of the point, just like hours on a watch work modulo 12. In cryptography we try to work with points of order a large prime number. This is the order of the scalar field, and is commonly referred to as q.

The age design includes a double invocation of X25519: once with a domain separation tweak, once with the actual secret². While implementing this I wondered: can we multiply those two scalars first, instead of doing two point multiplications in a row? It would be much faster!

That is, given two scalars s1 and s2, can we find a third scalar s3 such that the following is true?

X25519(s3, P) == X25519(s2, X25519(s1, P))

If we are working on the pure curve, the answer is simple: yes, just work out the multiplication modulo the order of the scalar field.

s1 * (s2 * P) = (s1 * s2) * P   mod q
s3 = s1 * s2                    mod q

But! X25519 is not pure math. In fact, it takes a 32 bytes array as input for the scalar. That array eventually gets interpreted as a large little-endian number, but first it’s clamped.

Clamping is this extremely unmathematical operation:

s[0] &= 0b11111000
s[31] &= 0b01111111
s[31] |= 0b01000000

It comes straight from the original Curve25519 paper, and most of us just treated it with the deferential skepticism reserved to black magic.³

The first part, s[0] &= 0b11111000, is clearing the 3 least significant bits (remember, little-endian, so the first byte is the least significant). That makes the scalar always a multiple of 8, which clears the cofactor of the point that’s multiplied by it.⁴

The second part is way more obscure: it’s clearing the 256th bit, and setting the 255th. I (and most others I talked to) always thought it was making sure the number is smaller than q, but no: q has 253 bits, this is specifically ensuring the scalar overflows the modulo. I eventually found a mailing list post by Bernstein where he explains he’s trying to protect against a specific implementation bug by making the position of the highest bit constant. Until that email in 2014, the only explanation seems to have been a line on these slides.

Ok, so the question is: is there a way to represent our product of two scalars in such a way that clamping will not change it, so that we can use a stock X25519 implementation?

Or, can we find a value s3 such that the following is true?

clamp(s3) == clamp(s1) * clamp(s2)    mod q

To survive clamping without modification, our value will have to be a multiple of 8, have the 256th bit unset, and the 255th bit set. That is, it needs to be a multiple of 8 between 2^254 and 2^255-1.

We start by reducing clamp(s1) * clamp(s2) by q (at which point it might not be a multiple of 8 anymore, because q and 8 are co-prime, because q is prime). Then we add q to our reduced product until we reach 2^254 (remember, modulo q it’s like adding zero), and then keep adding q until we pass 2^255-1. If any of the numbers we hit in the range is a multiple of 8, that number can be passed to X25519 and it will survive clamping.

x = clamp(s1) * clamp(s2)    mod q
s3 = x + k * q

Since q only fits 3 times in the range of clamping-safe values, we only get 4 tries to hit a multiple of 8. That’s not enough to be certain we will, and depending on the starting point we could get lucky or not.

For example, if we pick these two scalars

s1 = 0x84d5e9b76850c18052ca3b0a9e86ab80c61896ae004689f80c0e08db24dbce16
s2 = 0xf73ad45dd13145011f96d979ea56c0916e2725bbaf1126ea9c68767cdf5f0e38

then there is a clamping-safe product

s3 = (clamp(s1) * clamp(s2)) % q + 5 * q = 0x541805c3f9e04d7a08e548467006698c5f2efebc0963e83c982b31e7a96a2680

but for these two other scalars

s1 = 0x515387f40ace84e4fafe73d86297c42bc1ea3e0cef0b3236a7db082f5e35fb28
s2 = 0x8202824542a0d07193281252debce886faa94811f38763cbd350ca85f60786ce

none of the values equivalent to the product with the right high bits are multiples of 8, so there is no combined X25519 output for them.

The answer to “can you combine two X25519 operations by multiplying the scalars first” is, quite unexpectedly, “sometimes, it depends randomly on the two scalars”. Cryptography is weird.

(If we really wanted to do the performance trick, we could still take a modified implementation of X25519 that doesn’t clamp, and pass clamp(s1) * clamp(s2) mod q into that, but this wasn’t worth forking the standard library for age.)

A few interesting unrelated links

Let’s Encrypt published a nice description of ASN.1, the cursed encoding format of all things legacy crypto, which does a good job of communicating that no, this stuff is not easy, but no, this stuff is not black magic.
A new paper dropped on the ePrint about ECDSA. Soatok has a good commentary. Don’t panic and carry on avoiding ECDSA.
I’m proposing a Go API for age, my file encryption format/library/tool, and would love feedback.
yubikey-agent is a seamless, indestructible, one-click SSH agent for YubiKeys.
draft-irtf-cfrg-ristretto255-00 is available in beautifully typeset HTML. (Not kidding, I was ready to hate the new RFC presentation format and instead I love it. It has paragraph anchors!)
Cthulhu awoke 🦑

Appendix: the Sage code

As an appendix, here’s the Sage code I used to work this out. You can run it multiple times and observe that sometimes it will find a value s3 that matches both conditions, and sometimes it won’t.

ec = EllipticCurve(GF(2**255-19), [0,486662,0,1,0])
base_point = ec.lift_x(9)
reverse_endianness = lambda h: "".join(h[i:i+2] for i in range(len(h), 0, -2))
hex_to_int = lambda h: Integer(reverse_endianness(h), 16)
int_to_hex = lambda s: reverse_endianness(s.hex().zfill(32 * 2))
x_coord = lambda p: int_to_hex(Integer(p.xy()[0]))
high_mask = 1 << 254
clamp = lambda s: ((s % high_mask) + high_mask) // 8 * 8
in_range = lambda x: high_mask <= x and x < 2 * high_mask
s1 = ZZ.random_element(1 << 256)
s2 = ZZ.random_element(1 << 256)
x = (clamp(s1) * clamp(s2)) % base_point.order()
for i in range(3, 8):
    s3 = x + i * base_point.order()
    print(i, in_range(s3), s3 % 8 == 0)

A pretty picture of the Catskills

I drove 500 miles upstate over the weekend and it blew away all my expectations.

A creek running alongside a road with trees on the side and a motorbike parked on the side of the road.

It’s called scalar because point multiplication doesn’t actually exist, it’s just defined in terms of repeated point additions, hence scalar multiplication, and scalar. ↩
This is only true when age is using an Ed25519 key as a X25519 one, in a weak attempt to make cross-protocol oracle attacks a little harder. Turns out it doesn’t actually help, so we’ll remove it in version two of the recipient type. ↩
As you might know, my whole shtick is that cryptography engineering should be simpler and better documented than most software engineering, because it blows most of its complexity budget on the primitives, so I am not a fan of any design decision that feels like black magic, in particular when they are not properly explained. ↩
Curve25519 has cofactor 8, meaning that you can think of points as having two independent values, the first of which has order q, the second of order 8. Multiplying a point by a multiple of 8 will ensure the result has a value of zero in the cofactor component (8k * n = 0 mod 8), avoiding leaking the value of the scalar modulo 8. It’s enough to make a curve with a cofactor tenable for authenticated Diffie-Hellman, but it’s not a safe design and it has caused a bunch of real-world bugs. New cryptosystems should just use a group of pure prime order, like ristretto255. ↩