How we avoided side-channels in our new post-quantum Go cryptography libraries
The Trail of Bits cryptography team is releasing our open-source pure Go implementations of ML-DSA (FIPS-204) and SLH-DSA (FIPS-205), two NIST-standardized post-quantum signature algorithms. These implementations have been engineered and reviewed by several of our cryptographers, so if you or your organization is looking to transition to post-quantum support for digital signatures, try them out!
This post will detail some of the work we did to ensure the implementations are constant time. These tricks specifically apply to the ML-DSA (FIPS-204) algorithm, protecting from attacks like KyberSlash, but they also apply to any cryptographic algorithm that requires branching or division.
The road to constant-time FIPS-204
SLH-DSA (FIPS-205) is relatively easy to implement without introducing side channels, as it’s based on pseudorandom functions built from hash functions, but the ML-DSA (FIPS-204) specification includes several integer divisions, which require more careful consideration.
Division was the root cause of a timing attack called KyberSlash that impacted early implementations of Kyber, which later became ML-KEM (FIPS-203). We wanted to avoid this risk entirely in our implementation.
Each of the ML-DSA parameter sets (ML-DSA-44, ML-DSA-65, and ML-DSA-87) include several other parameters that affect the behavior of the algorithm. One of those is called $γ_2$, the low-order rounding range.
$γ_2$ is always an integer, but its value depends on the parameter set. For ML-DSA-44, $γ_2$ is equal to 95232. For ML-DSA-65 and ML-DSA-87, $γ_2$ is equal to 261888.
ML-DSA specifies an algorithm called Decompose, which converts a field element into two components ($r_1$, $r_0$) such that $(r_1 \cdot 2γ_2) + r_0$ equals the original field element. This requires dividing by $2γ_2$ in one step and calculating the remainder of $2γ_2$ in another.
If you ask an AI to implement the Decompose algorithm for you, you will get something like this:
// This code sample was generated by Claude AI.
// Not secure -- DO NOT USE.
//
// Here, `alpha` is equal to `2 * γ2`, and `r` is the field element:
func DecomposeUnsafe(r, alpha int32) (r1, r0 int32) {
// Ensure r is in range [0, q-1]
r = r % q
if r < 0 {
r += q
}
// Center r around 0 (map to range [-(q-1)/2, (q-1)/2])
if r > (q-1)/2 {
r = r - q
}
// Compute r1 = round(r/alpha) where round is rounding to nearest
// with ties broken towards zero
if r >= 0 {
r1 = (r + alpha/2) / alpha
} else {
r1 = (r - alpha/2 + 1) / alpha
}
// Compute r0 = r - r1*alpha
r0 = r - r1*alpha
// Adjust r1 if r0 is too large
if r0 > alpha/2 {
r1++
r0 -= alpha
} else if r0 < -alpha/2 {
r1--
r0 += alpha
}
return r1, r0
}However, this violates cryptography engineering best practices:
- This code flagrantly uses division and modulo operators.
- It contains several branches based on values derived from the field element.
Zen and the art of branchless cryptography
The straightforward approach to preventing branches in any cryptography algorithm is to always perform both sides of the condition (true and false) and then use a constant-time conditional swap based on the condition to obtain the correct result. This involves bit masking, two’s complement, and exclusive OR (XOR).
Removing the branches from this function looks something like this:
// This is another AI-generated code sample.
// Not secure -- DO NOT USE.
func DecomposeUnsafeBranchless(r, alpha int32) (r1, r0 int32) {
// Ensure r is in range [0, q-1]
r = r % q
r += q & (r >> 31) // Add q if r < 0 (using arithmetic right shift)
// Center r around 0 (map to range [-(q-1)/2, (q-1)/2])
mask := -((r - (q-1)/2 - 1) >> 31) // mask = -1 if r > (q-1)/2, else 0
r -= q & mask
// Compute r1 = round(r/alpha) with ties broken towards zero
// For r >= 0: r1 = (r + alpha/2) / alpha
// For r < 0: r1 = (r - alpha/2 + 1) / alpha
signMask := r >> 31 // signMask = -1 if r < 0, else 0
offset := (alpha/2) + (signMask & (-alpha/2 + 1)) // alpha/2 if r >= 0, else -alpha/2 + 1
r1 = (r + offset) / alpha
// Compute r0 = r - r1*alpha
r0 = r - r1*alpha
// Adjust r1 if r0 is too large (branch-free)
// If r0 > alpha/2: r1++, r0 -= alpha
// If r0 < -alpha/2: r1--, r0 += alpha
// Check if r0 > alpha/2
adjustUp := -((r0 - alpha/2 - 1) >> 31) // -1 if r0 > alpha/2, else 0
r1 += adjustUp & 1
r0 -= adjustUp & alpha
// Check if r0 < -alpha/2
adjustDown := -((-r0 - alpha/2 - 1) >> 31) // -1 if r0 < -alpha/2, else 0
r1 -= adjustDown & 1
r0 += adjustDown & alpha
return r1, r0
}That solves our conditional branching problem; however, we aren’t done yet. There are still the troublesome division operators.
Undivided by time: Division-free algorithms
The previous trick of constant-time conditional swaps can be leveraged to implement integer division in constant time as well.
func DivConstTime32(n uint32, d uint32) (uint32, uint32) {
quotient := uint32(0)
R := uint32(0)
// We are dealing with 32-bit integers, so we iterate 32 times
b := uint32(32)
i := b
for range b {
i--
R <<= 1
// R(0) := N(i)
R |= ((n >> i) & 1)
// swap from Sub32() will look like this:
// if remainder > d, swap == 0
// if remainder == d, swap == 0
// if remainder < d, swap == 1
Rprime, swap := bits.Sub32(R, d, 0)
// invert logic of sub32 for conditional swap
swap ^= 1
/*
Desired:
if R > D then swap = 1
if R == D then swap = 1
if R < D then swap = 0
*/
// Qprime := Q
// Qprime(i) := 1
Qprime := quotient
Qprime |= (1 << i)
// Conditional swap:
mask := uint32(-swap)
R ^= ((Rprime ^ R) & mask)
quotient ^= ((Qprime ^ quotient) & mask)
}
return quotient, R
}This works as expected, but it’s slow, since it requires a full loop iteration to calculate each bit of the quotient and remainder. We can do better.
One neat optimization trick: Barrett reduction
Since the value $γ_2$ is fixed for a given parameter set, and the division and modulo operators are performed against $2γ_2$, we can use Barrett reduction with precomputed values instead of division.
Barrett reduction involves multiplying by a reciprocal (in our case, $2^{64}/2γ_2$) and then performing up to two corrective subtractions to obtain a remainder. The quotient is produced as a byproduct of this calculation.
// Calculates (n/d, n%d) given (n, d)
func DivBarrett(numerator, denominator uint32) (uint32, uint32) {
// Since d is always 2 * gamma2, we can precompute (2^64 / d) and use it
var reciprocal uint64
switch denominator {
case 190464: // 2 * 95232
reciprocal = 96851604889688
case 523776: // 2 * 261888
reciprocal = 35184372088832
default:
// Fallback to slow division
return DivConstTime32(numerator, denominator)
}
// Barrett reduction
hi, _ := bits.Mul64(uint64(numerator), reciprocal)
quo := uint32(hi)
r := numerator - quo * denominator
// Two correction steps using bits.Sub32 (constant-time)
for i := 0; i < 2; i++ {
newR, borrow := bits.Sub32(r, denominator, 0)
correction := borrow ^ 1 // 1 if r >= d, 0 if r < d
mask := uint32(-correction)
quo += mask & 1
r ^= mask & (newR ^ r) // Conditional swap using XOR
}
return quo, r
}With this useful function in hand, we can now implement Decompose without branches or divisions.
Toward a post-quantum secure future
The availability of post-quantum signature algorithms in Go is a step toward a future where internet communications remain secure, even if a cryptography-relevant quantum computer is ever developed.
If you’re interested in high-assurance cryptography, even in the face of novel adversaries (including but not limited to future quantum computers), contact our cryptography team today.
