This spring and summer, as an intern at Trail of Bits, I researched modeling fault attacks on RSA signatures. I looked at an optimization of RSA signing that uses the Chinese Remainder Theorem (CRT) and induced calculation faults that reveal private keys. I analyzed fault attacks at a low level rather than in a mathematical context. After analyzing both a toy program and the mbed TLS implementation of RSA, I identified bits in memory that leak private keys when flipped.
The Signature Process with RSA-CRT
Normally, an RSA signing operation would use this algorithm:
s=md (mod n). Here,
s represents the signature,
m the message,
d the private exponent, and
n the public key. This algorithm is effective, but when the numbers involved increase to the necessary size for security, the computation begins to take an extremely long time. For this reason, many cryptography libraries use the Chinese Remainder Theorem (CRT) to speed up decryption and signing. The CRT splits up the single large calculation into two smaller ones before stitching their results together.
Given the private exponent
d, we calculate two values,
dp=d (mod (p-1)) and
dq=d (mod (q-1)).
We then compute two partial signatures, each using one of these two numbers; the first partial signature,
mdp (mod p), while the second,
mdq (mod q). The inverses of
p (mod q) and
q (mod p) are calculated, and finally, the two partial signatures are combined to form the final signature,
s=(s1*q*qInv)+(s2*p*pInv) (mod n).
The Fault Attack
The problem arises when one of the two partial signatures (let’s assume it’s
s2, calculated using
q) is incorrect. It happens.
Combining the two partial signatures will give us a faulty final signature. If the signature were correct, we would be able to verify it by comparing the original message to
se (mod n), where
e is the public exponent. However, with the faulted signature,
se (mod p) will still equal
se (mod q) will not.
From here, we can say that
p, but not
q, is a factor of
p is also a factor of
n itself, the attacker can take the greatest common denominator of
se-m to extract
n divided by
p is simply
q, and the attacker now has both of the private keys.
Faulting a Toy Program
I began by writing a simple toy program in C to conduct RSA signing using the Chinese Remainder Theorem. This program included no padding and no checks, using textbook RSA to sign fairly small numbers. I used a debugger to modify one of the partial signatures manually and produce a faulted final signature. I wrote a program in Python to use this faulted signature to calculate the private keys and successfully decrypt another encrypted message. I tried altering data at various different stages of the signing process to see whether I could still extract the private keys. When I felt comfortable carrying out these fault attacks by hand, I began to automate the process.
Flipping Bits with Manticore
I used Binary Ninja to view the disassembly of my program and identify the memory locations of the data that I was interested in. When I tried to solve for the private keys, I would know where to look. Then, I installed and learned how to use Manticore, the binary analysis tool developed by Trail of Bits with which I was going to conduct the fault attacks.
I wrote a Manticore script that would iterate through each consecutive byte of memory, alter an instruction by flipping a bit in that byte, and execute the RSA signing program. For each execution that did not result in a crash or a timeout, I used the output to try to extract the private keys. I checked them against the correct keys by attempting to successfully decrypt another message. With all of this data, I generated a CSV file of the intermediate and final results from each bit flip, including the partial signatures, the private keys, and whether the private keys were accurate.
I tested a total of 938 bit flips, and I found that 45 of them, or 4.8%, successfully produced the correct private keys. Nearly 55% resulted in either a crash or a timeout, meaning that the program failed to create a signature. Approximately 31% did not alter the partial signatures.
This kind of automation offers a massive speedup in developing exploits for vulnerabilities like this, as once you simply describe the vulnerability to Manticore, you get back a comprehensive list of ways to exploit it. This is particularly useful if you’re able to introduce some imprecise fault (e.g. using Rowhammer) as you can find clusters of bits which, when flipped, leak a private key.
Faulting mbed TLS
Once I had the file of bit flip results for my toy program, I looked for a real cryptographic library to attack. I settled on mbed TLS, an implementation that is primarily used on embedded systems. Because it was much more complex than the program I had written, I spent some time looking at the mbed TLS source code to try to understand the RSA signing process before compiling it and looking at the disassembled binary using Binary Ninja.
One key difference between mbed TLS and my toy program was that signatures using mbed TLS were padded. The fault attacks I was trying to model are applicable only to deterministic padding, in which a given message will always result in the same padded value, and not to probabilistic schemes. Although mbed TLS can implement a variety of different padding schemes, I looked at RSA signing using PKCS#1 v1.5, a deterministic alternative to the more complex, randomized PSS padding scheme. Again, I used a debugger to locate the target data. When I knew what memory locations I would be reading from, I began to fault one of the partial signatures and produce an incorrect signature.
I soon realized, however, that there were some runtime checks in place to prevent a fault attack of the type I was trying to conduct. In particular, two of the checks, if failed, would stop execution and output an error message without creating the signature. I used the debugger to skip over the checks and produce the faulted signature I was looking for.
With the faulted signature and all of the public key data, I was able to replicate the process I had used on my toy program to extract the private keys successfully.
Automating the Attacks
Just as I had with the toy program, I started to try to automate the fault attacks and identify the bit flips that would leak the private keys. In order to speed up the process, I wrote a GDB script instead of using Manticore. I found bit flips that would allow me to bypass both of the checks that would normally prevent the creation of a faulted signature. I used GDB to alter both of those memory instructions. In a process identical to the toy program, I also flipped one bit in a given memory address. I then used Python to loop through each byte of memory, call this script, and try to extract the private keys, again checking whether they were correct by attempting to decrypt a known message. I collected the solved private keys and wrote the results to a CSV file of all the bit flips.
I tested 566 bit flips, all within the portion of the mbed TLS code that carried out the signing operation. Combined with the two bit flips that ensured that the checks would pass, I found that 28 of them – nearly 5% – leaked the private keys. About 55% failed to produce a signature.
The fact that this kind of analysis works on real programs is exciting, but unfortunately, I ran out of time in the summer before I got a chance to test it in the “real world.” Nonetheless, the ability to input real TLS code and get a comprehensive description of fault attacks against it is exciting, and yields fascinating possibilities for future research.
I loved working at Trail of Bits. I gained a better understanding of cryptography, and became familiar with some of the tools used by security engineers. It was a wonderful experience, and I’m excited to apply everything I learned to my classes and projects at Carnegie Mellon University when I start next year.