For today’s installment of Dead Bugs Society, I’m going to dig up another one of my favorite exploits. This is actually the second exploit that I wrote for the Apple File Server FPLoginExt stack overflow that DaveG found while we were both working for @stake. I will also take this time to apologize to DaveG for insisting that the bug was a long PathName element (it wasn’t; it was a long UAM string), which is why the advisory is wrong. Oops. My first exploit did a return into libc to branch through the stack pointer so that I didn’t have to hardcode or brute force stack addresses. For some odd reason, it worked most of the time but not every time. It was only after thinking about it a lot, and getting a helpful tip at DEFCON that year, that I figured it all out. My second exploit for the AFP bug, written post-@stake, fixed that problem and made exploiting this remote root code execution vulnerability 100% reliable :).
The PowerPC cache design makes exploits very interesting. A PowerPC processor may have separate instruction and data caches, but not necessarily, and its caches may be write-back or write-through. Understanding how these design choices affect cache coherency is very important, especially for exploits that inject machine code as data and then execute it as instructions. Apple’s processors have spanned just about every combination of these. For example, whereas the earlier PowerPC 601 processors had a unified L1 cache, the G3 and G4 had separate 32 KB instruction and data L1 write-back caches. The G5, on the other hand, has separate 64 KB instruction and 32 KB data L1 write-through caches. For a quick comparison between the G4 and G5, see Apple’s TN2087: PowerPC G5 Performance Primer.
The difference between a write-back and a write-through cache is when the data from a cache block is written to the next-level cache or main memory. In a write-through cache, changed data is written through immediately. A write-back cache only sends the data to the next level when a “dirty” cache block is evicted from the cache. What does this mean for exploits? On a processor with separate write-back caches, like the G3 and G4, your exploit payload will be sitting in the L1 data cache when the CPU branches to your return address. It is highly unlikely that an address on the stack will already be in the L1 instruction cache, so the CPU will fetch the instructions to execute from the L2 cache or main memory, which do not hold your payload yet. Essentially, the CPU will execute stale memory instead of your exploit payload.
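To make the stale-fetch problem concrete, here is a toy simulation in Python. It is only a sketch under loud assumptions: `ToyMachine`, `inject_and_run`, the address `0x1000`, and the strings `"stale"` and `"payload"` are all made up for illustration, and the model ignores cache lines, eviction, and the L2 entirely. It just shows that a store followed by an instruction fetch sees different things depending on the write policy.

```python
class ToyMachine:
    """Toy model: an L1 data cache in front of 'memory' (L2 or RAM).

    Hypothetical simplification: instruction fetches always miss the
    I-cache and read the next level, never the data cache.
    """

    def __init__(self, write_through):
        self.write_through = write_through
        self.memory = {}   # address -> word at the next level
        self.dcache = {}   # address -> word held in the L1 data cache

    def store(self, addr, word):
        # Every store lands in the data cache first.
        self.dcache[addr] = word
        if self.write_through:
            # Write-through: the next level is updated immediately.
            self.memory[addr] = word
        # Write-back: the next level stays untouched until eviction,
        # which this toy never performs.

    def fetch_instruction(self, addr):
        # The stack address was never executed before, so the fetch
        # goes to the next level and does not consult the data cache.
        return self.memory[addr]


def inject_and_run(write_through):
    machine = ToyMachine(write_through)
    machine.memory[0x1000] = "stale"      # whatever was on the stack before
    machine.store(0x1000, "payload")      # the overflow writes code as data
    return machine.fetch_instruction(0x1000)


print("write-back (G3/G4-like):", inject_and_run(write_through=False))
print("write-through (G5-like):", inject_and_run(write_through=True))
```

In this model the write-back machine fetches the old contents while the write-through machine fetches the payload, which is the asymmetry described above.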
To get reliable execution, I needed a way to deterministically flush the caches. My first exploit worked most of the time because I would often trigger a page fault by returning into libSystem. The page fault would cause a mode switch into the kernel, flushing all caches to main memory. If I didn’t cause a page fault, however, the exploit would fail. In the end, I wrote a stub that bounced around libSystem five times to execute a system call, forcing that mode switch every time, and then branch indirectly through the stack pointer back into my shellcode.
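The difference between "works when a page fault happens" and "always works" can be sketched with the same kind of toy model in Python. Everything here is an assumption made for illustration: `run_exploit`, the address `0x1000`, and the idea of modeling the kernel entry as a single write-back of all dirty data. It takes the flushing behavior described above at face value rather than verifying it against real hardware.

```python
def run_exploit(flush_before_branch):
    """Return what the CPU would fetch at the payload address (toy model)."""
    memory = {0x1000: "stale"}   # next-level cache or main memory
    dcache = {}                  # dirty data in a write-back L1 data cache

    # The overflow writes the payload as data; it stays in the data cache.
    dcache[0x1000] = "payload"

    if flush_before_branch:
        # Stand-in for the kernel entry (page fault or system call) that
        # is assumed to push dirty data out to the next level.
        memory.update(dcache)
        dcache.clear()

    # The instruction fetch misses the I-cache and reads the next level.
    return memory[0x1000]


# First exploit: a flush only happens on runs that take a page fault.
first_exploit = [run_exploit(flush_before_branch=fault) for fault in (True, False)]

# Second exploit: the stub always makes a system call before branching.
second_exploit = [run_exploit(flush_before_branch=True) for _ in range(2)]

print("first exploit: ", first_exploit)
print("second exploit:", second_exploit)
```

The first list mixes successes and failures depending on whether the flush happened; the second is uniform because the flush no longer depends on luck.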