HTB Writeup – Crypto – Protein Cookies 2

TL;DR

An interesting mix of a crypto challenge and a web application. It is a classic Hash Length Extension Attack demo, worth keeping as a reference for the future.

Web App

Overview

Interestingly, this challenge ships a website for a crypto CTF. The relevant files for the whole challenge can be found on my Github. It's a gym program site with a very satirical style:

At the bottom we can click to become a member, but we cannot register because the feature is temporarily shut down:

Source Code

Since it's a white-box challenge, we are provided with its source code. It is not a Web challenge, though, so the target is easy to spot in route.py:

Python

@web.route('/program')
@verify_login
def program():
    return send_file('flag.pdf')

Once we can access the URI /program, the server will respond with the flag, which makes it our final attack target:

Obviously we cannot do that yet, since we start as a guest via the view_as_guest function. The @verify_login decorator guards the resource, and it is defined in util.py:

Python

def verify_login(func):
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        if not verify_cookie(request.cookies.get('login_info', '')):
            return redirect(url_for('web.login', error='You are not a logged in member'))

        return func(*args, **kwargs)
    return wrapped

The logic is simple: we need a valid cookie for a logged-in user, which is checked by the verify_cookie function:

Python

def verify_cookie(cookie_data):
    data, signature = cookie_data.split(".")

    if lj12_hash(SECRET + data.encode()) == signature:
        return {
            k: v[-1] for k, v in parse_qs(data).items()
        }.get('isLoggedIn', '') == 'True'

    return False

This function splits the cookie on ., where data contains the actual session data and signature is the hash used to verify its integrity.
The lj12_hash function is a custom hash function, which we will study later. It computes the signature over two values concatenated together: the constant SECRET followed by data. This secret-prefix construction is what exposes the Hash Length Extension Attack.
Finally, verify_cookie parses data as a query string and keeps the last value of each key (v[-1]). If the signature matches and isLoggedIn is True, it returns True and the login check passes; otherwise it returns False and access is denied.
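
To see why the v[-1] detail matters, here is a small sketch of the same parsing step using only urllib.parse.parse_qs from the standard library. The helper name last_values is mine for illustration and does not appear in the challenge source.

```python
from urllib.parse import parse_qs

def last_values(data: str) -> dict:
    # Same dict comprehension as verify_cookie: keep the LAST value of each key
    return {k: v[-1] for k, v in parse_qs(data).items()}

guest = last_values("user_id=guest&isLoggedIn=False")
dup = last_values("user_id=guest&isLoggedIn=False&isLoggedIn=True")

print(guest["isLoggedIn"])  # False
print(dup["isLoggedIn"])    # True
```

A repeated key therefore lets whatever comes last in the cookie win, which is convenient for anyone who can only append data.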

Therefore, we still need to check where the verified SECRET and cookie_data come from. SECRET is defined as a constant, and the cookie is built by the create_cookie function:

Python

SECRET = get_random_bytes(50)

def create_cookie(username, is_logged_in=False):
    data = f'user_id={username}&isLoggedIn={is_logged_in}'
    signature = lj12_hash(SECRET + data.encode())
    return data + '.' + signature

The code first generates a random 50-byte SECRET. Its purpose is to act as a key: without it, nobody should be able to compute a valid signature for data of their choice.
The create_cookie function builds a string from the username and the login status (is_logged_in), which is False by default.
The data is formatted like a query string (e.g. user_id=guest&isLoggedIn=False).
Then it creates the signature with lj12_hash(SECRET + data.encode()), hashing the concatenation of the SECRET key and the encoded data string, exactly as the verification process does.
Finally it joins the data and the signature with a . to form the cookie (e.g. <data>.<signature>).

Crypto

So our target is straightforward: we need to access the /program path with a validly signed cookie in which the isLoggedIn key is set to True (it is False by default).

To move forward, we need to figure out the crypto part: how does lj12_hash turn SECRET and the cookie data into a signature? As mentioned earlier, because the constant SECRET is only prepended to the hashed input, the design is open to a Hash Length Extension Attack.

Let's look into the cryptoutil.py to study the crypto module:

Python

from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

BLOCK_LEN = 32
SECRET = get_random_bytes(50)

iv = b"@\xab\x97\xca\x18\x1d\xac<\x1e\xc3xC\x9b\x1c\xc5\x1f\x8aD=\xec*\x16G\xe7\x89'\x80\xe4\xe6\xfc5l"

The script imports AES for encryption and get_random_bytes for generating secure random bytes, both from the Crypto package (the original PyCrypto is deprecated; install the pycryptodome module instead, which provides the same Crypto namespace). It defines 3 constants first:

BLOCK_LEN: Set to 32 bytes. This is the block size of the custom hash and also the AES-256 key size. AES itself still works on 16-byte blocks, so each 32-byte hash block is encrypted as two AES blocks.
SECRET: A 50-byte random value, same as above.
iv: A hard-coded 32-byte initial value. Despite the name, it is not an IV for an AES mode; it is the starting state of the hash and serves as the AES key for the first block.

Python

def pad(data):
    if len(data) % BLOCK_LEN == 0:
        return data

    pad_byte = bytes([len(data) % 256])
    pad_len = BLOCK_LEN - (len(data) % BLOCK_LEN)
    data += pad_byte * pad_len

    return data

The pad function pads the input so that its length is a multiple of BLOCK_LEN (32 bytes), which the block-wise processing below requires:

len(data): If the length is already a multiple of 32 (BLOCK_LEN), the data is returned unchanged and no extra padding block is added. This matters for the attack: a message and its padded form hash to the same value.
pad_byte: Otherwise, the padding byte is the data length modulo 256. Since 256 is a multiple of 32, an unaligned length can never be 0 modulo 256, so the value is between 1 and 255, inclusive.
pad_len: Calculate the number of padding bytes needed. For example, if len(data) % BLOCK_LEN equals 20, then pad_len will be 32-20=12, so 12 bytes are added to complete the 32-byte block.
data: Append the padding to the data, i.e. pad_byte repeated pad_len times.
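
As a worked example of the rules above, the sketch below runs the challenge's pad function on an 80-byte input, which is the length of the 50-byte SECRET plus the 30-byte guest data. The b"A" filler is only a stand-in, since the real SECRET is unknown.

```python
BLOCK_LEN = 32

def pad(data: bytes) -> bytes:
    # Same logic as the challenge: block-aligned input is returned untouched
    if len(data) % BLOCK_LEN == 0:
        return data
    pad_byte = bytes([len(data) % 256])
    pad_len = BLOCK_LEN - (len(data) % BLOCK_LEN)
    return data + pad_byte * pad_len

msg = b"A" * 80          # stand-in for 50-byte SECRET + 30-byte guest data
padded = pad(msg)

print(len(padded))            # 96
print(padded[80:])            # b'PPPPPPPPPPPPPPPP'  (0x50 == 80)
print(pad(padded) == padded)  # True: already aligned, nothing is added
```

The last line is the interesting one: padding an already padded message changes nothing, so both inputs are split into the same blocks.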

Python

def compression_function(data, key):
    if len(data) != BLOCK_LEN or len(key) != BLOCK_LEN:
        raise ValueError(f"Input for compression function is not {BLOCK_LEN} bytes long!")

    # AES is a safe compression function, right? Why not just use that?
    cipher = AES.new(key, AES.MODE_ECB)
    enc = cipher.encrypt(data)

    # let's confuse it up a bit more, don't want to make it too easy!
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]

    return enc

The name is a bit misleading, since the function really just performs encryption followed by a fixed byte shuffle. It takes two inputs, data and key, both of which must be exactly BLOCK_LEN bytes long (32 bytes), so the key selects AES-256:

AES.MODE_ECB: It uses AES in ECB mode for encryption, which is generally not recommended due to security weaknesses (such as preserving patterns).
enc: Encrypt the 32-byte data block (two 16-byte AES blocks) using AES in ECB mode.
Data Scrambling: After encryption, the ciphertext undergoes several fixed transformations. None of them depends on a secret, so they add no security:
- enc[::-1]: Reverse the entire ciphertext.
- enc[::2] + enc[1::2]: Take the bytes at even indices first, then the bytes at odd indices.
- enc[::3] + enc[2::3] + enc[1::3]: Regroup the bytes by index modulo 3, in the order 0, 2, 1.
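
The scrambling steps can be tried in isolation, without the AES call. The sketch below feeds in the bytes 0..31 so that each output byte reveals which input position it came from; unscramble is a hypothetical helper of mine that only demonstrates the shuffle is reversible, and the attack does not need it.

```python
def scramble(enc: bytes) -> bytes:
    # The three slicing steps from compression_function, without the AES call
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]
    return enc

# perm[dst] == src means output position dst holds input position src
perm = list(scramble(bytes(range(32))))

def unscramble(data: bytes) -> bytes:
    # Hypothetical helper: put each byte back where the permutation took it from
    out = bytearray(len(data))
    for dst, src in enumerate(perm):
        out[src] = data[dst]
    return bytes(out)

sample = bytes(range(100, 132))
print(sorted(perm) == list(range(32)))         # True: a pure permutation
print(unscramble(scramble(sample)) == sample)  # True: trivially reversible
```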

AES.MODE_ECB refers to the Electronic Codebook (ECB) mode of operation for the AES (Advanced Encryption Standard) cipher. ECB is one of several modes in which block ciphers can operate; each mode provides different properties and is suited to different types of encryption tasks.

Suppose we have a plaintext consisting of several blocks, and two of those blocks are identical. When encrypted in ECB mode, those identical plaintext blocks will yield identical ciphertext blocks. This can reveal patterns in the plaintext, which is a major security concern.

Pattern Leakage: The main disadvantage of ECB mode is that it doesn't hide data patterns well. For example, if an image is encrypted block-by-block using ECB, the resulting ciphertext might still reveal visual patterns from the original image, which could provide clues about the original data.

Lack of Diffusion: Because each block is encrypted independently, ECB provides no diffusion across blocks. Changing one plaintext block changes only the matching ciphertext block, so an attacker who knows or controls some blocks can recognize, reorder, or replay their ciphertext.

Python

def lj12_hash(data):
    data = pad(data)

    blocks = [data[x:x + BLOCK_LEN] for x in range(0, len(data), BLOCK_LEN)]
    enc_block = iv

    for i in range(len(blocks)):
        enc_block = compression_function(blocks[i], enc_block)

    return enc_block.hex()

The lj12_hash function is a custom hash function that combines padding, block-wise processing, and iterative encryption with the custom compression function:

Padding: First, it pads the data to a multiple of BLOCK_LEN (32 bytes).
Block Processing:
- It splits the padded data. The list comprehension walks over the data in steps of BLOCK_LEN, creating a list where each element is a block of BLOCK_LEN bytes.
- Then it initializes the chaining value (enc_block) with the hard-coded iv. Because iv is a public constant, it adds no randomness; it is only the fixed starting state of the hash.
Iterative Processing:
- Each block is then processed by compression_function, with the output of the previous block used as the key for the next one. It encrypts the data block (blocks[i]) using the current enc_block as the key and then applies the byte shuffling to the ciphertext.
- The output for each block updates enc_block, so every block's result depends on all the blocks before it, which chains the whole computation together.
Return: The final enc_block, hex-encoded, is returned as the hash.

Overall, lj12_hash looks like an elaborate attempt to build a hash function from encryption primitives and extra scrambling. The scrambling is public and reversible, though, so it adds nothing, and the real problem is structural: the returned digest is exactly the internal state after the last block.

The vulnerability lies in this iteration. It uses iv as the initial key to encrypt the first block, then uses the resulting enc_block as the key for the next block, and so on. We can use the following picture to depict the process:

This meets the prerequisites of a Hash Length Extension Attack, which we will discuss in the next chapter. Our goal is to take the final signature (call it Sn) produced by the last block, append a new block containing our malicious data, and run compression_function on it with Sn as the key to obtain a new, valid signature:

Hash Length Extension Attack

Hash Length Extension Attack is a well-known vulnerability in cryptographic systems that use certain types of hash functions, specifically those based on the Merkle-Damgård construction like MD5, SHA-1, and SHA-256. This type of attack can also be relevant in custom implementations if they inadvertently follow a similar structure or fail to properly handle input data.

Concept: The attack exploits the way certain hashing algorithms process data. In these hash functions, the final state of the hash after processing the initial input can be used as a starting point to process additional data. This is possible because these functions process input data in fixed-size blocks and retain their internal state between blocks.
Execution: The attacker, knowing a hash and the length of the original message but not the original message itself, can append data to the original message such that the hash function, starting from the intermediate hash state, will compute a valid hash for this new, longer message. This can be used to forge messages in systems where hash values are used to verify data integrity and authenticity.
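
The concept can be demonstrated end to end with the standard library only. In the sketch below, compress is a stand-in built on SHA-256 and IV is an all-zero placeholder; both are my assumptions and not the challenge's AES-based function or its iv. The point is only that any public compression function chained this way can be continued from a known digest.

```python
import hashlib
import os

BLOCK_LEN = 32
IV = bytes(BLOCK_LEN)  # placeholder public starting state (assumption)

def pad(data: bytes) -> bytes:
    # Same padding rule as the challenge
    if len(data) % BLOCK_LEN == 0:
        return data
    return data + bytes([len(data) % 256]) * (BLOCK_LEN - len(data) % BLOCK_LEN)

def compress(block: bytes, state: bytes) -> bytes:
    # Stand-in for the AES-based compression_function (assumption)
    return hashlib.sha256(state + block).digest()

def toy_hash(data: bytes, state: bytes = IV) -> bytes:
    data = pad(data)
    for i in range(0, len(data), BLOCK_LEN):
        state = compress(data[i:i + BLOCK_LEN], state)
    return state

secret = os.urandom(50)                      # unknown to the attacker
original = b"user_id=guest&isLoggedIn=False"
tag = toy_hash(secret + original)            # what the server hands out

# Attacker side: only len(secret), original and tag are used from here on
glue = pad(b"A" * len(secret) + original)[len(secret) + len(original):]
extra = b"&isLoggedIn=True"
prefix_len = len(secret) + len(original) + len(glue)
tail = pad(b"A" * prefix_len + extra)[prefix_len:]   # extra plus its padding
forged_tag = toy_hash(tail, state=tag)

# Server side: recomputes the tag over secret + forged message
print(forged_tag == toy_hash(secret + original + glue + extra))  # True
```

The attacker never touches secret itself, only its length, which is why a secret-prefix signature over this kind of hash does not authenticate anything.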

There's a tool on Github called hash_extender, which also comes with a detailed introduction to this attack method. We cannot use the tool for today's challenge, though, because the hash function here is custom.

Therefore, when verify_cookie verifies a cookie, it calls lj12_hash on the constant SECRET followed by data we control, which makes the signature extendable by the Hash Length Extension Attack. And since we know the algorithm (compression_function within lj12_hash), we are free to append data and forge a valid signature:

Python

from Crypto.Cipher import AES


BLOCK_LEN=32

# Paste the signature part from original cookie
sn = "60bff2a97edecdde6aa9▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒"	# change this


# The hash method from the source code
def compression_function(data, key):
    if len(data) != BLOCK_LEN or len(key) != BLOCK_LEN:
        raise ValueError(f"Input for compression function is not {BLOCK_LEN} bytes long!")
    
    cipher = AES.new(key, AES.MODE_ECB)
    enc = cipher.encrypt(data)
    
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]
    
    return enc


# The pad function to fill blocks
def pad(data):
    if len(data) % BLOCK_LEN == 0:
        return data

    pad_byte = bytes([len(data) % 256])
    pad_len = BLOCK_LEN - (len(data) % BLOCK_LEN)
    data += pad_byte * pad_len

    return data


# Fill the old blocks and keep the signature unchanged
SEC_LEN  = 50
DATA_LEN = len("user_id=guest&isLoggedIn=False")    # 30
PAD_LEN  = BLOCK_LEN - ((SEC_LEN + DATA_LEN) % BLOCK_LEN)   # 16

# bytes([n]) stays a single byte even for n >= 128, unlike chr(n).encode('utf-8')
pad_byte = bytes([(SEC_LEN + DATA_LEN) % 256])  # b"P"
padding  = pad_byte * PAD_LEN

filledData = b"user_id=guest&isLoggedIn=False" + padding # 46 bytes; 96 together with the SECRET


# Modify the hash method to work with an existent signature
def lj12_hash(oldData, newData, sig):
    # calculate new paddings after new data added
    data = pad((b"A"*SEC_LEN+oldData+newData))

    # remove temp SECRET and old data
    newData   = data[(SEC_LEN+len(oldData)):] # b'&isLoggedIn=Truepppppppppppppppp'
    
    # use old signature to continue encrypt newly added data blocks
    blocks    = [newData[x:x + BLOCK_LEN] for x in range(0, len(newData), BLOCK_LEN)]
    enc_block = bytes.fromhex(sig)

    for i in range(len(blocks)):
        enc_block = compression_function(blocks[i], enc_block)

    return enc_block.hex()
    
    
# Forge a valid signature
fakeData = b"&isLoggedIn=True"
fakeSig  = lj12_hash(filledData, fakeData, sn)


# Edit cookie and grab the flag at /program
padding = padding.decode('utf-8')
cookie  = f"user_id=guest&isLoggedIn=False{padding}&isLoggedIn=True.{fakeSig}"
print(cookie)

SECRET: We don't need its random value, because we can extend the hash without knowing it. I only build a temporary 50-byte placeholder so that the lengths, and therefore the padding bytes, come out right when forging the signature for the newly added data.
compression_function: Kept identical to the original so that our computation matches the server's.
pad: What we need to care about is the pad_byte value in this function.
- For the original data (SECRET + login_info), the pad_byte is (50+30)%256=80 (which is P in ASCII) and the pad_len is 32-(80%32)=16. Appending these bytes ourselves keeps the old signature valid for the first 96 bytes (this gives filledData).
- For the newly added data, aka newData, the pad_byte is (50+30+16+16)%256=112 (which is p in ASCII), and 16 of them complete the last block. We never put this padding into the cookie, because the server adds it itself: the signature is the same for the cookies ...&isLoggedIn=True and ...&isLoggedIn=Trueppppp.... But when we use the old signature as the starting key to calculate the new signature, we must feed only the newData part, with its padding, into the encryption.
lj12_hash: Rewritten to take new parameters: the padded original cookie oldData, the malicious data newData we want to add, and the signature sig from the original cookie. The old signature serves as the starting enc_block for encrypting the newly added blocks, as introduced in the previous chapter.
newData: The appended data is &isLoggedIn=True. Because verify_cookie keeps the last value of a repeated key, it overrides the earlier isLoggedIn=False and passes the check.
cookie: Forge the cookie from the original data, the controlled padding, the newly added data, and the forged signature.

With the new cookie generated, we set it as login_info in the browser and can then access the /program page and take the flag:
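
Before pasting the cookie into the browser, the arithmetic above can be double-checked with a few lines of standard-library Python. This is only a sanity check of lengths and parsing under the writeup's numbers (50-byte SECRET, guest cookie data); it does not compute any signature, and the variable names are mine.

```python
from urllib.parse import parse_qs

BLOCK_LEN = 32
SEC_LEN = 50  # length of SECRET from the source; its value stays unknown

old = "user_id=guest&isLoggedIn=False"
glue = "P" * 16                      # chr(80): padding of the original 80 bytes
new = "&isLoggedIn=True"
forged_data = old + glue + new

# Length the server hashes: SECRET + forged data
total = SEC_LEN + len(forged_data)
pad_len = (BLOCK_LEN - total % BLOCK_LEN) % BLOCK_LEN
print(total, chr(total % 256), pad_len)   # 112 p 16

# What verify_cookie sees after parsing the data part
parsed = {k: v[-1] for k, v in parse_qs(forged_data).items()}
print(parsed["isLoggedIn"])               # True
print("." in forged_data)                 # False: split(".") still yields 2 parts
```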


Axura
