TL;DR
An interesting combination of a crypto challenge and a web application. It is a classic Hash Length Extension Attack demo, worth keeping as a reference for future recaps.
Web App
Overview
Interestingly, this challenge ships a website for a crypto CTF. The relevant files for the whole challenge can be found on my GitHub. It's a gym program site with a heavily satirical style:
At the bottom of the page we can click to become a member, but registration fails because the feature is temporarily shut down:
Source Code
Since it's a white-box challenge, we are given the source code. It is not really a Web challenge, so the target is easy to spot in route.py:
@web.route('/program')
@verify_login
def program():
    return send_file('flag.pdf')
Once we can access the URI /program, the server will respond with the flag, which is our final target:
Obviously we cannot do that yet, because in the initial setting we are only a guest (see the function view_as_guest). The decorator @verify_login guards the resource; it is defined in util.py:
def verify_login(func):
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        if not verify_cookie(request.cookies.get('login_info', '')):
            return redirect(url_for('web.login', error='You are not a logged in member'))
        return func(*args, **kwargs)
    return wrapped
The logic is simple: we need a valid cookie that marks us as a logged-in user, which is checked by the function verify_cookie:
def verify_cookie(cookie_data):
    data, signature = cookie_data.split(".")
    if lj12_hash(SECRET + data.encode()) == signature:
        return {
            k: v[-1] for k, v in parse_qs(data).items()
        }.get('isLoggedIn', '') == 'True'
    return False
- This function splits the cookie at the dot (.): data contains the actual session data, and signature is the hash of that data, used to verify its integrity.
- The lj12_hash function is a custom hash function; we will study how it works later. It computes the signature over two inputs, the constant SECRET followed by data. Hashing a secret prefix concatenated with user-influenced data is exactly the pattern that exposes a Hash Length Extension Attack.
- Finally, verify_cookie parses data as a query string and reads the isLoggedIn key, keeping the last value when the key appears more than once (v[-1]). If that value is True, the function returns True and verification passes; otherwise it returns False and the login is rejected.
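The "last value wins" behaviour is what makes appending data useful later on. A minimal sketch of it, using only the standard library; last_values is a hypothetical helper name that mirrors the dict comprehension inside verify_cookie:

```python
from urllib.parse import parse_qs


def last_values(data: str) -> dict:
    # Same comprehension as in verify_cookie: keep the last value of each key.
    return {k: v[-1] for k, v in parse_qs(data).items()}


original = "user_id=guest&isLoggedIn=False"
extended = original + "&isLoggedIn=True"

# parse_qs collects every value of a repeated key into a list.
print(parse_qs(extended))

# The guest cookie fails the check, the extended one passes it.
print(last_values(original).get("isLoggedIn", "") == "True")  # False
print(last_values(extended).get("isLoggedIn", "") == "True")  # True
```

So a second isLoggedIn key placed after the original one overrides it, provided the signature still verifies.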
So we still need to look at the two inputs being verified, SECRET and cookie_data. They are defined as a constant and in the create_cookie function respectively:
SECRET = get_random_bytes(50)

def create_cookie(username, is_logged_in=False):
    data = f'user_id={username}&isLoggedIn={is_logged_in}'
    signature = lj12_hash(SECRET + data.encode())
    return data + '.' + signature
- The code first generates a random 50-byte SECRET. Its purpose is to act as a key: without knowing it, a client should not be able to compute a valid signature for data of its own choosing.
- The create_cookie function builds a string from the user's username and login status (is_logged_in), which is False by default.
- The data is formatted like a query string (e.g. user_id=guest&isLoggedIn=False).
- Then it creates the signature with lj12_hash(SECRET + data.encode()), hashing the concatenation of the SECRET key and the encoded data string, exactly as the verification does.
- Finally it forms the cookie by joining data and signature with a dot (e.g. <data>.<signature>).
Crypto
So our target is straightforward: we need to access the /program path with a valid cookie whose isLoggedIn value is True (it is False by default).
To move forward we need to figure out the Crypto part: how does the lj12_hash function hash SECRET and cookie_data into a signature? As mentioned earlier, because the constant SECRET is merely the prefix of the hashed input, this design is exposed to the Hash Length Extension Attack.
Let's look into cryptoutil.py to study the crypto module:
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
BLOCK_LEN = 32
SECRET = get_random_bytes(50)
iv = b"@\xab\x97\xca\x18\x1d\xac<\x1e\xc3xC\x9b\x1c\xc5\x1f\x8aD=\xec*\x16G\xe7\x89'\x80\xe4\xe6\xfc5l"
The script imports AES for encryption and get_random_bytes for generating secure random bytes, both from the Crypto package (the original PyCrypto is deprecated; install the pycryptodome module instead, which provides the same Crypto namespace). It first defines 3 constants:
- BLOCK_LEN: Set to 32 bytes. This is the block size of the custom hash, not of AES itself (AES always works on 16-byte blocks); it also matches the 32-byte key length of AES-256.
- SECRET: A 50-byte random value, same as above.
- iv: A predefined 32-byte initialization vector, used as the starting state of the hash function. As we will see, it serves as the AES key for the first block.
def pad(data):
    if len(data) % BLOCK_LEN == 0:
        return data
    pad_byte = bytes([len(data) % 256])
    pad_len = BLOCK_LEN - (len(data) % BLOCK_LEN)
    data += pad_byte * pad_len
    return data
This pad function pads the input so that its length is a multiple of BLOCK_LEN (32 bytes), which lets the hash split it into whole 32-byte blocks:
- len(data): If the length of data is already a multiple of 32 (BLOCK_LEN), the function returns data unchanged. One side effect of this early return is that the pad byte can never be 0x00: len(data) % 256 is zero only when the length is a multiple of 256, which is also a multiple of 32.
- pad_byte: Otherwise, the padding byte is the data length modulo 256, a value between 1 and 255 inclusive.
- pad_len: The number of padding bytes needed. For example, if len(data) % BLOCK_LEN equals 20, then pad_len is 32-20=12, so 12 bytes are added to complete the 32-byte block.
- data: The padding is appended to the data: pad_byte repeated pad_len times.
def compression_function(data, key):
    if len(data) != BLOCK_LEN or len(key) != BLOCK_LEN:
        raise ValueError(f"Input for compression function is not {BLOCK_LEN} bytes long!")
    # AES is a safe compression function, right? Why not just use that?
    cipher = AES.new(key, AES.MODE_ECB)
    enc = cipher.encrypt(data)
    # let's confuse it up a bit more, don't want to make it too easy!
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]
    return enc
This function is somewhat misnamed, as it actually performs encryption rather than compression. It takes two inputs, data and key, both of which must be exactly BLOCK_LEN (32 bytes) long:
- AES.MODE_ECB: It uses AES in ECB mode, which is generally not recommended due to security weaknesses (such as preserving plaintext patterns). With a 32-byte key this is AES-256, and the 32-byte data is encrypted as two independent 16-byte AES blocks.
- enc: The data encrypted with AES in ECB mode.
- Data Scrambling: After encryption, the ciphertext is reordered by three fixed, key-independent steps:
  - enc[::-1]: Reverse the entire ciphertext.
  - enc[::2] + enc[1::2]: Take the bytes at even positions, followed by the bytes at odd positions.
  - enc[::3] + enc[2::3] + enc[1::3]: Regroup the bytes by position modulo 3, in the order 0, 2, 1.
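The scrambling steps can be examined on their own, without AES. The sketch below applies the same three slicing steps to the bytes 0..31 to show that the result is just a fixed reordering, which anyone can undo. The names scramble and unscramble are hypothetical helpers, not part of the challenge code:

```python
def scramble(enc: bytes) -> bytes:
    # The three slicing steps from compression_function, without the AES call.
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]
    return enc


def unscramble(out: bytes) -> bytes:
    # Scrambling the index list tells us which input position each output
    # byte came from, so the reordering can be reversed.
    restored = bytearray(len(out))
    for dst, src in enumerate(scramble(bytes(range(len(out))))):
        restored[src] = out[dst]
    return bytes(restored)


positions = bytes(range(32))
shuffled = scramble(positions)

print(list(shuffled))                       # a reordering of 0..31
print(unscramble(shuffled) == positions)    # True
```

Since the reordering involves no secret, it does not make the construction any harder to attack; it only has to be reproduced faithfully in the exploit.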
AES.MODE_ECB refers to the Electronic Codebook (ECB) mode of operation for the AES (Advanced Encryption Standard) cipher. ECB is one of several modes in which block ciphers can operate; each mode provides different properties and suits different encryption tasks.
Suppose we have a plaintext consisting of several blocks, and two of those blocks are identical. When encrypted in ECB mode, those identical plaintext blocks yield identical ciphertext blocks. This can reveal patterns in the plaintext, which is a major security concern.
Pattern Leakage: The main disadvantage of ECB mode is that it doesn't hide data patterns well. For example, if an image is encrypted block-by-block using ECB, the resulting ciphertext might still reveal visual patterns from the original image, which could provide clues about the original data.
Lack of Diffusion: Because each block is encrypted independently, ECB provides no diffusion across blocks. If an attacker can manipulate the plaintext or guess the content of specific blocks, they can potentially control or predict the ciphertext of these blocks.
def lj12_hash(data):
    data = pad(data)
    blocks = [data[x:x + BLOCK_LEN] for x in range(0, len(data), BLOCK_LEN)]
    enc_block = iv
    for i in range(len(blocks)):
        enc_block = compression_function(blocks[i], enc_block)
    return enc_block.hex()
The lj12_hash function is a custom cryptographic hash function that combines padding, block-wise processing, and iterated encryption through the custom compression function:
- Padding: First, it pads the data to a multiple of BLOCK_LEN (32 bytes).
- Block Processing:
  - It splits the padded data. The list comprehension blocks steps through the data BLOCK_LEN bytes at a time, creating a list where each element is one BLOCK_LEN-byte block.
  - Then it initializes the chaining value (enc_block) with the predefined initialization vector (iv). The iv is a fixed public constant, so it adds no randomness; it only provides the starting state of the chain.
- Iterative Processing:
  - Each block is processed by compression_function, which encrypts the data block (blocks[i]) using the current enc_block as the key and then scrambles the ciphertext bytes.
  - The output of the compression function becomes the new enc_block, so the processing of each block depends on the result of the previous one.
- Return: The hexadecimal representation of the final enc_block is returned as the hash.
Overall, lj12_hash is an attempt at building a hash function from an encryption primitive plus extra byte scrambling. The scrambling is a fixed public reordering, though, so it adds no real security, and the construction has a structural weakness regardless of how strong AES is.
The vulnerability lies in the chaining. The loop uses iv as the initial key to encrypt the first block, then uses the resulting enc_block as the key to encrypt the next block, and so on, and the final enc_block is returned in full as the hash. We can use the following picture to depict the process:
This meets the prerequisites of the Hash Length Extension Attack, which we will discuss in the next chapter. Our goal is to take the final signature (let's call it Sn) produced by hashing the last block, append a new block (call it block(n)) containing our malicious data, and run the hash's own step (compression_function) keyed with Sn to create a new, valid signature:
Hash Length Extension Attack
Hash Length Extension Attack is a well-known vulnerability in cryptographic systems that use certain types of hash functions, specifically those based on the Merkle-Damgård construction like MD5, SHA-1, and SHA-256. This type of attack can also be relevant in custom implementations if they inadvertently follow a similar structure or fail to properly handle input data.
- Concept: The attack exploits the way certain hashing algorithms process data. In these hash functions, the final state of the hash after processing the initial input can be used as a starting point to process additional data. This is possible because these functions process input data in fixed-size blocks and retain their internal state between blocks.
- Execution: The attacker, knowing a hash and the length of the original message but not the original message itself, can append data to the original message such that the hash function, starting from the intermediate hash state, will compute a valid hash for this new, longer message. This can be used to forge messages in systems where hash values are used to verify data integrity and authenticity.
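The concept can be sketched end to end with the standard library only. This is an illustration under stated assumptions, not the challenge code: toy_compress is a stand-in built from SHA-256 instead of the AES-based compression_function, IV is an arbitrary all-zero value, and the names toy_hash, glue and tail are made up for this example. The chaining and the padding rule follow the ones described above.

```python
import hashlib
import os

BLOCK_LEN = 32
IV = bytes(BLOCK_LEN)  # assumption: arbitrary public start state


def pad(data: bytes) -> bytes:
    # Same padding rule as the challenge.
    if len(data) % BLOCK_LEN == 0:
        return data
    return data + bytes([len(data) % 256]) * (BLOCK_LEN - len(data) % BLOCK_LEN)


def toy_compress(block: bytes, state: bytes) -> bytes:
    # Stand-in compression step (NOT the AES one): new state from old state + block.
    return hashlib.sha256(state + block).digest()


def toy_hash(data: bytes, state: bytes = IV) -> bytes:
    data = pad(data)
    for i in range(0, len(data), BLOCK_LEN):
        state = toy_compress(data[i:i + BLOCK_LEN], state)
    return state


# Server side: the secret stays hidden, only message and digest are public.
secret = os.urandom(50)
message = b"user_id=guest&isLoggedIn=False"
digest = toy_hash(secret + message)

# Attacker side: knows len(secret), message and digest, but not secret.
prefix_len = 50 + len(message)
glue = pad(b"A" * prefix_len)[prefix_len:]          # padding the server added
extension = b"&isLoggedIn=True"
known_len = prefix_len + len(glue)
tail = pad(b"A" * known_len + extension)[known_len:]  # extension + its padding
forged_digest = toy_hash(tail, state=digest)          # resume from the old digest

# The server would compute the same value over the longer input.
print(forged_digest == toy_hash(secret + message + glue + extension))  # True
```

The attacker never touches secret: the published digest is the full internal state after the original blocks, so hashing can simply resume from it.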
There's a tool on GitHub called hash_extender, which also comes with a detailed introduction to this attack method. But we cannot apply that tool to today's challenge, because our case uses a custom hash function.
Therefore, when verify_cookie verifies a cookie, it uses lj12_hash over the constant SECRET followed by user-controlled data, and the resulting signature can be extended with the Hash Length Extension Attack. We also know the algorithm (compression_function inside lj12_hash), so we are free to append data and forge a valid signature:
from Crypto.Cipher import AES

BLOCK_LEN = 32

# Paste the signature part from original cookie
sn = "60bff2a97edecdde6aa9▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒" # change this

# The hash method from the source code
def compression_function(data, key):
    if len(data) != BLOCK_LEN or len(key) != BLOCK_LEN:
        raise ValueError(f"Input for compression function is not {BLOCK_LEN} bytes long!")
    cipher = AES.new(key, AES.MODE_ECB)
    enc = cipher.encrypt(data)
    enc = enc[::-1]
    enc = enc[::2] + enc[1::2]
    enc = enc[::3] + enc[2::3] + enc[1::3]
    return enc

# The pad function to fill blocks
def pad(data):
    if len(data) % BLOCK_LEN == 0:
        return data
    pad_byte = bytes([len(data) % 256])
    pad_len = BLOCK_LEN - (len(data) % BLOCK_LEN)
    data += pad_byte * pad_len
    return data

# Fill the old blocks and keep the signature unchanged
SEC_LEN = 50
DATA_LEN = len("user_id=guest&isLoggedIn=False")  # 30
PAD_LEN = BLOCK_LEN - ((SEC_LEN + DATA_LEN) % BLOCK_LEN)  # 16
pad_byte = chr((SEC_LEN + DATA_LEN) % 256).encode('utf-8')  # b"P"
padding = pad_byte * PAD_LEN
filledData = b"user_id=guest&isLoggedIn=False" + padding  # 46 bytes; 96 together with the 50-byte SECRET

# Modify the hash method to work with an existent signature
def lj12_hash(oldData, newData, sig):
    # calculate new paddings after new data added (dummy SECRET of the right length)
    data = pad((b"A" * SEC_LEN + oldData + newData))
    # remove temp SECRET and old data
    newData = data[(SEC_LEN + len(oldData)):]  # b'&isLoggedIn=Truepppppppppppppppp'
    # use old signature to continue encrypt newly added data blocks
    blocks = [newData[x:x + BLOCK_LEN] for x in range(0, len(newData), BLOCK_LEN)]
    enc_block = bytes.fromhex(sig)
    for i in range(len(blocks)):
        enc_block = compression_function(blocks[i], enc_block)
    return enc_block.hex()

# Forge a valid signature
fakeData = b"&isLoggedIn=True"
fakeSig = lj12_hash(filledData, fakeData, sn)

# Edit cookie and grab the flag at /program
padding = padding.decode('utf-8')
cookie = f"user_id=guest&isLoggedIn=False{padding}&isLoggedIn=True.{fakeSig}"
print(cookie)
- SECRET: We don't need its random value, because the hash can be extended without knowing it. We only need its length (50), so the script uses a dummy SECRET of 50 bytes to compute the legitimate padding for the forged signature.
- compression_function: Kept identical to the original, so our computation matches the server's.
- pad: What matters here is the pad_byte value.
  - For the original data (SECRET + login_info), the pad_byte is (50+30)%256=80 (P in ASCII) and the pad_len is 32-(80%32)=16. Placing exactly these bytes after the original data reproduces the input the server hashed, so the old signature stays valid as an intermediate state (this gives us filledData).
  - For the newly added data, aka newData, the pad_byte is (50+30+16+16)=112 (p in ASCII). We do not put this final padding into the cookie: the server appends it itself when it hashes ...&isLoggedIn=True, so the signature is the same as for ...&isLoggedIn=Trueppppp..., while the parsed value stays exactly True. But when we use the old signature as the starting iv to compute the new signature, we must feed only the newData part (with its padding) into the encryption.
- lj12_hash: Rewritten to take new parameters: the padded original cookie data oldData, the malicious data newData we want to add, and the signature from the original cookie. That valid signature is used as the starting enc_block to encrypt our newly added block, as introduced in the previous chapter.
- newData: The appended data is &isLoggedIn=True. Placed at the end of the cookie data, it passes the check in verify_cookie.
- cookie: The forged cookie combines the controlled padding, the newly added data, and the forged signature.
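The padding arithmetic in the notes above can be double-checked without AES or the real signature. In this sketch padding_for is a hypothetical helper that returns only the bytes the challenge's pad would append for a given total length; it also confirms that the forged data contains no dot, which matters because verify_cookie splits the cookie on that character:

```python
BLOCK_LEN = 32
SEC_LEN = 50  # SECRET length, read from the challenge source


def padding_for(total_len: int) -> bytes:
    # Bytes that pad() would append to an input of total_len bytes.
    if total_len % BLOCK_LEN == 0:
        return b""
    return bytes([total_len % 256]) * (BLOCK_LEN - total_len % BLOCK_LEN)


original = b"user_id=guest&isLoggedIn=False"
extension = b"&isLoggedIn=True"

glue = padding_for(SEC_LEN + len(original))           # 16 x b"P" (0x50 = 80)
forged_data = original + glue + extension             # data part of the cookie
final_pad = padding_for(SEC_LEN + len(forged_data))   # 16 x b"p" (0x70 = 112)

print(glue, len(forged_data), final_pad)
print(b"." in forged_data)  # False: split(".") still yields exactly two parts
```

Both pad bytes happen to be printable ASCII here, which is why the forged cookie can be pasted into the browser as plain text.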
With the new cookie generated, we set it in the browser, then access the /program page and take the flag: