1 year ago
#342224
user18365860
Python: get opcodes from .pyc files
I am currently implementing a Python interpreter in Rust. Basically, I want it to be able to run compiled Python code from a .pyc file. I will write the compiler in a separate project.
I am having a lot of trouble mapping the bytecode in the .pyc file to the corresponding opcodes. Here is an example:
The original file, hello_world.py
def hello():
    print("Hello World")
I compile this file into .pyc by using the command:
python3 -m compileall hello_world.py
I know that I can get the opcode mapping using the dis module, so my idea was to read the .pyc file as a u8 vector in Rust and then map the bytes. However, here is my problem. If I disassemble the original Python file with the dis module, I get this:
  1           0 LOAD_CONST               0 (<code object hello at 0x7f4a42392240, file "hello_world.py", line 1>)
              2 LOAD_CONST               1 ('hello')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (hello)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object hello at 0x7f4a42392240, file "hello_world.py", line 1>:
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
But if I run hexdump with the -C flag on the .pyc file, I get this:
00000000 55 0d 0d 0a 00 00 00 00 bc 91 40 62 25 00 00 00 |U.........@b%...|
00000010 e3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 02 00 00 00 40 00 00 00 73 0c 00 00 00 64 00 |.....@...s....d.|
00000030 64 01 84 00 5a 00 64 02 53 00 29 03 63 00 00 00 |d...Z.d.S.).c...|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00 |................|
00000050 00 43 00 00 00 73 0c 00 00 00 74 00 64 01 83 01 |.C...s....t.d...|
00000060 01 00 64 00 53 00 29 02 4e 7a 0b 48 65 6c 6c 6f |..d.S.).Nz.Hello|
00000070 20 57 6f 72 6c 64 29 01 da 05 70 72 69 6e 74 a9 | World)...print.|
00000080 00 72 02 00 00 00 72 02 00 00 00 fa 0e 68 65 6c |.r....r......hel|
00000090 6c 6f 5f 77 6f 72 6c 64 2e 70 79 da 05 68 65 6c |lo_world.py..hel|
000000a0 6c 6f 01 00 00 00 73 02 00 00 00 00 01 72 04 00 |lo....s......r..|
000000b0 00 00 4e 29 01 72 04 00 00 00 72 02 00 00 00 72 |..N).r....r....r|
000000c0 02 00 00 00 72 02 00 00 00 72 03 00 00 00 da 08 |....r....r......|
000000d0 3c 6d 6f 64 75 6c 65 3e 01 00 00 00 f3 00 00 00 |<module>........|
000000e0 00 |.|
000000e1
The dis module tells me that this code has 12 instructions, so I was expecting the .pyc file to contain 12 * 2 = 24 bytes (because each instruction is two bytes: one opcode byte and one argument byte). However, the .pyc file is 225 bytes (0xe1) long, and that is making me really confused.
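To make the byte counts concrete, here is a minimal sketch that slices the first 0x3a bytes of the hexdump above by hand, using no marshal or dis. It rests on several assumptions: the magic number 55 0d corresponds to CPython 3.8, the file starts with a 16-byte header (magic, flags, mtime, source size), and the marshalled code object that follows stores six 32-bit integer fields before the bytecode string. The OPNAMES table lists only the 3.8 opcode numbers for the instructions that appear in the disassembly above, and the names u32 and decode_module_bytecode are hypothetical helpers, not part of any library.

```python
# First 0x3a bytes of the hexdump above: header plus the start of the
# module-level code object. Layout assumed to be CPython 3.8.
PYC_PREFIX = bytes.fromhex(
    "550d0d0a 00000000 bc914062 25000000"  # header: magic, flags, mtime, source size
    "e3 00000000 00000000 00000000"        # type byte, argcount, posonlyargcount, kwonlyargcount
    "00000000 02000000 40000000"           # nlocals, stacksize, flags
    "73 0c000000"                          # 's' (bytes object) + 4-byte little-endian length
    "640064018400 5a0064025300"            # the 12 bytes of module bytecode
)

# Opcode numbers for CPython 3.8, limited to the instructions seen in the question.
OPNAMES = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    90: "STORE_NAME",
    100: "LOAD_CONST",
    116: "LOAD_GLOBAL",
    131: "CALL_FUNCTION",
    132: "MAKE_FUNCTION",
}


def u32(buf, off):
    """Read a little-endian unsigned 32-bit integer at offset off."""
    return int.from_bytes(buf[off:off + 4], "little")


def decode_module_bytecode(pyc):
    """Return (magic, [(opname, arg), ...]) for the top-level code object."""
    magic = int.from_bytes(pyc[0:2], "little")
    pos = 16                                 # skip the 16-byte header
    # The high bit of a marshal type byte is a reference flag, so mask it off.
    assert pyc[pos] & 0x7F == ord("c"), "expected a code object"
    pos += 1 + 6 * 4                         # type byte + six 32-bit fields
    assert pyc[pos] == ord("s"), "expected a bytes object holding the bytecode"
    length = u32(pyc, pos + 1)
    code = pyc[pos + 5:pos + 5 + length]
    # Each instruction is two bytes: opcode, then argument.
    instrs = [
        (OPNAMES.get(code[i], f"<{code[i]}>"), code[i + 1])
        for i in range(0, len(code), 2)
    ]
    return magic, instrs


magic, instrs = decode_module_bytecode(PYC_PREFIX)
print(magic)
for name, arg in instrs:
    print(name, arg)
```

Under these assumptions, the 12 bytes at offset 0x2e decode to the six module-level instructions from the dis output, and the 12 bytes at offset 0x5a (74 00 64 01 83 01 01 00 64 00 53 00) would be the body of hello. Everything else in the file appears to be the header plus marshalled metadata (constants, names, the filename, line number tables), which would explain why the file is much larger than 24 bytes. Only integer reads and slicing are used, so the same logic should carry over to a u8 slice in Rust; a full reader would still need to walk the remaining marshal types to reach nested code objects.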
How can I get the opcodes from the .pyc file without using the Python modules?
Thank you!
python
python-3.x
virtual-machine
bytecode