1 year ago

#342224


user18365860

Python: get opcodes from .pyc files

I am currently implementing a Python interpreter in Rust. Basically, I want it to be able to run compiled Python code from a .pyc file. I will write the compiler in a separate project.

I am having huge trouble mapping the bytecode in the .pyc file to the corresponding opcodes. Here is an example:

The original file, hello_world.py

def hello():
    print("Hello World")

I compile this file into a .pyc file with the command:

python3 -m compileall hello_world.py

I know that I can get the opcode mapping using the dis module, so my idea was to just read the .pyc file into a u8 vector in Rust and then map the bytes. However, here is my problem: if I disassemble the original Python file with the dis module, I get this:
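A likely first source of the extra bytes: a .pyc file is not raw bytecode. Since Python 3.7 (PEP 552) it starts with a 16-byte header, and everything after that is the module's code object serialized with CPython's internal marshal format. The sketch below is written in Python only to keep it short and uses nothing but `struct`; the function name `parse_pyc_header` is my own. Applied to the first 16 bytes of the hexdump in this post, it yields magic number 3413, which CPython's importlib magic-number table lists for Python 3.8.

```python
import struct


def parse_pyc_header(data: bytes):
    """Split the 16-byte header (Python 3.7+, PEP 552) off a .pyc file.

    Returns (header_dict, remaining_bytes); the remaining bytes are the
    marshalled module code object.
    """
    if len(data) < 16:
        raise ValueError("file too short to contain a .pyc header")
    # The header is four little-endian 32-bit words.
    magic_word, flags, word3, word4 = struct.unpack("<4I", data[:16])
    header = {
        # Low 16 bits identify the bytecode version; the last two bytes
        # of the file's first word are always b"\r\n".
        "magic": magic_word & 0xFFFF,
        "flags": flags,
    }
    if flags & 0b1:
        # Hash-based pyc: bytes 8..16 hold a hash of the source file.
        header["source_hash"] = data[8:16]
    else:
        # Timestamp-based pyc (the default): source mtime and source size.
        header["mtime"] = word3
        header["source_size"] = word4
    return header, data[16:]
```

The same layout carries over directly to Rust: read four little-endian words with `u32::from_le_bytes` and start parsing the marshal data at offset 16.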

  1           0 LOAD_CONST               0 (<code object hello at 0x7f4a42392240, file "hello_world.py", line 1>)
              2 LOAD_CONST               1 ('hello')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (hello)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object hello at 0x7f4a42392240, file "hello_world.py", line 1>:
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
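Once the actual bytecode is reached, the two-bytes-per-instruction assumption does hold (CPython 3.6+ "wordcode"): each instruction is an opcode byte followed by an argument byte, and EXTENDED_ARG (144) prefixes arguments larger than 255. As a sketch, assuming the CPython 3.8 opcode numbers from `Lib/opcode.py` (the hypothetical table `OPNAMES_38` only covers the opcodes in this example), the 12 bytes starting at offset 0x2e of the hexdump, right after `73 0c 00 00 00`, decode to exactly the module-level listing above.

```python
# Opcode numbers for the instructions in this example, as assigned in
# CPython 3.8's Lib/opcode.py. They change between versions, so a real
# interpreter needs the table matching the .pyc magic number.
OPNAMES_38 = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    90: "STORE_NAME",
    100: "LOAD_CONST",
    116: "LOAD_GLOBAL",
    131: "CALL_FUNCTION",
    132: "MAKE_FUNCTION",
}
EXTENDED_ARG = 144


def decode_wordcode(code: bytes):
    """Yield (offset, opname, arg) for each 2-byte instruction in co_code."""
    ext = 0  # high bits accumulated from preceding EXTENDED_ARG prefixes
    for offset in range(0, len(code), 2):
        op, arg = code[offset], code[offset + 1]
        arg |= ext
        if op == EXTENDED_ARG:
            # Fold the prefix into the next instruction's argument.
            ext = arg << 8
            continue
        ext = 0
        yield offset, OPNAMES_38.get(op, f"<op {op}>"), arg
```

For example, `list(decode_wordcode(bytes.fromhex("6400640184005a0064025300")))` starts with `(0, 'LOAD_CONST', 0)` and ends with `(10, 'RETURN_VALUE', 0)`.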

But, if I do the hexdump, with the -C flag, on the .pyc file, I get this:

00000000  55 0d 0d 0a 00 00 00 00  bc 91 40 62 25 00 00 00  |U.........@b%...|
00000010  e3 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 02 00 00 00 40 00 00  00 73 0c 00 00 00 64 00  |.....@...s....d.|
00000030  64 01 84 00 5a 00 64 02  53 00 29 03 63 00 00 00  |d...Z.d.S.).c...|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 02 00 00  |................|
00000050  00 43 00 00 00 73 0c 00  00 00 74 00 64 01 83 01  |.C...s....t.d...|
00000060  01 00 64 00 53 00 29 02  4e 7a 0b 48 65 6c 6c 6f  |..d.S.).Nz.Hello|
00000070  20 57 6f 72 6c 64 29 01  da 05 70 72 69 6e 74 a9  | World)...print.|
00000080  00 72 02 00 00 00 72 02  00 00 00 fa 0e 68 65 6c  |.r....r......hel|
00000090  6c 6f 5f 77 6f 72 6c 64  2e 70 79 da 05 68 65 6c  |lo_world.py..hel|
000000a0  6c 6f 01 00 00 00 73 02  00 00 00 00 01 72 04 00  |lo....s......r..|
000000b0  00 00 4e 29 01 72 04 00  00 00 72 02 00 00 00 72  |..N).r....r....r|
000000c0  02 00 00 00 72 02 00 00  00 72 03 00 00 00 da 08  |....r....r......|
000000d0  3c 6d 6f 64 75 6c 65 3e  01 00 00 00 f3 00 00 00  |<module>........|
000000e0  00                                                |.|
000000e1

The dis module tells me that this code has 12 instructions, so I was expecting the .pyc file to contain 12 * 2 = 24 bytes (each instruction is two bytes: an opcode and its argument). However, the .pyc file is 225 (0xe1) bytes long, and that is making me really confused.
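The remaining bytes are the marshalled code objects themselves: besides the raw bytecode, each one stores its argument counts, stack size, flags, constants (including the nested `hello` code object), names, file name and line-number table. The Python documentation describes the marshal format as deliberately undocumented and version-specific, so the following is only a minimal sketch assuming the CPython 3.8 layout indicated by the magic number. `MarshalReader`, `load_pyc` and `iter_code_objects` are hypothetical helper names, and only the type codes needed for a small module are handled. Type bytes with the 0x80 bit set (`e3`, `da`, `a9`, `fa`, `f3` in the dump) are ordinary types flagged so that later `r` (0x72) entries can refer back to them by index.

```python
import struct

FLAG_REF = 0x80  # high bit of a type byte: remember this object for back-references


class MarshalReader:
    """Minimal reader for the marshal subset found in a small CPython 3.8 .pyc."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.refs = []  # flagged objects, indexed in the order they were started

    def read(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("unexpected end of marshal data")
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.read(1)[0]

    def i32(self) -> int:
        return struct.unpack("<i", self.read(4))[0]

    def obj(self):
        code = self.u8()
        flag, kind = code & FLAG_REF, chr(code & 0x7F)
        if kind == "r":  # back-reference to an earlier flagged object
            return self.refs[self.i32()]
        idx = None
        if flag:  # reserve the slot before reading children
            idx = len(self.refs)
            self.refs.append(None)
        value = self._body(kind)
        if idx is not None:
            self.refs[idx] = value
        return value

    def _body(self, kind: str):
        if kind == "N":
            return None
        if kind == "T":
            return True
        if kind == "F":
            return False
        if kind == "i":
            return self.i32()
        if kind == "s":  # bytes: 4-byte length, then raw bytes
            return self.read(self.i32())
        if kind in "zZ":  # short ASCII str: 1-byte length
            return self.read(self.u8()).decode("ascii")
        if kind in "aA":  # ASCII str: 4-byte length
            return self.read(self.i32()).decode("ascii")
        if kind in "ut":  # UTF-8 str: 4-byte length
            return self.read(self.i32()).decode("utf-8", "surrogatepass")
        if kind == ")":  # small tuple: 1-byte count
            return tuple(self.obj() for _ in range(self.u8()))
        if kind == "(":  # tuple: 4-byte count
            return tuple(self.obj() for _ in range(self.i32()))
        if kind == "c":
            return self._code()
        raise ValueError(f"unsupported marshal type {kind!r}")

    def _code(self) -> dict:
        # Field order of a code object in CPython 3.8; other versions differ.
        c = {}
        for name in ("argcount", "posonlyargcount", "kwonlyargcount",
                     "nlocals", "stacksize", "flags"):
            c[name] = self.i32()
        for name in ("code", "consts", "names", "varnames", "freevars",
                     "cellvars", "filename", "name"):
            c[name] = self.obj()
        c["firstlineno"] = self.i32()
        c["lnotab"] = self.obj()
        return c


def load_pyc(data: bytes) -> dict:
    """Skip the 16-byte header and unmarshal the module code object."""
    return MarshalReader(data[16:]).obj()


def iter_code_objects(code: dict):
    """Yield the module code object, then nested ones found in consts."""
    yield code
    for const in code["consts"]:
        if isinstance(const, dict):  # code objects are represented as dicts here
            yield from iter_code_objects(const)
```

Walking the result of `load_pyc` on this file should give two code objects, `<module>` and `hello`, each with a 12-byte `code` field: the 24 bytes originally expected. A Rust port would probably model the decoded values as an enum and make `obj` a recursive function over a byte cursor.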

How can I get the opcodes out of the .pyc file without using the Python modules?

Thank you!

python

python-3.x

virtual-machine

bytecode
