More and more closed-source products and malware written in Python are distributed as compiled bytecode (.pyc) files, typically packaged with a Python runtime environment into standalone executables. Decompiling Python bytecode requires specialized tools, which have historically failed to keep up with Python’s annual release cycle because of their high maintenance costs. To address this, we developed PyLingual (https://pylingual.io) [1-3], an NLP-assisted decompilation framework that is hosted as a freely available public service.
This tutorial will guide participants toward perfect decompilation of Python bytecode by covering the following topics:
- To build an intuitive understanding of Python bytecode, we will outline Python’s stack-based execution model and its hierarchical code organization.
- By briefly surveying the Python reversing ecosystem, we will highlight the unique challenges of Python decompilation.
- We will introduce PyLingual, an NLP-assisted Python decompiler that provides state-of-the-art decompilation accuracy for recent Python versions.
- With several hands-on examples, we will show how to leverage PyLingual’s strict accuracy verification mechanism to identify, localize, and repair semantic decompilation errors.
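As a rough illustration of the first and last topics, the sketch below uses only the standard-library `dis` module to walk the nested code objects of a small module and to compare two sources at the bytecode level. It is not PyLingual's verification mechanism: the helper names `walk_code`, `signature`, and `same_bytecode` are hypothetical, and the comparison is a simplified assumption about what "bytecode-equal" might mean.

```python
import dis
import types

# Hypothetical sample input; any small module would do.
SOURCE = """
def add(a, b):
    return a + b
"""


def walk_code(code, depth=0):
    """Yield (depth, code_object) for a code object and every code object nested in its constants."""
    yield depth, code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)


def signature(code):
    """Return (opname, argval) pairs for one code object.

    Nested code objects are replaced by None because they are
    compared separately when the hierarchy is walked.
    """
    return [
        (ins.opname, None if isinstance(ins.argval, types.CodeType) else ins.argval)
        for ins in dis.get_instructions(code)
    ]


def same_bytecode(src_a, src_b):
    """Simplified check (an assumption, not PyLingual's method):
    two sources match if their code-object hierarchies line up and
    every pair of code objects has identical instructions."""
    a = list(walk_code(compile(src_a, "<a>", "exec")))
    b = list(walk_code(compile(src_b, "<b>", "exec")))
    if len(a) != len(b):
        return False
    return all(
        depth_a == depth_b and signature(code_a) == signature(code_b)
        for (depth_a, code_a), (depth_b, code_b) in zip(a, b)
    )


# Show the hierarchy: the module code object contains the function's code object.
for depth, code in walk_code(compile(SOURCE, "<demo>", "exec")):
    print("  " * depth + code.co_name)
    for ins in dis.get_instructions(code):
        # Each instruction pushes to or pops from the evaluation stack.
        print("  " * depth + f"    {ins.opname} {ins.argrepr}")
```

Exact opcode names differ between Python versions, which is one reason version-specific tooling is hard to maintain. In this sketch, a candidate decompilation would be recompiled and passed to `same_bytecode` together with a reference source; a mismatch in one code object points to where the two differ.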
[1] Wiedemeier, J. et al. PyLingual: Toward Perfect Decompilation of Evolving High-Level Languages. in IEEE Symposium on Security and Privacy (SP) (2025).
[2] Wiedemeier, J. et al. PyLingual: A Python Decompilation Framework for Evolving Python Versions. in BlackHat USA (2024).
[3] Wiedemeier, J. There and Back Again: Reverse Engineering Python Binaries. in (2024).