PLDI 2025
Mon 16 - Fri 20 June 2025 Seoul, South Korea

Translating legacy C codebases to Rust is crucial for improving memory safety, yet automating this process remains challenging for real-world projects. In this paper, we document the key technical challenges we encountered in manually translating the GNU Gzip package using state-of-the-art LLMs (e.g., the OpenAI o1 model). To address the incompatibility between the C and Rust type systems, we propose a method that supplies the LLM with appropriate context and prompts so that variables receive correct types at the declaration stage. We extract essential code semantics by generating data flow graphs (DFGs). For global variables that may induce multiple borrowing, we prompt the LLM to generate enums; for pointer variables (scalar pointers, array pointers, and pointer arithmetic), we prompt the LLM to map them to the correct Rust types. By combining DFG context information with clearly worded prompts, our approach enables o1 to translate types more accurately than direct translation does. Our results are publicly available and demonstrate the effectiveness of our approach in improving type migration.
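To make the three pointer categories named above concrete, the following is a minimal illustrative sketch of what a declaration-stage mapping from C pointers to Rust types might look like. It is not taken from the paper or from Gzip: the function names (`bump`, `sum`, `find_byte`) and the C snippets in the comments are hypothetical, and the chosen Rust types are one plausible mapping, not necessarily the one the authors' prompts produce.

```rust
// Hypothetical examples; names and C snippets are assumptions for illustration.

// Scalar pointer.
// C: void bump(int *count) { (*count)++; }
// A pointer to a single mutable value can become a mutable reference.
fn bump(count: &mut i32) {
    *count += 1;
}

// Array pointer.
// C: unsigned sum(const unsigned char *buf, size_t len) { ... }
// A pointer plus a length can become a slice, which carries its own length.
fn sum(buf: &[u8]) -> u32 {
    buf.iter().map(|&b| u32::from(b)).sum()
}

// Pointer arithmetic.
// C: for (p = buf; p < buf + len; p++) if (*p == target) return p - buf;
// Walking a pointer and subtracting the base can become an index into a slice.
fn find_byte(buf: &[u8], target: u8) -> Option<usize> {
    buf.iter().position(|&b| b == target)
}

fn main() {
    let mut n = 41;
    bump(&mut n);

    let data = [1u8, 2, 3, 4];
    println!("{} {} {:?}", n, sum(&data), find_byte(&data, 3));
}
```

Which mapping is right depends on how the pointer is used elsewhere in the program (whether it is mutated, indexed, or advanced), which is the kind of usage information a data flow graph can supply as context.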