Scalable, Validated Code Translation of Entire Projects using Large Language Models
This program is tentative and subject to change.
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, by checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 9,700 lines of code and 780 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work. An artifact for our work can be found at: https://zenodo.org/records/15242640
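To make the abstract's idea of "feature mapping" concrete, here is a minimal illustrative sketch (the rule, function names, and error type are assumptions for illustration, not taken from the paper): Go's `(value, error)` multiple-return idiom has no direct Rust counterpart, so a predefined rule could direct the LLM to map it to `Result<T, E>`, and the translated fragment can then be validated for I/O equivalence on sample inputs.

```rust
// Hypothetical feature-mapping example (illustrative only):
// Go's (value, error) return idiom is mapped by a predefined rule
// to Rust's Result<T, E> instead of leaving the choice to the LLM.
//
// Go source (for reference):
//   func ParsePort(s string) (int, error) {
//       p, err := strconv.Atoi(s)
//       if err != nil { return 0, err }
//       if p < 0 || p > 65535 { return 0, fmt.Errorf("out of range") }
//       return p, nil
//   }

fn parse_port(s: &str) -> Result<u16, String> {
    // strconv.Atoi + err check becomes parse() + the ? operator.
    let p: i64 = s.parse().map_err(|e| format!("{e}"))?;
    if !(0..=65535).contains(&p) {
        return Err("out of range".to_string());
    }
    Ok(p as u16)
}

fn main() {
    // I/O-equivalence-style check on sample inputs: the Rust function
    // should agree with the Go original on success and error cases.
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("70000").is_err());
    assert!(parse_port("abc").is_err());
    println!("all checks passed");
}
```

Fixing such a mapping up front also supports the paper's signature-level type-compatibility checks: once the rule commits to `Result<u16, String>`, callers can be checked against that signature before any function bodies are repaired.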
Wed 18 Jun (displayed time zone: Seoul)
16:00 - 17:20

16:00 (20m, Talk): Type-Constrained Code Generation with Language Models. PLDI Research Papers. Niels Mündler (ETH Zurich), Jingxuan He (University of California at Berkeley), Hao Wang (University of California at Berkeley), Koushik Sen (University of California at Berkeley), Dawn Song (University of California at Berkeley), Martin Vechev (ETH Zurich). DOI, Pre-print.

16:20 (20m, Talk): Reductive Analysis with Compiler-Guided Large Language Models for Input-Centric Code Optimizations. PLDI Research Papers. Xiangwei Wang (North Carolina State University), Xinning Hui (North Carolina State University), Chunhua Liao (Lawrence Livermore National Laboratory), Xipeng Shen (North Carolina State University). DOI.

16:40 (20m, Talk): Scalable, Validated Code Translation of Entire Projects using Large Language Models. PLDI Research Papers. Hanliang Zhang (University of Bristol), Cristina David (University of Bristol), Meng Wang (University of Bristol), Brandon Paulsen (Amazon), Daniel Kroening (Amazon). DOI.

17:00 (20m, Talk): Guided Tensor Lifting. PLDI Research Papers. Yixuan Li (University of Edinburgh), José Wesley De Souza Magalhães (University of Edinburgh), Alexander Brauckmann (University of Edinburgh), Michael F. P. O'Boyle (University of Edinburgh), Elizabeth Polgreen (University of Edinburgh). DOI.