Translating software between programming languages is a challenging task that typically requires detailed expertise in both the source and target languages, together with a thorough understanding of the code's intended behavior. To date, fully automated translation remains elusive because of a combination of problems central to the task: an ideal translation tool should scale to real-world software and produce translated code that is correct, readable, and idiomatic in the target programming language.
In this work, we propose a systematic approach to code translation that scales, nearly automatically, to realistic software at least an order of magnitude larger than that considered in prior work. Key to this scalability is the idea of \textit{program skeletons}. A program skeleton retains the high-level structure of a program by abstracting away, and effectively summarizing, lower-level concrete code fragments; moreover, it can be mechanically translated across programming languages. By design, a skeleton also admits many different concrete implementations, so it can work in conjunction with a data-driven code synthesizer that fills in idiomatic implementations of the smaller, previously abstracted code fragments. Most importantly, skeletons are designed to be \textit{sound}, i.e., any completion that satisfies the skeleton is a correct translation. To construct sound and permissive program skeletons, we propose a semantic model shared across languages that abstracts away low-level, language-specific program behaviors. Code fragments can then be specified in, and analyzed against, this common model. The resulting skeleton allows each code fragment to be validated individually, \emph{independent of how the other parts of the skeleton are filled in}, which is what enables the approach to scale. We instantiate our approach in a tool, Skel, that translates software from Python to JavaScript. Our evaluation shows that Skel can almost automatically translate software projects of up to around $2k$ lines of code into idiomatic JavaScript.
Note to reviewers: Additional implementation and algorithmic details have been moved to the supplementary material, which is also attached.