PLDI 2025
Mon 16 - Fri 20 June 2025 Seoul, South Korea
Wed 18 Jun 2025 16:00 - 16:20 at Orchid - High Performance Computing Chair(s): Charith Mendis

Domain-specific, fixed-function units are becoming increasingly common in modern processors. As the computational demands of applications evolve, the capabilities and programming interfaces of these fixed-function units continue to change. NVIDIA's Hopper GPU architecture contains multiple fixed-function units per compute unit, including an asynchronous data movement unit (TMA) and an asynchronous matrix multiplication unit (Tensor Core). Efficiently utilizing these units requires a fundamentally different programming style than previous architectures did; programmers must now develop warp-specialized kernels that orchestrate producer-consumer pipelines between the asynchronous units. To manage the complexity of programming these new architectures, we introduce Cypress, a task-based programming model with sequential semantics. Cypress programs are a set of designated functions called tasks that operate on tensors and are free of communication and synchronization. Cypress programs are bound to the target machine through a mapping specification that describes where tasks should run and in which memories tensors should be materialized. We present a compiler architecture that lowers Cypress programs into CUDA programs that perform competitively with expert-written codes. Cypress achieves 0.88x-1.06x the performance of cuBLAS on GEMM and 0.80x-0.98x the performance of the currently best-known Flash Attention implementation, while eliminating all aspects of explicit data movement and asynchronous computation from application code.

Wed 18 Jun

Displayed time zone: Seoul

16:00 - 17:20
High Performance Computing, PLDI Research Papers at Orchid
Chair(s): Charith Mendis University of Illinois at Urbana-Champaign
16:00
20m
Talk
Task-Based Tensor Computations on Modern GPUs
PLDI Research Papers
Rohan Yadav Stanford University, Michael Garland NVIDIA, Alex Aiken Stanford University, Michael Bauer NVIDIA
16:20
20m
Talk
Lightweight and Locality-Aware Composition of Black-Box Subroutines
PLDI Research Papers
Manya Bansal Massachusetts Institute of Technology, Dillon Sharlet Google, Jonathan Ragan-Kelley Massachusetts Institute of Technology, Saman Amarasinghe Massachusetts Institute of Technology
16:40
20m
Talk
Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs
PLDI Research Papers
Alonso Rodríguez-Iglesias Universidade da Coruña, Santoshkumar T. Tongli Colorado State University, Emily Tucker Colorado State University, Louis-Noël Pouchet Colorado State University, Gabriel Rodríguez Universidade da Coruña, Juan Tourino Universidade da Coruña
17:00
20m
Talk
Dynamic Robustness Verification against Weak Memory (Remote)
PLDI Research Papers
Roy Margalit Tel Aviv University, Michalis Kokologiannakis ETH Zurich, Shachar Itzhaky Technion, Ori Lahav Tel Aviv University