Array Programming on GPUs: Challenges and Opportunities
This program is tentative and subject to change.
Today, the lion’s share of machine learning and high-performance computing workloads is executed on GPUs, including high-stakes applications such as self-driving cars and fusion reactor simulations. Unfortunately, GPU computations are carried out on largely undocumented hardware units that cannot trap or report floating-point exceptions. The situation is worsened by an ongoing and accelerating shift toward lower-precision arithmetic, driven by performance demands, which increases both the frequency and the severity of floating-point exceptions. Increasingly, matrix multiplications are offloaded to specialized hardware such as Tensor Cores. However, because these units do not adhere to a unified arithmetic standard, their computed results can deviate to unacceptable levels.
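As a CPU-side sketch of the silent-exception problem (using NumPy rather than any GPU tooling, purely for illustration): an fp16 multiplication can overflow to infinity, and while NumPy can at least warn, GPU hardware units offer no comparable trap or report.

```python
import numpy as np

# fp16 overflows beyond 65504; the product silently becomes inf.
# NumPy *can* warn about this; GPU hardware units cannot trap it at all.
with np.errstate(over="ignore"):
    x = np.float16(300.0)
    y = np.float16(300.0)
    result = x * y

print(result)  # inf
```

Once such an inf (or a NaN arising from inf - inf) flows into a larger array computation, it can corrupt results far downstream with no diagnostic trail, which is precisely what tools like GPU-FPX aim to surface.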
This experience report aims to consolidate our previously published work and relate it to array programming in two key ways: (1) by providing tools to diagnose bugs that may arise during array computations, and (2) by addressing broader correctness challenges inherent to array-based programming. This report highlights GPU-FPX, a debugging tool extended to analyze computations involving Tensor Cores. It addresses key correctness challenges, such as the potential for different Tensor Core implementations to produce inconsistent results for the same input. These discrepancies can be systematically uncovered using a targeted testing approach known as FTTN. We conclude with a discussion on how formal methods, particularly those based on SMT solvers, can play a critical role in identifying and bridging gaps in manufacturer-provided hardware specifications—and, in the long term, in proving desired correctness properties.
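The kind of implementation discrepancy FTTN targets can be illustrated, loosely and on the CPU, with NumPy (an assumption for illustration, not the talk's actual methodology): two dot-product variants that differ only in accumulator width, analogous to Tensor Core generations that accumulate fp16 products in fp16 versus fp32, produce measurably different results for the same fp16 inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float16)  # fp16 inputs, as fed to Tensor Cores
y = rng.standard_normal(4096).astype(np.float16)

# High-precision reference result
ref = np.dot(x.astype(np.float64), y.astype(np.float64))

# Two dot-product "implementations" differing only in accumulator width,
# loosely analogous to Tensor Core variants accumulating in fp16 vs fp32
prod = x.astype(np.float32) * y.astype(np.float32)
acc16 = np.sum(prod, dtype=np.float16)   # narrow accumulator
acc32 = np.sum(prod, dtype=np.float32)   # wide accumulator

print(abs(float(acc16) - ref), abs(float(acc32) - ref))
```

The narrow-accumulator error is typically orders of magnitude larger; since vendors do not specify which behavior their units implement, targeted differential tests of this shape are needed to uncover the gap.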
Tue 17 Jun (displayed time zone: Seoul)

Session: 10:30 - 12:00
- 10:30 (30m, Talk): Gate Fusion is Map Fusion. ARRAY
- 11:00 (30m, Talk): Array Programming on GPUs: Challenges and Opportunities. ARRAY. Xinyi Li (University of Utah), Mark Baranowski (University of Utah), Harvey Dam (University of Utah), Ganesh Gopalakrishnan (University of Utah)
- 11:30 (30m, Talk): Accelerating the Static Analysis of Neural Networks by Batch Representation of Abstract Values. ARRAY. Guillaume Berthelot (French Navy, Armed Forces Ministry), Arnault Ioualalen (Numalis), Matthieu Martel (Université de Perpignan Via Domitia)