PLDI 2025
Mon 16 - Fri 20 June 2025 Seoul, South Korea

Today, the lion’s share of machine learning and high-performance computing workloads is executed on GPUs, including high-stakes applications such as self-driving cars and fusion reactor simulations. Unfortunately, GPU computations are carried out on largely undocumented hardware units that cannot trap or report floating-point exceptions. Worsening the situation is an ongoing and accelerating shift toward lower-precision arithmetic, driven by performance demands; this shift only increases the frequency and severity of floating-point exceptions. Increasingly, matrix multiplications are also offloaded to specialized hardware such as Tensor Cores. However, because these units do not adhere to a unified arithmetic standard, their results for the same inputs can deviate from one another to an unacceptable degree.
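
To illustrate why this matters, the following minimal CUDA sketch (our own illustration, not taken from the report) shows a half-precision product silently overflowing to infinity and then producing a NaN, with no trap or exception ever reported to the host; the constant 60000.0 is simply a convenient value near the top of the FP16 range.

#include <cstdio>
#include <cuda_fp16.h>

// Requires a GPU with native FP16 arithmetic (compute capability 5.3+).
__global__ void silent_exceptions(float *out) {
    // 60000 is exactly representable in FP16, but 60000 * 60000
    // exceeds the FP16 maximum (~65504) and silently becomes +inf.
    __half a = __float2half(60000.0f);
    __half prod = __hmul(a, a);          // overflow -> +inf, no trap raised
    __half nan_val = __hsub(prod, prod); // inf - inf -> NaN, still silent
    out[0] = __half2float(prod);
    out[1] = __half2float(nan_val);
}

int main() {
    float host[2];
    float *dev;
    cudaMalloc(&dev, 2 * sizeof(float));
    silent_exceptions<<<1, 1>>>(dev);
    cudaMemcpy(host, dev, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("product = %f, product - product = %f\n", host[0], host[1]);
    cudaFree(dev);
    return 0;
}

Tools such as GPU-FPX exist precisely to surface exceptional values like these, which the hardware itself never reports.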

This experience report aims to consolidate our previously published work and relate it to array programming in two key ways: (1) by providing tools to diagnose bugs that may arise during array computations, and (2) by addressing broader correctness challenges inherent to array-based programming. The report highlights GPU-FPX, a debugging tool extended to analyze computations involving Tensor Cores, and addresses key correctness challenges, such as the potential for different Tensor Core implementations to produce inconsistent results for the same input. Such discrepancies can be systematically uncovered using a targeted testing approach known as FTTN. We conclude with a discussion of how formal methods, particularly those based on SMT solvers, can play a critical role in identifying and bridging gaps in manufacturer-provided hardware specifications and, in the long term, in proving desired correctness properties.
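
As a concrete, deliberately simplified sketch of this differential-testing style (not the actual FTTN harness), the following CUDA program uses the wmma API to run a single 16x16x16 FP16 matrix multiplication on Tensor Cores and compares the result against an FP32 reference computed on the host. The input values and the simple max-error report are illustrative assumptions; FTTN constructs its test inputs far more carefully.

#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes D = A * B on Tensor Cores (compile for sm_70 or newer).
__global__ void tc_matmul(const half *A, const half *B, float *D) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;
    wmma::fill_fragment(acc_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    wmma::store_matrix_sync(D, acc_frag, 16, wmma::mem_row_major);
}

int main() {
    const int N = 16 * 16;
    half hA[N], hB[N];
    float ref[N] = {0};
    // Illustrative inputs only; FTTN chooses values that stress rounding,
    // accumulation order, and subnormal handling much more deliberately.
    for (int i = 0; i < N; ++i) {
        hA[i] = __float2half(1.0f + (i % 7) * 0.1f);
        hB[i] = __float2half(1.0f - (i % 5) * 0.1f);
    }
    // FP32 reference on the host, using the same FP16 inputs.
    for (int i = 0; i < 16; ++i)
        for (int j = 0; j < 16; ++j)
            for (int k = 0; k < 16; ++k)
                ref[i * 16 + j] +=
                    __half2float(hA[i * 16 + k]) * __half2float(hB[k * 16 + j]);

    half *dA, *dB; float *dD; float hD[N];
    cudaMalloc(&dA, N * sizeof(half));
    cudaMalloc(&dB, N * sizeof(half));
    cudaMalloc(&dD, N * sizeof(float));
    cudaMemcpy(dA, hA, N * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, N * sizeof(half), cudaMemcpyHostToDevice);
    tc_matmul<<<1, 32>>>(dA, dB, dD);   // one warp drives the wmma call
    cudaMemcpy(hD, dD, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Report the largest deviation from the reference.
    float max_err = 0.0f;
    for (int i = 0; i < N; ++i) {
        float diff = hD[i] - ref[i];
        if (diff < 0) diff = -diff;
        if (diff > max_err) max_err = diff;
    }
    printf("max |TensorCore - reference| = %g\n", max_err);
    cudaFree(dA); cudaFree(dB); cudaFree(dD);
    return 0;
}

Running the same comparison on different Tensor Core implementations, and observing disagreement for identical inputs, is exactly the kind of discrepancy FTTN is designed to expose systematically.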