Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
This program is tentative and subject to change.
Programming high-performance sparse GPU kernels is notoriously difficult, often requiring significant effort and deep expertise. Existing sparse tensor compilers primarily target CPUs and focus on primitives such as intersection and union, often overlooking performance-critical dense regions and failing to optimize sparse GPU workloads that involve substantial dense computation. In this talk, we will propose a new approach for expressing sparse computations by directly encoding format information into indirect Einsums. To support this abstraction, we introduce the Insum compiler, which generates efficient GPU code for indirect Einsums by lowering to the PyTorch compiler—extended to better support Tensor Core–enabled sparse workloads. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally within our indirect Einsum framework. Our approach achieves 0.94×–3.81× speedups across a range of sparse GPU applications while reducing lines of code by 92×–4491× compared to hand-written implementations.
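To make the idea of an "indirect Einsum" concrete, here is a minimal sketch of sparse matrix-vector multiply over a COO-format matrix, where the format's index arrays appear as indirection inside the einsum-style statement y[rows[n]] += vals[n] * x[cols[n]]. This is an illustration in NumPy of the general concept, not Insum's actual API; the function and variable names are our own.

```python
import numpy as np

def spmv_coo(rows, cols, vals, x, m):
    """Indirect-einsum-style SpMV: y[rows[n]] += vals[n] * x[cols[n]].

    `rows` and `cols` are the COO index arrays; they drive the two
    indirect accesses (a gather through `cols`, a scatter-add through
    `rows`). Illustrative sketch only, not Insum's API.
    """
    y = np.zeros(m, dtype=vals.dtype)
    # Gather x through the column indirection, multiply by the values,
    # then scatter-add into y through the row indirection.
    np.add.at(y, rows, vals * x[cols])
    return y

# 3x3 sparse matrix [[2,0,0],[0,0,3],[0,1,0]] stored in COO form.
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
vals = np.array([2.0, 3.0, 1.0])
x = np.array([1.0, 2.0, 3.0])
print(spmv_coo(rows, cols, vals, x, 3))  # -> [2. 9. 2.]
```

A compiler that understands such indirect index expressions can, as the abstract describes, lower them to efficient GPU code rather than requiring a hand-written kernel per format.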
Prof. Saman Amarasinghe leads the Commit compiler research group in MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL), which focuses on programming languages and compilers that maximize application performance on modern computing platforms. He is a world leader in the field of high-performance domain-specific languages. Prof. Amarasinghe's group developed the Halide, TACO, Simit, StreamIt, StreamJIT, PetaBricks, MILK, Cimple, and GraphIt domain-specific languages and compilers, all of which combine language design and sophisticated compilation techniques to deliver unprecedented performance for targeted application domains such as image processing, stream computations, and graph analytics. Prof. Amarasinghe also pioneered the application of machine learning to compiler optimization, from Meta Optimization in 2003 to the OpenTuner extensible autotuner today. With Professor Anant Agarwal, he co-led the Raw architecture project, which did pioneering work on scalable multicores.

Prof. Amarasinghe's entrepreneurship activities include founding Determina, Inc. (acquired by VMware), based on computer security research pioneered in his research group at MIT, and co-founding Lanka Internet Services, Ltd., the first Internet service provider in Sri Lanka. Prof. Amarasinghe is also the faculty director of MIT Global Startup Labs, whose summer programs in 17 countries have helped to create more than 20 thriving startups. Prof. Amarasinghe developed the popular Performance Engineering of Software Systems (6.172) class with Professor Charles Leiserson. He also created individualized software project classes such as the Open Source Software Project Lab, the Open Source Entrepreneurship Lab, and the Bring Your Own Software Project Lab.
Mon 16 Jun (displayed time zone: Seoul)

09:00 - 10:10

09:00 (20m talk) Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums [Sparse]
  Saman Amarasinghe, Massachusetts Institute of Technology
09:20 (20m talk) Intelligent Auto-Tuning for High-Performance Sparse Tensor Algebra [Sparse]
  Jiajia Li, North Carolina State University
09:40 (20m talk) Loop Fusion in Matrix Multiplications with Sparse Dependence [Sparse]
  Kazem Cheshmi, McMaster University
10:00 (10m) Panel 1 [Sparse]
  Saman Amarasinghe, Massachusetts Institute of Technology; Kazem Cheshmi, McMaster University; Jiajia Li, North Carolina State University