Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
This program is tentative and subject to change.
Programming high-performance sparse GPU kernels is notoriously difficult, often requiring significant effort and deep expertise. Existing sparse tensor compilers primarily target CPUs and focus on primitives such as intersection and union, often overlooking performance-critical dense regions and failing to optimize sparse GPU workloads that involve substantial dense computation. In this talk, we will propose a new approach for expressing sparse computations by directly encoding format information into indirect Einsums. To support this abstraction, we introduce the Insum compiler, which generates efficient GPU code for indirect Einsums by lowering to the PyTorch compiler—extended to better support Tensor Core–enabled sparse workloads. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally within our indirect Einsum framework. Our approach achieves 0.94×–3.81× speedups across a range of sparse GPU applications while reducing lines of code by 92×–4491× compared to hand-written implementations.
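To make the idea of an "indirect Einsum" concrete, here is a minimal sketch of sparse matrix-vector multiply over a COO-format matrix, where the format's index arrays appear as indirection inside the einsum-style statement y[rows[n]] += vals[n] * x[cols[n]]. This is an illustration in NumPy of the general concept, not Insum's actual API; the function and variable names are our own.

```python
import numpy as np

def spmv_coo(rows, cols, vals, x, m):
    """Indirect-einsum-style SpMV: y[rows[n]] += vals[n] * x[cols[n]].

    `rows` and `cols` are the COO index arrays; they drive the two
    indirect accesses (a gather through `cols`, a scatter-add through
    `rows`). Illustrative sketch only, not Insum's API.
    """
    y = np.zeros(m, dtype=vals.dtype)
    # Gather x through the column indirection, multiply by the values,
    # then scatter-add into y through the row indirection.
    np.add.at(y, rows, vals * x[cols])
    return y

# 3x3 sparse matrix [[2,0,0],[0,0,3],[0,1,0]] stored in COO form.
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
vals = np.array([2.0, 3.0, 1.0])
x = np.array([1.0, 2.0, 3.0])
print(spmv_coo(rows, cols, vals, x, 3))  # -> [2. 9. 2.]
```

A compiler that understands such indirect index expressions can, as the abstract describes, lower them to efficient GPU code rather than requiring a hand-written kernel per format.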
Prof. Saman Amarasinghe leads the Commit compiler research group in MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL), which focuses on programming languages and compilers that maximize application performance on modern computing platforms. He is a world leader in the field of high-performance domain-specific languages. Prof. Amarasinghe's group developed the Halide, TACO, Simit, StreamIt, StreamJIT, PetaBricks, MILK, Cimple, and GraphIt domain-specific languages and compilers, all of which combine language design and sophisticated compilation techniques to deliver unprecedented performance for targeted application domains such as image processing, stream computations, and graph analytics. Prof. Amarasinghe also pioneered the application of machine learning to compiler optimization, from Meta Optimization in 2003 to the OpenTuner extensible autotuner today. With Professor Anant Agarwal, he co-led the Raw architecture project, which did pioneering work on scalable multicores.

Prof. Amarasinghe's entrepreneurship activities include founding Determina, Inc. (acquired by VMware), based on computer security research pioneered in his research group at MIT, and co-founding Lanka Internet Services, Ltd., the first Internet service provider in Sri Lanka. Prof. Amarasinghe is also the faculty director of MIT Global Startup Labs, whose summer programs in 17 countries have helped to create more than 20 thriving startups. Prof. Amarasinghe developed the popular Performance Engineering of Software Systems (6.172) class with Professor Charles Leiserson. He also created individualized software project classes such as the Open Source Software Project Lab, the Open Source Entrepreneurship Lab, and the Bring Your Own Software Project Lab.
Mon 16 Jun (displayed time zone: Seoul)

09:00 - 10:10

09:00 (20m talk) Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums [Sparse]
  Saman Amarasinghe, Massachusetts Institute of Technology
09:20 (20m talk) Intelligent Auto-Tuning for High-Performance Sparse Tensor Algebra [Sparse]
  Jiajia Li, North Carolina State University
09:40 (20m talk) Loop Fusion in Matrix Multiplications with Sparse Dependence [Sparse]
  Kazem Cheshmi, McMaster University
10:00 (10m) Panel 1 [Sparse]
  Saman Amarasinghe, Massachusetts Institute of Technology; Kazem Cheshmi, McMaster University; Jiajia Li, North Carolina State University