Loop Fusion in Matrix Multiplications with Sparse Dependence
This program is tentative and subject to change.
Fusing the parallel loops of consecutive matrix multiplications presents an opportunity for data locality optimization. However, irregular dependencies between iterations across the two loops prevent existing compilers from performing this fusion. These dependencies also challenge runtime methods, which incur either excessive synchronization overhead or limited data reuse. This paper introduces tile fusion, a compiler approach that fuses tiles of the two parallel loops of matrix multiplications with sparse dependence between them. By improving data locality and balancing workloads, tile fusion accelerates graph neural network training and the solution of sparse linear systems, achieving geometric mean speedups of 2.33× over PyG and 1.32× over MKL, respectively.
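To make the loop structure concrete, the sketch below shows two consecutive sparse-times-dense products, Y = A*X followed by Z = A*Y, as they arise in GNN aggregation: row i of Z reads exactly the rows of Y selected by the nonzeros in row i of A, which is the sparse dependence that blocks ordinary compile-time fusion. The fused variant is a minimal illustration of the tile-fusion idea under stated assumptions, not the paper's implementation; the CSR and FusedTile types, and the inspector assumed to have built the per-tile row lists, are hypothetical.

#include <stddef.h>

typedef struct {            /* sparse matrix in CSR form */
    int n;                  /* number of rows */
    const int *rowptr;      /* size n+1 */
    const int *colidx;      /* column index of each nonzero */
    const double *val;      /* value of each nonzero */
} CSR;

/* One SpMM row: out[i][0..k) += sum over nonzeros A[i][j] * in[j][0..k). */
static void spmm_row(const CSR *A, int i, const double *in, double *out, int k)
{
    for (int p = A->rowptr[i]; p < A->rowptr[i + 1]; ++p) {
        double a = A->val[p];
        const double *src = in + (size_t)A->colidx[p] * k;
        double *dst = out + (size_t)i * k;
        for (int c = 0; c < k; ++c)
            dst[c] += a * src[c];
    }
}

/* Unfused baseline: two separate parallel loops; every row of Y is written
   to memory in loop 1 and re-read from memory in loop 2. Y and Z are
   assumed zero-initialized by the caller. */
void spmm_spmm_unfused(const CSR *A, const double *X, double *Y, double *Z, int k)
{
    for (int i = 0; i < A->n; ++i)      /* loop 1: Y = A * X */
        spmm_row(A, i, X, Y, k);
    for (int i = 0; i < A->n; ++i)      /* loop 2: Z = A * Y */
        spmm_row(A, i, Y, Z, k);
}

/* A fused tile (hypothetical structure) pairs loop-1 iterations with the
   loop-2 iterations that depend on them, so Y rows are consumed while
   still in cache. */
typedef struct {
    int n1; const int *rows1;           /* loop-1 rows this tile computes */
    int n2; const int *rows2;           /* loop-2 rows this tile computes */
} FusedTile;

/* Fused sketch: tiles are mutually independent, so the outer loop could run
   in parallel (shown sequentially here). A correct tile construction must
   ensure every Y row read by a tile's loop-2 rows is produced in that tile. */
void spmm_spmm_fused(const CSR *A, const double *X, double *Y, double *Z,
                     int k, const FusedTile *tiles, int ntiles)
{
    for (int t = 0; t < ntiles; ++t) {
        for (int r = 0; r < tiles[t].n1; ++r)   /* produce needed Y rows */
            spmm_row(A, tiles[t].rows1[r], X, Y, k);
        for (int r = 0; r < tiles[t].n2; ++r)   /* consume them for Z rows */
            spmm_row(A, tiles[t].rows2[r], Y, Z, k);
    }
}

In this sketch, the payoff of fusion is that the Y rows touched by a tile's loop-2 iterations are reused directly from cache instead of being round-tripped through memory, at the cost of an inspector that groups dependent iterations into tiles.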
Mon 16 Jun (displayed time zone: Seoul)
09:00 - 10:10: Sparse session

09:00 (20m, Talk): Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums. Saman Amarasinghe (Massachusetts Institute of Technology)
09:20 (20m, Talk): Intelligent Auto-Tuning for High-Performance Sparse Tensor Algebra. Jiajia Li (North Carolina State University)
09:40 (20m, Talk): Loop Fusion in Matrix Multiplications with Sparse Dependence. Kazem Cheshmi (McMaster University)
10:00 (10m, Talk): Panel 1. Saman Amarasinghe (Massachusetts Institute of Technology), Kazem Cheshmi (McMaster University), Jiajia Li (North Carolina State University)