PLDI 2025
Mon 16 - Fri 20 June 2025 Seoul, South Korea
Mon 16 Jun 2025 16:20 - 16:40 at Cosmos - Session 4 Chair(s): Kazem Cheshmi

As Artificial Intelligence (AI) algorithms grow increasingly complex, sparse computing plays a crucial role in their evolution, because sparsity is a key technique for compressing neural network models and reducing computational workload. Furthermore, generative algorithms such as Large Language Models (LLMs) are ushering AI into the 2.0 era, and the immense computational cost of LLMs makes it even more critical to exploit sparsity to reduce the workload. This talk, centered on sparse computing and hardware-software co-design, presents a series of works from both the system level and the hardware level, targeting applications such as sparse Graph Neural Networks (GNNs), LLMs, and multimodal large models. At the system level, we first introduce sparse kernel optimization strategies on GPU systems and present an open-source sparse kernel library, dgSPARSE, which outperforms commercial libraries across a range of GNN models and sparse operators. At the hardware level, we implement FlightLLM, an efficient large-model inference solution on FPGAs, achieving a 6.0x improvement in energy efficiency over V100S GPUs built on a comparable process node. For video generation tasks, by exploiting the inter-frame and intra-frame correlations unique to video, we propose an architecture based on spatial-temporal sparsification and mixed-precision computation; implemented on an AMD V80 FPGA, it achieves a 1.3x speedup over a GPU implementation even though the GPU has more than 21x the peak compute throughput.
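
For readers unfamiliar with the sparse operators discussed at the system level, the sketch below is a minimal, generic CUDA kernel for sparse-dense matrix multiplication (SpMM) over a CSR matrix, the core operator behind GNN neighborhood aggregation. It is only an illustration of the kind of kernel such libraries optimize; it is not the dgSPARSE API, and all names and signatures here are hypothetical.

// Illustrative CSR SpMM sketch: C = A * B, where A (num_rows x K) is sparse in CSR form
// and B (K x num_cols_b), C (num_rows x num_cols_b) are dense, row-major.
// Hypothetical example code, not taken from dgSPARSE.
__global__ void csr_spmm(int num_rows, int num_cols_b,
                         const int *row_ptr, const int *col_idx, const float *vals,
                         const float *B, float *C) {
    int row = blockIdx.x * blockDim.y + threadIdx.y;   // one sparse row per thread row
    int col = blockIdx.y * blockDim.x + threadIdx.x;   // one dense output column per thread
    if (row >= num_rows || col >= num_cols_b) return;

    float acc = 0.0f;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
        // Gather along the nonzeros of row 'row' of A and accumulate into C[row][col].
        acc += vals[k] * B[col_idx[k] * num_cols_b + col];
    }
    C[row * num_cols_b + col] = acc;
}

Assigning adjacent threads to adjacent output columns keeps accesses to B and C coalesced; the optimization strategies presented in the talk (e.g., load balancing across rows with very different nonzero counts) go well beyond this naive mapping.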
