Large Language Models (LLMs) have demonstrated remarkable capabilities in text generation, translation, and summarization tasks. However, their deployment on resource-constrained systems remains challenging due to their large parameter counts and high computational demands. To address this, we propose SPARQ, a specialized accelerator architecture that leverages both sparsity and quantization to optimize LLM inference. By integrating multiply-accumulate units tailored for quantized operations with a systolic array architecture supporting N:M semi-structured sparsity, SPARQ significantly enhances area and energy efficiency while preserving model quality, since prior work on GPTQ and SparseGPT has shown that quantization and semi-structured pruning incur minimal accuracy loss.
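To make the N:M semi-structured sparsity pattern concrete, the following minimal Python sketch prunes a weight tensor to an N:M pattern by keeping the N largest-magnitude values in every group of M consecutive weights. This illustrates only the pruning pattern SPARQ exploits, not SPARQ's hardware implementation; the function name prune_nm is hypothetical, and the sketch assumes the weight count is divisible by M.

import numpy as np

def prune_nm(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude values in every group of m
    consecutive weights, zeroing the rest (N:M semi-structured sparsity).
    Assumes weights.size is divisible by m."""
    w = weights.reshape(-1, m)                    # one row per group of m
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    pruned = w.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)  # zero the pruned entries
    return pruned.reshape(weights.shape)

# Example: 2:4 sparsity keeps exactly 2 nonzeros per group of 4.
row = np.array([0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4])
print(prune_nm(row))  # -> [ 0.  -0.9  0.3  0.   0.7  0.   0.   0.4]

Because exactly N of every M weights survive, the zero positions can be encoded with small per-group indices, which is what allows a systolic array to skip the pruned multiplications with predictable, hardware-friendly regularity.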
Our evaluations demonstrate that SPARQ achieves up to 1.53x higher area efficiency and 1.58x higher energy efficiency than the baseline accelerator, with the largest gains observed on larger models.
Xinmiao Zhang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences), Zheng Feng (Institute of Computing Technology, Chinese Academy of Sciences), Shengwen Liang (SKLP, Institute of Computing Technology, Chinese Academy of Sciences), Xinyu Chen (Hong Kong University of Science and Technology), Lei Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Cheng Liu (Institute of Computing Technology, Chinese Academy of Sciences)