Mingge Lu 卢铭阁
Master's Student in CS @ USTC
B.Eng. in AI, School of the Gifted Young @ USTC
I am a Master's student in Computer Science at the University of Science and Technology of China (USTC), under the supervision of Prof. Guangzhong Sun and Dr. Jingwei Sun. I received my bachelor's degree through the Talent Program in AI (Honors) at the School of the Gifted Young, USTC, in 2024.
My research interests lie broadly in designing efficient machine learning systems, including model compression algorithms (sparsity, quantization, neural architecture search) and co-designed GPU kernels.

Education
  • University of Science and Technology of China
    Master's Student, School of Computer Science and Technology
    Sep. 2024 - present
  • University of Science and Technology of China
    B.Eng. in Artificial Intelligence (Honors), School of the Gifted Young
    Sep. 2020 - Jul. 2024
News
  • Sep 19, 2025: One paper accepted to NeurIPS 2025.
Selected Publications
Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models

Mingge Lu, Jingwei Sun, Junqing Lin, Zechun Zhou, Guangzhong Sun

Advances in Neural Information Processing Systems (NeurIPS) 2025

We propose Lua-LLM (Learning unstructured-sparsity allocation in LLMs), a learning-based global pruning framework that explores the optimal unstructured sparsity allocation. Unlike existing pruning methods, which primarily focus on allocating per-layer sparsity, Lua-LLM achieves flexible allocation for both layer-wise and intra-layer sparsity.
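The core mechanism can be sketched as learnable per-weight importance scores combined with a binary top-k mask trained through a straight-through estimator. The PyTorch snippet below is a minimal, hypothetical illustration of that idea only; the class name `LearnableMaskedLinear` and the per-layer top-k are illustrative simplifications, and Lua-LLM's actual scoring, global budget control, and LLM integration are described in the paper.

```python
import torch
import torch.nn as nn

class LearnableMaskedLinear(nn.Module):
    """Linear layer whose pruning mask is learned, not fixed a priori.

    Minimal sketch: this toy version ranks scores within one layer only,
    whereas a global allocator would pool scores across all layers.
    """

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        self.weight = nn.Parameter(weight, requires_grad=False)
        # One learnable importance score per weight; higher means "keep".
        self.scores = nn.Parameter(torch.zeros_like(weight))

    def forward(self, x: torch.Tensor, sparsity: float) -> torch.Tensor:
        # Keep the top (1 - sparsity) fraction of weights by score.
        k = max(1, int(self.scores.numel() * (1.0 - sparsity)))
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        hard = (self.scores >= threshold).float()
        # Straight-through estimator: binary mask in the forward pass,
        # sigmoid gradient in the backward pass keeps scores trainable.
        soft = torch.sigmoid(self.scores)
        mask = hard + soft - soft.detach()
        return x @ (self.weight * mask).T
```

Ranking scores across all layers at once, rather than per layer, is what makes the allocation global: the optimizer can then prune some layers or rows harder than others wherever the loss tolerates it.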

Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage

Junqing Lin, Jingwei Sun, Mingge Lu, Guangzhong Sun

arXiv:2507.12205

This paper presents EC-SpMV, a GPU-optimized SpMV approach for accelerating sparse LLM inference. EC-SpMV introduces (1) a hierarchical block extraction algorithm that captures multiple granularities of block structures within sparse LLMs, and (2) a novel compressed sparse format (EC-CSR) that employs delta indexing to reduce storage overhead and enhance memory access efficiency.
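Delta indexing, at least, is simple to illustrate: rather than storing every nonzero's absolute column index, each row keeps its first column plus the gaps between consecutive nonzeros, which are small and fit a narrower integer type. The NumPy sketch below shows only this encoding step under that assumption; the actual EC-CSR layout, the block extraction algorithm, and the GPU-side decoding are specified in the paper.

```python
import numpy as np

def delta_encode_csr(indptr: np.ndarray, indices: np.ndarray):
    """Delta-encode CSR column indices row by row (illustrative only)."""
    base = np.zeros(len(indptr) - 1, dtype=np.int32)  # first column per row
    gaps = np.zeros(len(indices), dtype=np.int64)     # gap to previous nonzero
    for r in range(len(indptr) - 1):
        s, e = indptr[r], indptr[r + 1]
        if e > s:
            base[r] = indices[s]
            gaps[s + 1:e] = np.diff(indices[s:e])
    # Within-row gaps are usually tiny, so a narrow dtype can replace
    # 32-bit absolute indices; a real format would handle rare large
    # gaps explicitly rather than silently widening the whole array.
    narrow = np.uint8 if gaps.size == 0 or gaps.max() < 256 else np.uint16
    return base, gaps.astype(narrow)

# Tiny demo: a 2 x 6 matrix with nonzeros in columns [1, 3, 4] and [0, 5].
indptr = np.array([0, 3, 5])
indices = np.array([1, 3, 4, 0, 5])
base, gaps = delta_encode_csr(indptr, indices)
print(base, gaps)  # -> [1 0] [0 2 1 0 5]
```

Smaller indices matter for SpMV because the kernel is memory-bound: every byte saved on index metadata is bandwidth returned to the nonzero values themselves.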
