Larry Li

Larry Li is a Senior Software Development Engineer on the Quark team within AMD’s AIG-AIS organization. His work focuses on accelerating LLM inference on AMD Instinct™ MI series GPUs, with a particular emphasis on speculative decoding techniques such as EAGLE3. He also contributes to AMD Quark’s ONNX quantization initiatives and supports key customers in adopting and optimizing these models and techniques on AMD hardware. His research interests include inference acceleration, model optimization and quantization, and natural language processing. Larry holds a bachelor’s degree in Electronic Engineering from Harbin Institute of Technology and a master’s degree in Computer Science from the Institute of Automation, Chinese Academy of Sciences.

Posts by Larry Li

July 03, 2026

Accelerating Large-Scale LLM Inference on AMD Instinct MI350X/MI355X with Eagle3 and AMD Quark

Learn how the AMD Quark team enables Eagle3 speculative decoding for Kimi-K2.5 and MiniMax-M2.5 on AMD Instinct MI355X GPUs with ROCm, vLLM, and InferenceX.

https://rocm.blogs.amd.com/artificial-intelligence/eagle3-speculative-decoding/README.html