Yixing Xu
Yixing Xu is an algorithm engineer with 10 years of experience. He currently focuses on model compression and acceleration techniques.
Posts by Yixing Xu

Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More
Gumiho speeds up LLM inference by prioritizing early-token accuracy, combining serial and parallel decoding for speed, accuracy, and ROCm-optimized deployment.

Technical Dive into AMD's MLPerf Inference v5.1 Submission
In this blog, we share the technical details of how we achieved the results in our MLPerf Inference v5.1 submission.
September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Neha Mathews, Rajesh Poornachandran, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Zhao Lin, Wei Luo, Bowen Bao, Spandan Tiwari, Niels Zhang, Vinayak Gokhale, Clint Greene, Eliot Li

Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance
This blog describes the technical details of how we pruned and fine-tuned the Llama 3.1 405B model for our MLPerf Inference v5.1 submission.
September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Fulu Li, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Karan Verma, Clint Greene, Eliot Li

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission
In this blog, we provide step-by-step instructions for reproducing AMD's MLPerf Inference v5.1 submission.
September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Karan Verma, Neha Mathews, Yamini Kamisetty, Chelsea Iluno, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Clint Greene, Eliot Li