Yixing Xu
Yixing Xu is an algorithm engineer with 10 years of experience. He currently focuses on model compression and acceleration techniques.
Posts by Yixing Xu
Týr-the-Pruner: Search-based Global Structural Pruning for LLMs
This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).
Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More
Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.
Technical Dive into AMD's MLPerf Inference v5.1 Submission
In this blog, we share the technical details of how we achieved the results in our MLPerf Inference v5.1 submission.
Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance
This blog describes the technical details of how we pruned and fine-tuned the Llama 3.1 405B model in our MLPerf Inference v5.1 submission.
Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission
In this blog, we provide step-by-step instructions on how to reproduce AMD's MLPerf Inference v5.1 submission.