Dong Li

Dong Li#

Dong Li is a Director at AMD AIG-Models Team. He leads a team for AI algorithm optimization covering diverse applications (e.g., efficient LLM/VLM/image/video generation) on broad AMD devices (GPU, NPU, CPU) and customer support. Prior to joining AMD/Xilinx, he was a researcher at a Chinese AI startup for computer vision project research and landing. Dong received his Bachelor’s degree and PhD from Tsinghua University in 2012 and 2017 respectively, and served as a visiting scholar at University of California, Merced in 2015-2016. He has 10+ years AI/ML algorithm experience with 30+ publications on top-tier AI conferences (e.g., NeurIPS, CVPR). He won 1st place in fidelity track of NTIRE2022 Efficient Image Super-resolution Challenge, and Team Player Award in 2022 at AMDr.

Posts by Dong Li

July 13, 2026

GEAK Agent-Driven Optimization of the DeepSeekV4 MLA Kernel

GEAK Agent accelerates DeepSeekV4 MLA kernel optimization with Triton and delivers SGLang E2E gains on AMD GPUs.

https://rocm.blogs.amd.com/software-tools-optimization/geak-mla-optimization/README.html

July 13, 2026

Triton-Based Optimization of Video Sparse Attention on ROCm

Optimize video sparse attention on ROCm with GEAK and linear global context for faster, more stable video generation on AMD GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/rocm-vsa/README.html

June 18, 2026

Efficient and Portable 3D Explorable World Generation on AMD GPUs

Learn how to run Matrix3D world generation on AMD GPUs more smoothly and efficiently.

https://rocm.blogs.amd.com/artificial-intelligence/dworld-m3d/README.html

June 11, 2026

Low Kruskal-Rank Adaptation

Learn how Kruskal rank can enhance LoRA by replacing the conventional matrix-rank formulation for more efficient training.

https://rocm.blogs.amd.com/artificial-intelligence/lokra/README.html

June 01, 2026

Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

Learn how to run Alibaba's ROLL RL framework out-of-the-box on AMD Instinct™ GPUs with ROCm

https://rocm.blogs.amd.com/artificial-intelligence/roll-large-scale/README.html

May 29, 2026

Enabling Speculative Speculative Decoding on MI300X

This is an introduction of speculative speculative decoding method. We enable this method on the AMD Instinct MI300x GPUs and report the results.

https://rocm.blogs.amd.com/artificial-intelligence/ssd_mi300x/README.html

April 20, 2026

FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

This blog explores a new training-free loosely speculative decoding method, that can accept mismatches that are semantically valid and speedup original SPD method.

https://rocm.blogs.amd.com/artificial-intelligence/fly/README.html

February 05, 2026

Micro-World: First AMD Open-Source World Models for Interactive Video Generation

Micro-World is an action-controlled interactive world model designed to generate high-quality, open-domain scenes.

https://rocm.blogs.amd.com/artificial-intelligence/micro-world/README.html

January 22, 2026

Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

Nitro-AR is a compact E-MMDiT-based masked AR image generator matching diffusion quality with lower latency on AMD GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/nitro-ar/README.html

January 12, 2026

Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

Learn how to utilize a data-efficient Process Reward Model to enhance the reasoning ability of the Large Language/Multimodal Models.

https://rocm.blogs.amd.com/artificial-intelligence/amd-elvm/README.html

January 07, 2026

Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

Explore how MXFP4/6, supported by AMD Instinct™ MI350 series GPUs, achieves BF16-comparable image and video generation quality.

https://rocm.blogs.amd.com/artificial-intelligence/mxfp-t2i-t2v/README.html

January 02, 2026

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

In this blog we will discuss SparK, a training-free, plug-and-play method for KV cache compression in large language models (LLMs).

https://rocm.blogs.amd.com/artificial-intelligence/spark-blog/README.html

December 23, 2025

GEAK HIP: Expanding GEAK for HIP Code Optimization

Explore the GEAK frameworks AI-driven HIP code optimization for improved performance on AMD GPUs, including speedup examples and benefits for AI workloads.

https://rocm.blogs.amd.com/software-tools-optimization/geak-hip-optimizations/README.html

December 23, 2025

GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

Introducing GEAK Family - AI-driven agents that automate GPU kernel optimization for AMD Instinct GPUs with hardware-aware feedback

https://rocm.blogs.amd.com/artificial-intelligence/geak-agents-family/README.html

December 08, 2025

Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Learn how to deploy AMD GPUs for high-performance autonomous driving related model training with ROCm optimization.

https://rocm.blogs.amd.com/artificial-intelligence/autonomous-driving/README.html

December 03, 2025

Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).

https://rocm.blogs.amd.com/artificial-intelligence/tyr-the-pruner/README.html

October 24, 2025

Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M paramters.

https://rocm.blogs.amd.com/artificial-intelligence/nitro-e/README.html

October 14, 2025

Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.

https://rocm.blogs.amd.com/software-tools-optimization/gumiho/README.html

September 09, 2025

Technical Dive into AMD's MLPerf Inference v5.1 Submission

In this blog, we share the technical details of how we accomplish the results in our MLPerf Inference v5.1 submission.

https://rocm.blogs.amd.com/artificial-intelligence/mlperf-inference-v5.1/README.html

September 09, 2025

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

In this blog, we will provide step by step instruction on how to reproduce AMD's MLPerf Inference v5.1 Submission

https://rocm.blogs.amd.com/artificial-intelligence/mlperf-inference5.1-repro/README.html

September 09, 2025

Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

This blog describes the technical details of how we prune and fine tune the Llama 3.1 405B model in our MLPerf Inference v5.1 submission.

https://rocm.blogs.amd.com/artificial-intelligence/mlperf-llama-pruning/README.html

August 22, 2025

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

https://rocm.blogs.amd.com/artificial-intelligence/elvm,-vlms,-llm,/README.html

August 01, 2025

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.

https://rocm.blogs.amd.com/software-tools-optimization/triton-kernel-ai/README.html