Dong Li#
Dong Li is a Director at AMD AIG-Models Team. He leads a team for AI algorithm optimization covering diverse applications (e.g., efficient LLM/VLM/image/video generation) on broad AMD devices (GPU, NPU, CPU) and customer support. Prior to joining AMD/Xilinx, he was a researcher at a Chinese AI startup for computer vision project research and landing. Dong received his Bachelor’s degree and PhD from Tsinghua University in 2012 and 2017 respectively, and served as a visiting scholar at University of California, Merced in 2015-2016. He has 10+ years AI/ML algorithm experience with 30+ publications on top-tier AI conferences (e.g., NeurIPS, CVPR). He won 1st place in fidelity track of NTIRE2022 Efficient Image Super-resolution Challenge, and Team Player Award in 2022 at AMDr.
Posts by Dong Li
Efficient and Portable 3D Explorable World Generation on AMD GPUs
Learn how to run Matrix3D world generation on AMD GPUs more smoothly and efficiently.
Low Kruskal-Rank Adaptation
Learn how Kruskal rank can enhance LoRA by replacing the conventional matrix-rank formulation for more efficient training.
Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale
Learn how to run Alibaba's ROLL RL framework out-of-the-box on AMD Instinct™ GPUs with ROCm
Enabling Speculative Speculative Decoding on MI300X
This is an introduction of speculative speculative decoding method. We enable this method on the AMD Instinct MI300x GPUs and report the results.
FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match
This blog explores a new training-free loosely speculative decoding method, that can accept mismatches that are semantically valid and speedup original SPD method.
Micro-World: First AMD Open-Source World Models for Interactive Video Generation
Micro-World is an action-controlled interactive world model designed to generate high-quality, open-domain scenes.
Nitro-AR: A Compact AR Transformer for High-Quality Image Generation
Nitro-AR is a compact E-MMDiT-based masked AR image generator matching diffusion quality with lower latency on AMD GPUs.
Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models
Learn how to utilize a data-efficient Process Reward Model to enhance the reasoning ability of the Large Language/Multimodal Models.
Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation
Explore how MXFP4/6, supported by AMD Instinct™ MI350 series GPUs, achieves BF16-comparable image and video generation quality.
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
In this blog we will discuss SparK, a training-free, plug-and-play method for KV cache compression in large language models (LLMs).
GEAK HIP: Expanding GEAK for HIP Code Optimization
Explore the GEAK frameworks AI-driven HIP code optimization for improved performance on AMD GPUs, including speedup examples and benefits for AI workloads.
GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs
Introducing GEAK Family - AI-driven agents that automate GPU kernel optimization for AMD Instinct GPUs with hardware-aware feedback
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software
Learn how to deploy AMD GPUs for high-performance autonomous driving related model training with ROCm optimization.
Týr-the-Pruner: Search-based Global Structural Pruning for LLMs
This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation
Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M paramters.
Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More
Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.
Technical Dive into AMD's MLPerf Inference v5.1 Submission
In this blog, we share the technical details of how we accomplish the results in our MLPerf Inference v5.1 submission.
Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission
In this blog, we will provide step by step instruction on how to reproduce AMD's MLPerf Inference v5.1 Submission
Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance
This blog describes the technical details of how we prune and fine tune the Llama 3.1 405B model in our MLPerf Inference v5.1 submission.
Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.
GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.