Yao Liu
Yao Liu is a senior software development manager at AMD. He focuses on building high-performance software solutions and is deeply passionate about artificial intelligence, open-source software, and the evolving machine learning ecosystem.
Posts by Yao Liu

From Ingestion to Inference: RAG Pipelines on AMD GPUs
Build a RAG-enhanced GenAI application that improves the quality of model responses by incorporating data that is missing from the model's training data.

Enabling FlashInfer on ROCm for Accelerated LLM Serving
FlashInfer is an open-source library for accelerating LLM serving that is now supported by ROCm.

Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers
Accelerate AI-assisted coding with agentic workflows on AMD GPUs. Deploy DeepSeek-V3.1 via SGLang, vLLM, or llama.cpp to power fast, scalable coding agents.

Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows
Ray, combined with ROCm, provides a powerful platform for scaling AI applications, particularly for training and inference workloads.

Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
Explore performance optimizations for llama.cpp on AMD Instinct GPUs.

DGL in the Real World: Running GNNs on Real Use Cases
We walk through four advanced GNN workloads, from heterogeneous e-commerce graphs to neuroscience applications, that we successfully ran using our DGL implementation.

Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs
This blog provides a how-to guide on installing and programming with Taichi Lang on AMD Instinct GPUs.

Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware
Accelerate graph deep learning on AMD GPUs with DGL and ROCm, scaling efficiently with open tools and optimized performance.

Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and impressive throughput and convergence results.

Efficient MoE Training on AMD ROCm: How to Use Megablocks on AMD GPUs
Learn how to use Megablocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.

Triton Inference Server with vLLM on AMD GPUs
This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend powered by AMD GPUs, showcasing robust performance with several LLMs.