Yao Liu
Yao Liu is a senior software development manager at AMD. He focuses on building high-performance software solutions and is deeply passionate about artificial intelligence, open-source software, and the evolving machine learning ecosystem.
Posts by Yao Liu

From Ingestion to Inference: RAG Pipelines on AMD GPUs
Build a RAG-enhanced GenAI application that improves the quality of model responses by incorporating data that is missing from the model's training data.

Enabling FlashInfer on ROCm for Accelerated LLM Serving
FlashInfer is an open-source library for accelerating LLM serving that is now supported by ROCm.

Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers
Accelerate AI-assisted coding with agentic workflows on AMD GPUs. Deploy DeepSeek-V3.1 via SGLang, vLLM, or llama.cpp to power fast, scalable coding agents.

Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows
Ray, combined with ROCm, provides a powerful platform for scaling AI applications, particularly for training and inference workloads.

Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
Explore performance optimizations for llama.cpp on AMD Instinct GPUs.

DGL in the Real World: Running GNNs on Real Use Cases
We walk through four advanced GNN workloads, from heterogeneous e-commerce graphs to neuroscience applications, that we successfully ran using our DGL implementation.

Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs
This blog provides a how-to guide on installing and programming with Taichi Lang on AMD Instinct GPUs.

Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware
Accelerate graph deep learning on AMD GPUs with DGL and ROCm, scaling efficiently with open tools and optimized performance.

Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and impressive throughput and convergence results.

Efficient MoE Training on AMD ROCm: How to Use Megablocks on AMD GPUs
Learn how to use Megablocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.

Triton Inference Server with vLLM on AMD GPUs
This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend powered by AMD GPUs, showcasing robust performance with several LLMs.