Posts by Cheng Yao

Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation


Read more ...


MoE Training Best Practices on AMD GPUs

This blog covers best practices for training Mixture-of-Experts (MoE) models on AMD Instinct™ MI300/MI355-series GPUs with the ROCm ecosystem. Whether you're new to distributed MoE architectures or optimizing trillion-parameter models, this guide will help you identify bottlenecks and maximize efficiency on AMD hardware.

Read more ...