Posts by Ruibin Zhang
MoE Training Best Practices on AMD GPUs
- 16 December 2025
This blog covers best practices for training Mixture-of-Experts (MoE) models on AMD Instinct™ MI300- and MI355-series GPUs with the ROCm ecosystem. Whether you’re new to distributed MoE architectures or already optimizing trillion-parameter models, this guide will help you identify bottlenecks and maximize efficiency on AMD hardware.