Posts by Xiaobo Chen

An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

With the rapid growth of large-scale models, acceleration libraries face growing demands: they must deliver exceptional performance, offer comprehensive functionality, and remain easy to use. To meet these needs, we introduce Primus-Turbo, part of the Primus product family (see our previous blog for background). Primus-Turbo is designed around three core principles: performance, completeness, and ease of use. It supports training, inference, and a wide range of application scenarios, providing developers with a solid foundation for efficiently building and optimizing large models on the ROCm platform. See Figure 1 in the full post for an overview of the Primus-Turbo stack.

Read more ...


Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Training large language models (LLMs) at scale is inherently complex. Different frameworks expose inconsistent interfaces, multi-GPU and distributed setups require brittle scripting, and backend-specific quirks introduce overhead that slows down training iterations. Primus tackles these challenges with a streamlined, backend-agnostic training framework that helps developers launch, customize, and scale training jobs faster on AMD GPUs.

Read more ...