Liz Li

Liz Li is a Principal AI Engineer in the AMD AI group, specializing in enabling and optimizing cutting-edge AI models on AMD Instinct GPUs for both distributed inference and training. With over 10 years of experience in computer, graphics, and AI architecture, she has previously led cross-functional teams in delivering platform hardware and software architecture requirements and optimizations for a variety of AI use cases.

Posts by Liz Li

July 07, 2026

Occupancy Math on the AMD MI355X GPU (CDNA4): A From-First-Principles Guide

Derive MI355X GPU (CDNA4) occupancy by hand: the four limiters, MXFP8 GEMM examples, and why matrix-bound kernels hit peak throughput at low occupancy.

https://rocm.blogs.amd.com/software-tools-optimization/occupancy-math-mi355x/README.html

March 02, 2026

MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

MaxText-Slurm: A unified launch system for production-grade LLM training with observability on AMD GPU clusters.

https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm/README.html

February 23, 2026

Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

Learn how to use our flexible and scalable pipeline parallelism framework with Primus backend and AMD hardware.

https://rocm.blogs.amd.com/software-tools-optimization/primus-pipeline/README.html

February 08, 2026

Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

Achieve resilient, checkpoint-less distributed training on AMD GPUs by integrating TorchFT with TorchTitan on Primus-SaFE.

https://rocm.blogs.amd.com/artificial-intelligence/primus-torchft/README.html

December 16, 2025

MoE Training Best Practices on AMD GPUs

Learn how to optimize Mixture-of-Experts (MoE) model training on AMD Instinct GPUs with ROCm. Maximize your AI training performance now!

https://rocm.blogs.amd.com/software-tools-optimization/primus-moe-package/README.html

November 04, 2025

Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

https://rocm.blogs.amd.com/software-tools-optimization/primus-SaFE/README.html

September 19, 2025

An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

Primus streamlines training on AMD ROCm, from fine-tuning to massive pretraining on MI300X GPUs: faster, safer, and easier to debug.

https://rocm.blogs.amd.com/software-tools-optimization/primus-large-models/README.html

August 22, 2025

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html

May 01, 2025

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

Dive into kernel-level profiling of DeepseekV3 on SGLang: identify GPU bottlenecks and boost large language model performance using ROCm.

https://rocm.blogs.amd.com/software-tools-optimization/kernel-analysis-deep/README.html

April 28, 2025

Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

Learn how to boost your Llama 4 inference performance on AMD MI300X GPUs using AITER-optimized kernels and advanced vLLM techniques

https://rocm.blogs.amd.com/software-tools-optimization/llama4-performance-b/README.html

April 06, 2025

Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

Explore the power of Meta’s Llama 4 multimodal models on AMD Instinct™ MI300X and MI325X GPUs - available from Day 0 with seamless vLLM integration

https://rocm.blogs.amd.com/artificial-intelligence/llama4-day-0-support/README.html

March 28, 2025

Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

This blog walks you through the FLUX text-to-image diffusion model architecture and shows you how to run and optimize it on MI300X.

https://rocm.blogs.amd.com/artificial-intelligence/run-flux-with-hf-diffuser-on-mi300/README.html

March 21, 2025

AITER: AI Tensor Engine For ROCm

We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized repository of high-performance AI operators, designed to significantly accelerate AI workloads on AMD GPUs.

https://rocm.blogs.amd.com/software-tools-optimization/aiter-ai-tensor-engine/README.html

March 21, 2025

Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

Learn how to optimize DeepSeek-R1 on AMD MI300X with SGLang, AITER kernels, and hyperparameter tuning for up to 5× throughput and 60% lower latency compared with the NVIDIA H200.

https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html