AMD ROCm™ Blogs

Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training
Fine-tuning the Phi-3.5-mini-instruct LLM using multinode distributed training with Hugging Face Accelerate, Slurm, and Docker for scalable efficiency...

Understanding Peak, Max-Achievable & Delivered FLOPs

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2
This blog is part 2 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernete...

Navigating vLLM Inference with ROCm and Kubernetes
A quick introduction to Kubernetes (K8s) and a step-by-step guide to using K8s to deploy vLLM with ROCm on AMD GPUs...

Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications...

MI300A - Exploring the APU advantage
This blog post introduces the MI300 APU hardware, how it differs from other discrete systems, and how to leverage its GPU programming...

PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm
This blog guides you through the process of using PyTorch FSDP to fine-tune LLMs efficiently on AMD GPUs....

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1
This blog is part 1 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernete...

Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X
This blog introduces Ansys Fluent CFD benchmarks and provides a hands-on guide to installing and running four different Fluent models on AMD GPUs using R...

Zyphra Introduces Frontier Training Kernels for Transformers and SSMs on AMD Instinct MI300X Accelerators
This blog shows Zyphra's new training kernels for transformers and hybrid models on AMD Instinct MI300X accelerators, surpassing the H100's performance...

Introducing AMD's Next-Gen Fortran Compiler
In this post we present a brief preview of AMD's [Next-Gen Fortran Compiler](https://github.com/amd/InfinityHub-CI/blob/main/fortran/README.md), our n...

Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators
Stone Ridge Technology (SRT) pioneered the use of GPUs for high-performance reservoir simulation nearly a decade ago with ECHELON, its flagship ...

GEMM Kernel Optimization For AMD GPUs
Guide to how GEMMs can be tuned for optimal performance of AI models on AMD GPUs...

Enhancing AI Training with AMD ROCm Software
AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack...

Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs
Learn how to optimize large language model inference using vLLM on AMD's MI300X GPUs for enhanced performance and efficiency....

Distributed fine-tuning of MPT-30B using Composer on AMD GPUs
This blog uses Composer, a distributed training framework, on AMD GPUs to fine-tune MPT-30B in single-node as well as multinode...

Announcing the AMD GPU Operator and Metrics Exporter
This post announces the AMD GPU Operator for Kubernetes and the Device Metrics Exporter, including instructions for getting started with these new...

Getting started with AMD ROCm containers: from base images to custom solutions

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexi...

Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces AMD System Management Interface (amd-smi), explaining how you can use it to access your GPU’s performance and status data...
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Watch our GitHub repo
- Sign up for the ROCm newsletter