AMD ROCm™ Blogs

Featured Posts

Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

Learn about Instella-Long: AMD’s open 3B language model that supports 128K context, was trained on MI300X GPUs, and outperforms peers on long-context benchmarks.

June 11, 2025 by Jialian Wu, Jiang Liu, Sudhanshu Ranjan, Xiaodong Yu, Gowtham Ramesh, Prakamya Mishra, Zicheng Liu, Yusheng Su, Ximeng Sun, Ze Wang, Emad Barsoum

The ROCm Revisited Series

We present our ROCm Revisited Series. Discover ROCm's role in leading-edge supercomputing and its growing ecosystem, from HIP to developer tools, powering AI, HPC, and data science across multi-GPU and cluster systems.

June 06, 2025 by Mohammed Faraaz Mustafa, Liam Berry, Saad Rahim

AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

Explore the techniques we used to improve the training performance on MI300X and MI325X in our MLPerf Training 5.0 submission.

June 04, 2025 by Meena Arunachalam, Miro Hodak, Ravi Dwivedula, Sarthak Arora, Sathish Sanjeevi, Su Ann Chong, Karan Verma, Eliot Li

HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

Get ready for HIP 7.0: explore key API changes that boost CUDA compatibility and streamline portable GPU development, and start preparing your code today.

May 28, 2025 by Christophe Paquot, Julia Jiang, Denny Iriawan, Saad Rahim

Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide

A practical guide for accelerating video generation with HunyuanVideo and Wan 2.1 using Unified Sequence Parallelism on AMD GPUs.

July 11, 2025 by Clint Greene

Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

Nitro-T is a family of text-to-image diffusion models developed by AMD to demonstrate efficient large-scale training on Instinct™ MI300X GPUs. The models were trained from scratch in under 24 hours.

July 09, 2025 by Akash Haridas, Tong Shen, Jingai Yu

vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

vLLM V1 on AMD ROCm boosts LLM serving with faster TTFT, higher throughput, and optimized multimodal support, ready out of the box.

July 07, 2025 by Seungrok Jung, Hyukjoon Lee, Andy Luo

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

Simplify GPU acceleration in containers with the AMD Container Toolkit: streamlined setup, runtime hooks, and full ROCm integration.

July 03, 2025 by Abhishek Patil

Ecosystems & Partners

AMD ROCm: Powering the World's Fastest Supercomputers

Discover how ROCm drives the world’s top supercomputers, from El Capitan to Frontier, and why it's shaping the future of scalable, open, and sustainable HPC.

June 10, 2025 by Mohammed Faraaz Mustafa, Saad Rahim

ROCm Revisited: Getting Started with HIP

New to HIP? This blog introduces the HIP runtime API, covering its key concepts and installation, with practical code examples that showcase its functionality.

June 06, 2025 by Liam Berry, Mohammed Faraaz Mustafa, Saad Rahim

ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

Learn how ROCm evolved to support HPC, AI, and containerized workloads with modern tools, libraries, and deployment options.

June 06, 2025 by Liam Berry, Saad Rahim

A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU

Learn how to use Meta’s Llama Stack with AMD ROCm and vLLM to scale inference, integrate APIs, and streamline production-ready AI workflows on AMD Instinct™ GPUs.

April 22, 2025 by Alex He

Applications & Models

Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs

Learn how to leverage Model Context Protocol (MCP) servers to provide real-time context information to LLMs through a chatbot example on AMD GPUs.

June 20, 2025 by Fabricio Flores

Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

A step-by-step guide to adapting LLMs to new languages via continued pretraining, with Poro 2 boosting Finnish performance using Llama 3.1 and AMD GPUs.

June 18, 2025 by Elaine Zosa, Jouni Luoma, Kai Hakala, Antti Virtanen, Mika Koistinen, Jonathan Burdge

Aligning Mixtral 8x7B with TRL on AMD GPUs

This blog demonstrates how to fine-tune and align Mixtral 8x7B with TRL using DPO and evaluate it on AMD GPUs.

June 12, 2025 by Clint Greene

LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

Learn how to use Quark to apply FP8 quantization to LLMs on AMD GPUs, and evaluate accuracy and performance using vLLM and SGLang on AMD MI300X GPUs.

June 09, 2025 by Sean Song

Software Tools & Optimizations

Stay informed

Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
Sign up for the ROCm newsletter
View our blog statistics
View the ROCm Developer Hub
Report an issue or request a feature
We are eager to learn from our community! If you would like to contribute to the ROCm Blogs, please submit your technical blog for review at our GitHub. Blog creation can be started through our GitHub issues form.
