ROCm Blogs

Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
Explore ROCm 6.4's key advancements: AI/HPC performance boosts, enhanced profiling tools, better Kubernetes support, and modular drivers, accelerating AI and HPC workloads on AMD GPUs.

ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver
We introduce the new Instinct driver, a modular GPU driver with independent releases that simplifies workflows and system setup and enhances compatibility across toolkit versions.

Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations
Learn how Triton compiles and optimizes AI kernels on AMD GPUs, with deep dives into IR flows, hardware-specific passes, and performance tuning tips.

AMD Advances Enterprise AI Through OPEA Integration
We announce AMD’s support of the Open Platform for Enterprise AI (OPEA), integrating OPEA’s enterprise GenAI framework with AMD computing hardware and ROCm software.

Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X
This blog introduces Ansys Fluent CFD benchmarks and provides a hands-on guide to installing and running four different Fluent models on AMD GPUs using ROCm.

Zyphra Introduces Frontier Training Kernels for Transformers and SSMs on AMD Instinct MI300X Accelerators
This blog presents Zyphra's new training kernels for transformers and hybrid models on AMD Instinct MI300X accelerators, surpassing H100 performance.

Introducing AMD's Next-Gen Fortran Compiler
In this post we present a brief preview of AMD's Next-Gen Fortran Compiler, our new open-source Fortran compiler optimized for AMD GPUs using OpenMP offloading, offering a direct interface to ROCm and HIP.

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel
Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use, shrinking model size, and enabling fast inference.

Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart
Explore the power of Meta’s Llama 4 multimodal models on AMD Instinct™ MI300X and MI325X GPUs, available from Day 0 with seamless vLLM integration.

AMD Instinct™ MI325X Produces Strong Performance in MLPerf Inference v5.0
We showcase MI325X GPU optimizations that power our MLPerf v5.0 results on Llama 2 70B, highlighting performance tuning, quantization, and vLLM advancements.

Reproducing AMD Instinct GPUs' MLPerf Inference v5.0 Submission
A step-by-step guide to reproducing AMD’s MLPerf v5.0 results for Llama 2 70B and SDXL using ROCm on MI325X.

What's New in the AMD GPU Operator v1.2.0 Release
This blog highlights the feature enhancements in the AMD GPU Operator v1.2.0 release, including Automated Upgrades, Health Checks, and an open-sourced codebase, all of which improve the use of AMD Instinct GPUs on Kubernetes.

Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling
Discover ROCprofiler SDK – ROCm’s next-generation, unified, scalable, and high-performance profiling toolkit for AI and HPC workloads on AMD GPUs.

Speculative Decoding - Deep Dive
This blog shows the performance improvement achieved by applying speculative decoding with Llama models on AMD MI300X GPUs, tested across models, input sizes, and datasets.

Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
Learn how to optimize DeepSeek-R1 on AMD MI300X with SGLang, AITER kernels, and hyperparameter tuning for up to 5× throughput and 60% lower latency over the Nvidia H200.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter