All posts — ROCm Blogs

Posted in 2026

20 July 2026 - Understanding Attention Algorithms and Their Backends for Image and Video Generation

20 July 2026 - SPIR-V on ROCm: A Portable IR for AMD GPUs

20 July 2026 - GEAK V3: Agent-Driven, Repository-Level GPU Kernel Optimization across HIP, Triton, and FlyDSL on AMD GPUs

16 July 2026 - Performance Profiling on AMD GPUs – Part 5: Profiling-Driven Kernel Optimization with an AI Code-Assist Tool

16 July 2026 - Multi-Accelerator Support for AIMs and AMD Solution Blueprints

15 July 2026 - When a Faster Kernel Doesn’t Speed Up Serving: Profiling FP8 KV Cache on AMD Instinct MI308X

15 July 2026 - ROCm 7.14: TheRock Goes Production and Expands AMD’s AI Software Platform

15 July 2026 - From Vector Search to Agentic RAG: Building an Enterprise Research Analyst with hipVS

14 July 2026 - LogsLop: A Tiny Summarization Tool for Enormous Log Files

14 July 2026 - Local Image and Video Generation on AMD Ryzen™ AI Max+ Processor (Windows)

13 July 2026 - Triton-Based Optimization of Video Sparse Attention on ROCm

13 July 2026 - Serving NVFP4 Models on AMD Instinct™ MI355 Accelerators

13 July 2026 - QuickReduce INT3 Quantization and Benchmarking on MI355

13 July 2026 - GEAK Agent-Driven Optimization of the DeepSeekV4 MLA Kernel

10 July 2026 - Fast Image Generation and Editing with SGLang Diffusion on AMD GPUs

09 July 2026 - Porting High-Performance HIP Kernels to FlyDSL

09 July 2026 - AMD Instinct™ Network Traffic, Congestion Trends, and Harmonics in Scale-Out Networks for AI Training Clusters

08 July 2026 - Towards Feature Complete Triton Support in JAX-Triton

08 July 2026 - SGLang-ATOM: Bring ROCm-Native Acceleration to SGLang Serving

08 July 2026 - Efficient Hyperparameter Optimization for Autonomous Driving Models with AMD Instinct GPU Partitioning

07 July 2026 - RDC and RocProfiler Compared to DCGM for Commonly Used Metrics

07 July 2026 - Occupancy Math on the AMD MI355X GPU (CDNA4): A From-First-Principles Guide

06 July 2026 - Primus Tuning Agent: Closing the Configuration-Search Loop

06 July 2026 - Accelerating Diffusers and xDiT Image Generation with MXFP4 using AMD Quark on AMD Instinct™ MI350 GPUs

03 July 2026 - Building a GPU-Resident YOLO26 Object Detection Pipeline on the AMD Radeon™ AI PRO R9700 GPU

03 July 2026 - AgentKernelArena: Benchmarking AI Coding Agents for GPU Kernel Optimization on AMD Instinct GPUs

03 July 2026 - Accelerating Large-Scale LLM Inference on AMD Instinct MI350X/MI355X with Eagle3 and AMD Quark

30 June 2026 - Optimizing MI300X Inter-Chiplet Communication via the RCCL Tuner API

29 June 2026 - OpenXLA and JAX - ROCm Support and the State of CI

29 June 2026 - Accelerating LLM Inference on AMD GPUs with Low-Latency GEMMs

26 June 2026 - MXFP6 and MXFP4 Mixed Precision for Accelerating Dense LLMs on AMD Instinct MI355X

26 June 2026 - Efficient GPU Utilization With Workload Pre-Emption in AMD Resource Manager

24 June 2026 - DP Attention and TBO for DeepSeek-V4 on MI355X

23 June 2026 - Faster Kimi-K2.5-W4A8 Decoding with EAGLE3 on AMD Instinct™ MI325X

19 June 2026 - A Practical Guide to Running LLMs on AMD Radeon™ GPUs

18 June 2026 - Efficient and Portable 3D Explorable World Generation on AMD GPUs

18 June 2026 - Comparative Analysis of Scale-Out RoCE Network Traffic Patterns and Loads in Training Large Language Models

18 June 2026 - Building and Deploying Custom hipBLASLt Libraries on AMD Instinct GPUs

17 June 2026 - Utilizing AMD Schola and UnrealRoboticsLab with AMD ROCm™ Software to Train a Robotic Arm

16 June 2026 - Technical Dive into AMD’s MLPerf Training v6.0 Submission

16 June 2026 - Reproducing AMD MLPerf Training v6.0 Submission Result

16 June 2026 - ATOMesh: Unlocking AMD Hardware for Scalable LLM Serving

15 June 2026 - ATOM: Unlocking Extreme AMD Instinct Inference with Software-Hardware Co-Optimization

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

11 June 2026 - Low Kruskal-Rank Adaptation

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

08 June 2026 - ORBIT-2 based Weather and Climate Downscaling and Downscaled Global Forecasts on AMD Instinct

03 June 2026 - Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench

01 June 2026 - Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

29 May 2026 - Running Variational Quantum Eigensolver with Qiskit Aer on AMD Instinct

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

27 May 2026 - Deep Dive Into 4-Wave Interleave FP8 GEMM

25 May 2026 - AI Inference on AMD Ryzen™ AI Max Processor

22 May 2026 - From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

22 May 2026 - From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

20 May 2026 - Diffusion-based Atmospheric Downscaling on AMD Instinct GPUs

15 May 2026 - Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

07 May 2026 - AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

24 April 2026 - Styled Text Image Generation with Eruku on AMD

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

10 April 2026 - Introduction to profiling tools for AMD hardware

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

02 April 2026 - Deploy and Customize AMD Solution Blueprints

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

31 March 2026 - Training a Robotic Arm Using MuJoCo and JAX on AMD Hardware with ROCm™

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

24 March 2026 - GROMACS on AMD Instinct GPUs: A Complete Build Guide

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

23 March 2026 - Edge-to-Cloud Robotics with AMD ROCm: From Data Collection to Real-Time Inference

23 March 2026 - AMD Device Metrics Exporter v1.4.2: Enhanced Observability, Deeper RAS Insights, and Smarter GPU Telemetry for Modern HPC & AI Clusters

19 March 2026 - hipBLASLt Online GEMM Tuning

19 March 2026 - Utilizing AMD Instinct GPU Accelerators for Weather and Precipitation Forecasting with NeuralGCM

18 March 2026 - Multi-Node Distributed Inference for Diffusion Models with xDiT

13 March 2026 - GROMACS Performance on AMD Instinct MI355X

10 March 2026 - FP8 GEMM Optimization on AMD CDNA™4 Architecture

09 March 2026 - Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs

09 March 2026 - Agentic Diagnosis for LLM Training at Scale

06 March 2026 - HPC Coding Agent - Part 3: MCP Tool for Profiling

06 March 2026 - Fine-Tuning AI Surrogate Models for Physics Simulations with Walrus on AMD Instinct GPU Accelerators

06 March 2026 - Ensemble High-Resolution Weather Forecasting on AMD Instinct GPU Accelerators

04 March 2026 - HPC Coding Agent - Part 2: An MCP Tool for Code Optimization with OpenEvolve

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

24 February 2026 - PyTorch Offline Tuning with TunableOp

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

24 February 2026 - JAX-AITER: Bringing AMD’s Optimized AI Kernels to JAX on ROCm™

24 February 2026 - Getting Started with AMD Resource Manager: Efficient Sharing of AMD Instinct™ GPUs for R&D Teams and AI Practitioners

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

19 February 2026 - Introducing hipThreads: A C++ - Style Concurrency Library for AMD GPUs

17 February 2026 - Unlocking Sparse Acceleration on AMD GPUs with hipSPARSELt

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

17 February 2026 - Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 February 2026 - Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

09 February 2026 - Building Robotics Applications with Ryzen AI and ROS 2

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

06 February 2026 - Accelerating Graph Layout with AI and ROCm on AMD GPUs

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

03 February 2026 - Foundations of Molecular Generation with GP-MoLFormer on AMD Instinct MI300X Accelerators

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

22 January 2026 - ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

22 January 2026 - LLM Inference Optimization Using AMD GPU Partitioning

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

20 January 2026 - Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

14 January 2026 - Applying Compute Partitioning for Workloads on MI300X GPUs

13 January 2026 - Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

12 January 2026 - Installing AMD HIP-Enabled GROMACS on HPC Systems: A LUMI Supercomputer Case Study

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

08 January 2026 - Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction

08 January 2026 - Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

07 January 2026 - High-Resolution Weather Forecasting with StormCast on AMD Instinct GPU Accelerators

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

06 January 2026 - ROCm MaxText Testing — Decoupled (Offline) and Cloud-Integrated Modes

06 January 2026 - ROCm Fork of MaxText: Structure and Strategy

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

Posted in 2025

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

19 December 2025 - Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads

18 December 2025 - A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs

16 December 2025 - MoE Training Best Practices on AMD GPUs

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

10 December 2025 - Medical Imaging on MI300X: SwinUNETR Inference Optimization

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

03 December 2025 - HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

28 November 2025 - VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite

27 November 2025 - Fine-Tune LLMs for Proteins with AMD Enterprise AI Suite

27 November 2025 - Exploring Gameplay Video Generation with Hunyuan-GameCraft

25 November 2025 - Using Reinforcement Learning to Fix Text in AI-Generated Videos

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

21 November 2025 - Inference with HunyuanWorld-Voyager on AMD Instinct GPUs

21 November 2025 - Accelerating AI-Driven Crystalline Materials Design with MatterGen on AMD Instinct MI300X

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

14 November 2025 - Plug-and-Play CuPy on ROCm: Data Analytics Acceleration Made Simple

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

05 November 2025 - Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

04 November 2025 - Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

16 October 2025 - Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

09 October 2025 - GEMM Tuning within hipBLASLt - Part 2

07 October 2025 - Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

06 October 2025 - Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective

03 October 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 2: 3D Molecular Generation with SemlaFlow

03 October 2025 - Elevating 3D Scene Rendering with GSplat

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

24 September 2025 - Accelerating Audio-Driven Video Generation: WAN2.2-S2V on AMD ROCm

24 September 2025 - A Simple Design for Serving Video Generation Models with Distributed Inference

19 September 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

11 September 2025 - Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

05 September 2025 - GEMM Tuning within hipBLASLt - Part 1

04 September 2025 - Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

28 August 2025 - Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

25 August 2025 - AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

19 August 2025 - Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU

19 August 2025 - Running ComfyUI on AMD Instinct

19 August 2025 - All-in-One Video Editing with VACE on AMD Instinct GPUs

19 August 2025 - Accelerating FastVideo on AMD GPUs with TeaCache

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

07 August 2025 - Running ComfyUI in Windows with ROCm on WSL

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

24 July 2025 - Benchmarking Reasoning Models: From Tokens to Answers

21 July 2025 - Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

11 July 2025 - Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide

09 July 2025 - Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

20 June 2025 - Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs

18 June 2025 - Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

09 June 2025 - LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Getting Started with HIP

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

22 May 2025 - ROCm Runfile Installer Is Here!

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

16 May 2025 - Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

15 May 2025 - Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

12 May 2025 - Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG

07 May 2025 - DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

06 May 2025 - CuPy and hipDF on AMD: The Basics and Beyond

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

22 April 2025 - A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

14 April 2025 - Installing ROCm from source with Spack

11 April 2025 - ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 March 2025 - What’s New in the AMD GPU Operator v1.2.0 Release

28 March 2025 - Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

24 March 2025 - Speculative Decoding - Deep Dive

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 March 2025 - AITER: AI Tensor Engine For ROCm

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

14 February 2025 - Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

13 February 2025 - Navigating vLLM Inference with ROCm and Kubernetes

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

09 February 2025 - MI300A - Exploring the APU advantage

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

31 January 2025 - Enhancing AI Training with AMD ROCm Software

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

29 January 2025 - Announcing the AMD GPU Operator and Metrics Exporter

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

24 January 2025 - Vision Mamba on AMD GPU with ROCm

16 January 2025 - Getting started with AMD ROCm containers: from base images to custom solutions

14 January 2025 - Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posted in 2024

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

19 September 2024 - Enhancing vLLM Inference on AMD GPUs

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

10 September 2024 - Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

11 July 2024 - DBRX Instruct on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

28 June 2024 - Mamba on AMD GPUs with ROCm

28 June 2024 - Deep Learning Recommendation Models on AMD GPUs

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

10 June 2024 - Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators

04 June 2024 - Segment Anything with AMD GPUs

31 May 2024 - SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

16 May 2024 - Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

16 May 2024 - AMD Collaboration with the University of Michigan offers High Performance Open-Source Solutions to the Bioinformatics Community

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

13 May 2024 - Reading AMD GPU ISA

07 May 2024 - AMD in Action: Unveiling the Power of Application Tracing and Profiling

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

01 May 2024 - Inferencing with Mixtral 8x22B on AMD GPUs

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

26 April 2024 - Application portability with HIP

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

16 April 2024 - PyTorch C++ Extension on AMD GPU

16 April 2024 - Programming AMD GPUs with Julia

16 April 2024 - Program Synthesis with CodeGen

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15 April 2024 - Developing Triton Kernels on AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

09 April 2024 - ResNet for image classification using AMD GPUs

08 April 2024 - Small language models with Phi-2

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

04 April 2024 - Retrieval Augmented Generation (RAG) using LlamaIndex

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

01 April 2024 - Scale AI applications with Ray

29 March 2024 - Automatic mixed precision in PyTorch using AMD GPUs

15 March 2024 - Large language model inference optimizations on AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26 January 2024 - Accelerating XGBoost with Dask using multiple AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Posted in 2023

03 November 2023 - Sparse matrix vector multiplication - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

18 July 2023 - Finite difference method - Laplacian part 4

08 June 2023 - GPU-aware MPI with ROCm

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

11 May 2023 - Finite difference method - Laplacian part 3

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

26 January 2023 - AMD ROCm™ installation

04 January 2023 - Finite difference method - Laplacian part 2

Posted in 2022

14 November 2022 - Finite difference method - Laplacian part 1

14 November 2022 - AMD matrix cores