Authors — ROCm Blogs

Posts by

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

Posts by AMD Brevitas Team

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

Posts by AMD Quark Team

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

Posts by Aarne Talman

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Abby O’Neill

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

Posts by Abhishek Patil

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

Posts by Adeem Jassani

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

18 December 2025 - A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs

Posts by Aditi Ghai Rana

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

Posts by Aditya Bhattacharji

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

Posts by Aditya Kumar Singh

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

Posts by Akash Haridas

09 July 2025 - Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

Posts by Akhila Yeruva

23 March 2026 - AMD Device Metrics Exporter v1.4.2: Enhanced Observability, Deeper RAS Insights, and Smarter GPU Telemetry for Modern HPC & AI Clusters

Posts by Akshay Viswanathan

24 February 2026 - Getting Started with AMD Resource Manager: Efficient Sharing of AMD Instinct™ GPUs for R&D Teams and AI Practitioners

Posts by Aku Rouhe

03 June 2026 - Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench

Posts by Albin Toft

03 December 2025 - HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP

25 November 2025 - Using Reinforcement Learning to Fix Text in AI-Generated Videos

24 September 2025 - A Simple Design for Serving Video Generation Models with Distributed Inference

19 August 2025 - Running ComfyUI on AMD Instinct

Posts by Alessandro Fanfarillo

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

Posts by Alessio Tonioni

24 April 2026 - Styled Text Image Generation with Eruku on AMD

Posts by Alex Bogdan

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

Posts by Alex He

23 March 2026 - Edge-to-Cloud Robotics with AMD ROCm: From Data Collection to Real-Time Inference

22 April 2025 - A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

13 February 2025 - Navigating vLLM Inference with ROCm and Kubernetes

Posts by Alex Saliniemi

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Alex Voicu

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

Posts by Alexander Aurell

11 February 2026 - Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Alexander Finn

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

Posts by Alireza Sariaslani

01 October 2025 - GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator

Posts by Amanzhol Salykov

27 May 2026 - Deep Dive Into 4-Wave Interleave FP8 GEMM

10 March 2026 - FP8 GEMM Optimization on AMD CDNA™4 Architecture

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

Posts by Ammar Elwazir

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

Posts by Andrew Ma

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Andrey Ivannikov

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Andy Allred

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Andy Luo

27 May 2026 - Deep Dive Into 4-Wave Interleave FP8 GEMM

10 March 2026 - FP8 GEMM Optimization on AMD CDNA™4 Architecture

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

16 December 2025 - MoE Training Best Practices on AMD GPUs

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

11 September 2025 - Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

28 August 2025 - Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

Posts by Andy Ye

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Angela Wang

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Anik Chaudhuri

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Anshu Raina

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

Posts by Anshul Gupta

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

22 January 2026 - ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads

05 November 2025 - Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC

11 September 2025 - Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

21 March 2025 - AITER: AI Tensor Engine For ROCm

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

Posts by Anton Smirnov

16 April 2024 - Programming AMD GPUs with Julia

Posts by Antti Virtanen

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Antti-Ville Suni

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Anuya Welling

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

Posts by Aravind Kumar Rao Bappanadu

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Arseny Moskvichev

28 June 2024 - Mamba on AMD GPUs with ROCm

Posts by Arttu Niemela

06 March 2026 - HPC Coding Agent - Part 3: MCP Tool for Profiling

25 November 2025 - Using Reinforcement Learning to Fix Text in AI-Generated Videos

19 August 2025 - Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU

Posts by Ashish Sirasao

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

11 June 2026 - Low Kruskal-Rank Adaptation

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

19 March 2026 - hipBLASLt Online GEMM Tuning

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Asitav Mishra

01 June 2026 - Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

13 May 2024 - Reading AMD GPU ISA

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

Posts by Babak Poursartip

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

Posts by Baiqiang Xia

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

Posts by Balazs Toth

19 August 2025 - Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU

Posts by Barsoum Emad

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

Posts by Ben Sander

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

14 February 2025 - Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

Posts by Benran Hu

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

Posts by Bill He

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

28 August 2025 - Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

Posts by Bin Ding

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Bishwo Adhikari

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Bo Zhang

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Bob Robey

26 April 2024 - Application portability with HIP

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

Posts by Bobo Fang

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

Posts by Bowen Bao

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

Posts by Brayden Mahdavi

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

Posts by Brian Cornille

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

Posts by Brian Pickrell

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Bruce Xue

16 May 2025 - Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

Posts by Carlus Huang

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

17 February 2026 - Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

21 March 2025 - AITER: AI Tensor Engine For ROCm

Posts by Carmine Zaccagnino

24 April 2026 - Styled Text Image Generation with Eruku on AMD

Posts by Carson Liao

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

17 February 2026 - Unlocking Sparse Acceleration on AMD GPUs with hipSPARSELt

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

09 October 2025 - GEMM Tuning within hipBLASLt – Part 2

05 September 2025 - GEMM Tuning within hipBLASLt - Part 1

Posts by Chaitanya Manem

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Chandan Sharma

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

Posts by Chandra Yang

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

Posts by Chang Liu

11 September 2025 - Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

24 March 2025 - Speculative Decoding - Deep Dive

Posts by Chao Li

11 June 2026 - Low Kruskal-Rank Adaptation

19 March 2026 - hipBLASLt Online GEMM Tuning

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

Posts by Chao Xu

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Chaojun Hou

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

16 December 2025 - MoE Training Best Practices on AMD GPUs

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

Posts by Charles Boyd

19 February 2026 - Introducing hipThreads: A C++ - Style Concurrency Library for AMD GPUs

Posts by Charles Yang

06 October 2025 - Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

Posts by Chelsea Iluno

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Cheng Ling

31 May 2024 - SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

Posts by Cheng Yao

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Chengjia Huang

25 May 2026 - AI Inference on AMD Ryzen™ AI Max Processor

Posts by Chia Hung

09 October 2025 - GEMM Tuning within hipBLASLt – Part 2

Posts by Chris Sosa

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Posts by Christian Gilli

27 May 2026 - Deep Dive Into 4-Wave Interleave FP8 GEMM

Posts by Christophe Paquot

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

Posts by Chuan Li

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

Posts by Chun Fang

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Chunhung Wang

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

17 February 2026 - Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

Posts by Claire Lee

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Clement Lin

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

17 February 2026 - Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

Posts by Clint Greene

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

11 July 2025 - Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

19 September 2024 - Enhancing vLLM Inference on AMD GPUs

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

01 May 2024 - Inferencing with Mixtral 8x22B on AMD GPUs

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

15 April 2024 - Developing Triton Kernels on AMD GPUs

04 April 2024 - Retrieval Augmented Generation (RAG) using LlamaIndex

26 January 2024 - Accelerating XGBoost with Dask using multiple AMD GPUs

Posts by Corbin Robeck

13 May 2024 - Reading AMD GPU ISA

Posts by Dai Yan

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

Posts by Damon McDougall

08 June 2023 - GPU-aware MPI with ROCm

14 November 2022 - AMD matrix cores

Posts by Daniel Gustafsson

03 June 2026 - Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench

02 April 2026 - Deploy and Customize AMD Solution Blueprints

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

24 February 2026 - Getting Started with AMD Resource Manager: Efficient Sharing of AMD Instinct™ GPUs for R&D Teams and AI Practitioners

19 December 2025 - Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads

Posts by Daniel Huang

25 May 2026 - AI Inference on AMD Ryzen™ AI Max Processor

09 March 2026 - Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs

20 January 2026 - Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU

25 August 2025 - AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

Posts by Daniel Mcintosh

19 February 2026 - Introducing hipThreads: A C++ - Style Concurrency Library for AMD GPUs

Posts by Daniel Velicka

14 November 2022 - AMD matrix cores

Posts by Daniel Warna

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

Posts by Danny Guan

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

11 April 2025 - ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

Posts by David Björelind

24 March 2026 - GROMACS on AMD Instinct GPUs: A Complete Build Guide

13 March 2026 - GROMACS Performance on AMD Instinct MI355X

14 January 2026 - Applying Compute Partitioning for Workloads on MI300X GPUs

07 October 2025 - Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

19 September 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT

Posts by David Doscher

26 January 2023 - AMD ROCm™ installation

Posts by David Li

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

Posts by David Limpus

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

Posts by David Prescott

24 February 2026 - Getting Started with AMD Resource Manager: Efficient Sharing of AMD Instinct™ GPUs for R&D Teams and AI Practitioners

Posts by David Silverstone

22 January 2026 - LLM Inference Optimization Using AMD GPU Partitioning

Posts by Debasis Mandal

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

Posts by Deeksha Goplani

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

03 October 2025 - Elevating 3D Scene Rendering with GSplat

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Deepan Sekar

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

Posts by Denny Iriawan

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

Posts by Deval Shah

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Devang Patel

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

Posts by Dewei Wang

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

Posts by Di Tian

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Diptorup Deb

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

Posts by Dominic Widdows

06 February 2026 - Accelerating Graph Layout with AI and ROCm on AMD GPUs

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

24 July 2025 - Benchmarking Reasoning Models: From Tokens to Answers

Posts by Dong Li

11 June 2026 - Low Kruskal-Rank Adaptation

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Dong Zhou

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

Posts by Doug Lehr

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Douglas Hamilton

22 May 2025 - ROCm Runfile Installer Is Here!

Posts by Douglas Jia

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Posts by Duyi Wang

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Ean Garvey

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

Posts by Eda Zhou

23 March 2026 - Edge-to-Cloud Robotics with AMD ROCm: From Data Collection to Real-Time Inference

Posts by Eduardo Alvarez

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

Posts by Elaine Zosa

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Eli Uriegas

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Eliecer Diaz

02 April 2026 - Deploy and Customize AMD Solution Blueprints

Posts by Eliot Li

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

14 November 2025 - Plug-and-Play CuPy on ROCm: Data Analytics Acceleration Made Simple

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

01 April 2024 - Scale AI applications with Ray

Posts by Emad Barsoum

11 June 2026 - Low Kruskal-Rank Adaptation

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

22 May 2026 - From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

31 January 2025 - Enhancing AI Training with AMD ROCm Software

Posts by Emelie Wahlstrom

04 November 2025 - Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma

Posts by Ephrem Wu

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

Posts by Ethan Lin

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

Posts by Ethan Yang

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

Posts by Evan Masters

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

Posts by Eveline Chen

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

Posts by Fabio Quattrini

24 April 2026 - Styled Text Image Generation with Eruku on AMD

Posts by Fabricio Flores

22 May 2026 - From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

20 June 2025 - Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs

07 May 2025 - DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

06 May 2025 - CuPy and hipDF on AMD: The Basics and Beyond

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

07 May 2024 - AMD in Action: Unveiling the Power of Application Tracing and Profiling

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

Posts by Faisal Azhar

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Fan Wang

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Posts by Fan Wu

16 October 2025 - Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

Posts by Farshad Ghodsian

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

28 March 2025 - What’s New in the AMD GPU Operator v1.2.0 Release

29 January 2025 - Announcing the AMD GPU Operator and Metrics Exporter

Posts by Felix Li

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

Posts by Felix Marty

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Posts by Frank Wang

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Fulu Li

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Fuwei Yang

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Posts by Gabriel Weisz

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

Posts by Ganesh Dasika

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

Posts by Garrett Byrd

14 April 2025 - Installing ROCm from source with Spack

Posts by Gene Su

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Geoffrey C. Martin-Noble

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

Posts by George Markomanolis

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

Posts by George Wang

25 May 2026 - AI Inference on AMD Ryzen™ AI Max Processor

09 March 2026 - Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs

20 January 2026 - Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 October 2025 - Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

06 October 2025 - Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective

04 September 2025 - Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

25 August 2025 - AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

18 June 2025 - Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

16 May 2025 - Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

15 May 2025 - Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

Posts by Gerardo del Muro Gonzalez

24 March 2026 - GROMACS on AMD Instinct GPUs: A Complete Build Guide

Posts by Giacomo Capodaglio

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

Posts by Gilbert Lee

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

Posts by Gilbert Lei

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Gina Sitaraman

01 June 2026 - Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

10 April 2026 - Introduction to profiling tools for AMD hardware

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

26 April 2024 - Application portability with HIP

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

14 November 2022 - AMD matrix cores

Posts by Giuseppe Franco

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

Posts by Gowtham Ramesh

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Graham Schelle

09 February 2026 - Building Robotics Applications with Ryzen AI and ROS 2

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

Posts by Grant Pinkert

14 November 2025 - Plug-and-Play CuPy on ROCm: Data Analytics Acceleration Made Simple

Posts by Gregory Shtrasberg

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

Posts by Guanchen Li

11 June 2026 - Low Kruskal-Rank Adaptation

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Guihong Li

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

Posts by Gulsum Gudukbay Akbulut

06 January 2026 - ROCm MaxText Testing — Decoupled (Offline) and Cloud-Integrated Modes

06 January 2026 - ROCm Fork of MaxText: Structure and Strategy

Posts by Haani Ahmed

15 May 2026 - Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models

Posts by Hai Xiao

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

Posts by Haishuo Kong

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

Posts by Han Lin

19 March 2026 - hipBLASLt Online GEMM Tuning

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

Posts by Han Wang

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

Posts by Hang Yang

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

Posts by Hao Chen

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

Posts by Haocong Wang

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

Posts by Haohui Mai

06 October 2025 - Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective

Posts by Haoyang Li

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Hari Nair

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Harry Souris

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Hattie Wu

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

Posts by He Cui

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

Posts by Henry Ho

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

Posts by HongTao Meng

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

Posts by Hongxia Yang

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

24 February 2026 - PyTorch Offline Tuning with TunableOp

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

Posts by Hongyi Yao

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

Posts by Huanxuan Liao

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Posts by Huasha Zhao

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Hui Liu

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

Posts by Hyukjoon Lee

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

Posts by Hyunji Kim

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

Posts by Inesh Chakrabarti*

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

Posts by Ish Kool

08 January 2026 - Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

03 October 2025 - Elevating 3D Scene Rendering with GSplat

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Jaakko Vainio

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Jagadish Krishnamoorthy

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by James E. T. Smith

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

Posts by Janet Tseng

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Posts by Jared Bowden

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Jarkko Lehtiranta

18 March 2026 - Multi-Node Distributed Inference for Diffusion Models with xDiT

Posts by Jason Furmanek

22 May 2026 - From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

Posts by Jassani Adeem

28 June 2024 - Mamba on AMD GPUs with ROCm

Posts by Jayacharan Kolla

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

Posts by Jeff Daily

24 February 2026 - PyTorch Offline Tuning with TunableOp

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Jehandad Khan

24 February 2026 - JAX-AITER: Bringing AMD’s Optimized AI Kernels to JAX on ROCm™

06 January 2026 - ROCm MaxText Testing — Decoupled (Offline) and Cloud-Integrated Modes

06 January 2026 - ROCm Fork of MaxText: Structure and Strategy

Posts by Jeremy Arnold

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

Posts by Jesus Carabano Bravo

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

Posts by Ji Liu

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

Posts by Jiahao Zhou

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Jiahui Cao

10 March 2026 - FP8 GEMM Optimization on AMD CDNA™4 Architecture

Posts by Jialian Wu

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Jiang Liu

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Jianghui Wang

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Jiangyong Ren

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Jin Pan

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

Posts by Jin Tao

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Jin Zhou

24 February 2026 - PyTorch Offline Tuning with TunableOp

Posts by Jingai Yu

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

09 July 2025 - Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

Posts by Jingxian Wang

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

Posts by Jinze Li

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

Posts by Jithun Nair

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Joaquin Rives Gambin

10 December 2025 - Medical Imaging on MI300X: SwinUNETR Inference Optimization

07 October 2025 - Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

Posts by Joe Shajrawi

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

Posts by Johanna Malinen

18 March 2026 - Multi-Node Distributed Inference for Diffusion Models with xDiT

Posts by Johanna Potyka

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

Posts by Johanna Yang

04 March 2026 - HPC Coding Agent - Part 2: An MCP Tool for Code Optimization with OpenEvolve

03 December 2025 - HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP

27 November 2025 - Exploring Gameplay Video Generation with Hunyuan-GameCraft

21 November 2025 - Inference with HunyuanWorld-Voyager on AMD Instinct GPUs

24 September 2025 - Accelerating Audio-Driven Video Generation: WAN2.2-S2V on AMD ROCm

19 August 2025 - All-in-One Video Editing with VACE on AMD Instinct GPUs

Posts by Jonathan Burdge

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Jorge Parada

22 May 2026 - From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

Posts by Joseph Schoonover

14 April 2025 - Installing ROCm from source with Spack

Posts by Joshua Lu

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

Posts by Jouni Hartikainen

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Jouni Luoma

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Joyce Zhang

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Ju Huang

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

Posts by Juho Kerttula

27 November 2025 - Fine-Tune LLMs for Proteins with AMD Enterprise AI Suite

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Juho Vainio

19 December 2025 - Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Julia Jiang

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

Posts by Jun Chen

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Jun Kang Chow

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

Posts by Jun Zhao

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Posts by Junyan Yang

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

Posts by Justin Chang

09 February 2025 - MI300A - Exploring the APU advantage

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts by Justin Chu

09 February 2026 - Building Robotics Applications with Ryzen AI and ROS 2

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

Posts by Kai Hakala

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Kailash Gogineni

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

Posts by Kajsa Arnold

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Kang Liu

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Posts by Karan Verma

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

Posts by Karthik Kashyap Thatipamula

08 January 2026 - Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

03 October 2025 - Elevating 3D Scene Rendering with GSplat

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Karthik Sangaiah

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

Posts by Ke Wang

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Keith Anderson

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Kelvin Lui

19 February 2026 - Introducing hipThreads: A C++-Style Concurrency Library for AMD GPUs

Posts by Ken O’Brien

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

Posts by Kenny Roche

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

Posts by Kerwin Tsai

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

Posts by Kevin Chang

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

Posts by Kevin Joseph

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

Posts by Kiran Thumma

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

Posts by Kristoffer Peyron

04 March 2026 - HPC Coding Agent - Part 2: An MCP Tool for Code Optimization with OpenEvolve

27 November 2025 - Exploring Gameplay Video Generation with Hunyuan-GameCraft

21 November 2025 - Inference with HunyuanWorld-Voyager on AMD Instinct GPUs

24 September 2025 - Accelerating Audio-Driven Video Generation: WAN2.2-S2V on AMD ROCm

19 August 2025 - All-in-One Video Editing with VACE on AMD Instinct GPUs

Posts by KuanTing Lin

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

Posts by Kumar Deepak

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

Posts by Kyle Wang

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

Posts by Kyle Zhao

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Lalith Narasimhan

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

Posts by Layla Frischman

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

Posts by Lei Shao

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

Posts by Lei Wei

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

Posts by Lei Zhang

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

Posts by Levent Guner

28 November 2025 - VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite

27 November 2025 - Fine-Tune LLMs for Proteins with AMD Enterprise AI Suite

Posts by Liam Berry

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

05 November 2025 - Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Getting Started with HIP

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

22 May 2025 - ROCm Runfile Installer Is Here!

Posts by Lihuan Zhang

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Lin Sun

22 May 2026 - From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

Posts by Lin Zhao

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Posts by Lingpeng Jin

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

21 March 2025 - AITER: AI Tensor Engine For ROCm

Posts by Lixun Zhang

22 May 2026 - From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

Posts by Liying Li

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Liz Li

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

16 December 2025 - MoE Training Best Practices on AMD GPUs

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

28 March 2025 - Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 March 2025 - AITER: AI Tensor Engine For ROCm

Posts by Logan Grado

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

09 April 2024 - ResNet for image classification using AMD GPUs

01 April 2024 - Scale AI applications with Ray

29 March 2024 - Automatic mixed precision in PyTorch using AMD GPUs

Posts by Lorri Rao

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Lovisa Borthas

11 February 2026 - Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI

Posts by Luise Chen

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

Posts by Luka Stanisic

01 June 2026 - Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

Posts by Luka Tsabadze

29 May 2026 - Running Variational Quantum Eigensolver with Qiskit Aer on AMD Instinct

06 March 2026 - Fine-Tuning AI Surrogate Models for Physics Simulations with Walrus on AMD Instinct GPU Accelerators

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

Posts by Mahdi Ghodsi

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

Posts by Mahdieh Ghazimirsaeed

08 June 2023 - GPU-aware MPI with ROCm

Posts by Manoj Rao

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

Posts by Marc Dillon

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Marco Grond

08 January 2026 - Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction

03 October 2025 - Elevating 3D Scene Rendering with GSplat

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

12 May 2025 - Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

Posts by Maria Ruiz Varela

26 April 2024 - Application portability with HIP

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

Posts by Marilyn Basanta

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

Posts by Mario Reiser

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Mark Granroth Wilding

24 April 2026 - Styled Text Image Generation with Eruku on AMD

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

03 October 2025 - Elevating 3D Scene Rendering with GSplat

Posts by Mark Granroth-Wilding

07 May 2026 - AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

Posts by Mark van Heeswijk

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Marko Savic

19 February 2026 - Introducing hipThreads: A C++-Style Concurrency Library for AMD GPUs

Posts by Markus Hartikainen

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Martin Huarte

14 January 2025 - Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X

Posts by Mathias Lehtinen

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

Posts by Matt Elliott

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

29 January 2025 - Announcing the AMD GPU Operator and Metrics Exporter

16 January 2025 - Getting started with AMD ROCm containers: from base images to custom solutions

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

10 September 2024 - Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC

Posts by Matthias Reso

21 July 2025 - Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU

Posts by Matthieu Chan Chee

15 May 2026 - Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models

Posts by Matti Varjokallio

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Max Kiehn

15 May 2026 - Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models

Posts by Meena Arunachalam

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

Posts by Mehdi Rezagholizadeh

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

Posts by Mehdi Saeedi

31 March 2026 - Training a Robotic Arm Using MuJoCo and JAX on AMD Hardware with ROCm™

Posts by Menghsuan Yang

14 May 2026 - Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

17 February 2026 - Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

Posts by Mengmeng Ge

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

Posts by Michael Klemm

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

Posts by Michael Zhang

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

Posts by Miikael Leskinen

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Mika Koistinen

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

Posts by Mika Ranta

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Mikko Lauri

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

Posts by Mikko Tukiainen

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Mikko Vilenius

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Mingjie Lu

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Posts by Mingyu Yang

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

Posts by Mingzhi Liu

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Miro Hodak

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

Posts by Mittul Singh

12 January 2026 - Installing AMD HIP-Enabled GROMACS on HPC Systems: A LUMI Supercomputer Case Study

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

Posts by Mohammad Abdul Basit

18 December 2025 - A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs

Posts by Mohammad Mahdi Kamani

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

Posts by Mohammed Faraaz Mustafa

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Getting Started with HIP

Posts by Mohit Deopujari

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

22 January 2026 - LLM Inference Optimization Using AMD GPU Partitioning

Posts by Mou Li

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Muhammad Osama

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

Posts by Mukhil Azhagan Mallaiyan Sathiaseelan

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

Posts by Mustafa Khalid Masood

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Neha Mathews

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Nhat Vo

08 June 2026 - ORBIT-2 based Weather and Climate Downscaling and Downscaled Global Forecasts on AMD Instinct

19 March 2026 - Utilizing AMD Instinct GPU Accelerators for Weather and Precipitation Forecasting with NeuralGCM

Posts by Nicholas Curtis

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

Posts by Nicholas Malaya

14 November 2022 - AMD matrix cores

Posts by Nick Romero

24 February 2026 - PyTorch Offline Tuning with TunableOp

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Nico Holmberg

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Nicola Tan

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

Posts by Niels Zhang

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

Posts by Niko Ma

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Niko Vuokko

07 May 2026 - AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

24 April 2026 - Styled Text Image Generation with Eruku on AMD

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

28 November 2025 - VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite

Posts by Niles Burbank

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

Posts by Ning Zhang

04 September 2025 - Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

Posts by Nitish Bhat

13 January 2026 - Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

Posts by No author

10 June 2024 - Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators

16 May 2024 - Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

16 May 2024 - AMD Collaboration with the University of Michigan offers High Performance Open-Source Solutions to the Bioinformatics Community

Posts by Noah Monti

31 March 2026 - Training a Robotic Arm Using MuJoCo and JAX on AMD Hardware with ROCm™

Posts by Noah Wolfe

10 April 2026 - Introduction to profiling tools for AMD hardware

Posts by Noel Chalmers

08 June 2023 - GPU-aware MPI with ROCm

14 November 2022 - AMD matrix cores

Posts by Nowy Condro

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

Posts by Olga Miroshnichenko

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Olha Shkaravska

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Ossian O’Reilly

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

14 November 2022 - AMD matrix cores

Posts by Parsa Fashi

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

Posts by Paul Bauer

13 March 2026 - GROMACS Performance on AMD Instinct MI355X

12 January 2026 - Installing AMD HIP-Enabled GROMACS on HPC Systems: A LUMI Supercomputer Case Study

Posts by Paul Hartke

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

Posts by Paul Mullowney

03 November 2023 - Sparse matrix vector multiplication - part 1

Posts by Pauli Pihajoki

20 May 2026 - Diffusion-based Atmospheric Downscaling on AMD Instinct GPUs

06 March 2026 - Ensemble High-Resolution Weather Forecasting on AMD Instinct GPU Accelerators

07 January 2026 - High-Resolution Weather Forecasting with StormCast on AMD Instinct GPU Accelerators

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

Posts by Pedram Alizadeh

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

Posts by Pei Zhang

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

Posts by Peng Sun

22 May 2026 - From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

24 March 2026 - Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

Posts by Peyman Razaghi

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

Posts by Phani Vaddadi

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

03 October 2025 - Elevating 3D Scene Rendering with GSplat

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

Posts by Phillip Dang

11 July 2024 - DBRX Instruct on AMD GPUs

28 June 2024 - Deep Learning Recommendation Models on AMD GPUs

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - Program Synthesis with CodeGen

08 April 2024 - Small language models with Phi-2

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

Posts by Pier Luigi Dovesi

15 May 2026 - Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models

07 May 2026 - AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

24 April 2026 - Styled Text Image Generation with Eruku on AMD

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

03 October 2025 - Elevating 3D Scene Rendering with GSplat

Posts by Pin Siang Tan

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

Posts by Poovaiah Palangappa

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

Posts by Prakamya Mishra

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Pratik Mishra

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

Posts by Pratik Prabhanjan Brahma

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Pruthvi Madugundu

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Qiang Han

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Quentin Anthony

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

Posts by Rahul Biswas

08 June 2026 - ORBIT-2 based Weather and Climate Downscaling and Downscaled Global Forecasts on AMD Instinct

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

18 September 2025 - Running SOTA AI-based Weather Forecasting models on AMD Instinct

Posts by Rajat Arora

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

11 May 2023 - Finite difference method - Laplacian part 3

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts by Rajesh Poornachandran

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

Posts by Rajneesh Bhardwaj

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

Posts by Rasmus Larsson

03 June 2026 - Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench

02 April 2026 - Deploy and Customize AMD Solution Blueprints

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

24 February 2026 - Getting Started with AMD Resource Manager: Efficient Sharing of AMD Instinct™ GPUs for R&D Teams and AI Practitioners

19 December 2025 - Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads

04 November 2025 - Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma

Posts by Rathnakara Malatesha

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

Posts by Ravi Dwivedula

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

Posts by Rebecca Lee

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

Posts by Rebecca Li

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

Posts by Reima Karhila

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Rene Van Oostrum

14 November 2022 - AMD matrix cores

Posts by Rishi Madduri

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

Posts by Robert Talling

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Romil Bhardwaj

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

Posts by Ronnie Chatterjee

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

Posts by Rony Leppanen

18 March 2026 - Multi-Node Distributed Inference for Diffusion Models with xDiT

Posts by Rui Sampaio

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

07 October 2025 - Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

03 October 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 2: 3D Molecular Generation with SemlaFlow

19 September 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT

Posts by Ruibin Zhang

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Ruturaj Kiran Vaidya

24 February 2026 - JAX-AITER: Bringing AMD’s Optimized AI Kernels to JAX on ROCm™

Posts by Ryan Swann

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

Posts by Saad Rahim

20 May 2026 - ROCm 7.13: Expanding Hardware, Tools, and Reach

05 November 2025 - Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

16 September 2025 - ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Getting Started with HIP

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

22 May 2025 - ROCm Runfile Installer Is Here!

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

11 April 2025 - ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

Posts by Sander Bijl de Vroe

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Saptarshi Majumder

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Saroosh Shabbir

11 February 2026 - Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI

04 November 2025 - Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma

Posts by Sarthak Arora

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

Posts by Sarthak Tandon

24 February 2026 - PyTorch Offline Tuning with TunableOp

Posts by Sarunas Kalade

09 February 2026 - Building Robotics Applications with Ryzen AI and ROS 2

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

Posts by Sathish Sanjeevi

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

Posts by Satya Jandhyala

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

Posts by Satya Ramji Ainapurapu

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

Posts by Scott Todd

20 October 2025 - ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Posts by Sean Miller

18 July 2023 - Finite difference method - Laplacian part 4

11 May 2023 - Finite difference method - Laplacian part 3

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts by Sean Song

22 January 2026 - LLM Inference Optimization Using AMD GPU Partitioning

09 June 2025 - LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

24 January 2025 - Vision Mamba on AMD GPU with ROCm

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

28 June 2024 - Mamba on AMD GPUs with ROCm

04 June 2024 - Segment Anything with AMD GPUs

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

Posts by Sebastian Andersson

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

Posts by Sebastian Remander

24 March 2026 - GROMACS on AMD Instinct GPUs: A Complete Build Guide

13 March 2026 - GROMACS Performance on AMD Instinct MI355X

12 January 2026 - Installing AMD HIP-Enabled GROMACS on HPC Systems: A LUMI Supercomputer Case Study

Posts by Seungrok Jung

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

15 March 2024 - Large language model inference optimizations on AMD GPUs

Posts by Shaghayegh Roohi

07 May 2026 - AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

24 April 2026 - Styled Text Image Generation with Eruku on AMD

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

28 November 2025 - VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite

03 October 2025 - Elevating 3D Scene Rendering with GSplat

Posts by Shashank Kashyap

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Shekhar Pandey

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

21 March 2025 - AITER: AI Tensor Engine For ROCm

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

Posts by Shenrun Zhang

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

Posts by Shijie Feng

20 April 2026 - Getting Started with FlyDSL Nightly Wheels on ROCm

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

Posts by Shizhe Ding

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

Posts by Shizhu He

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Posts by Shrey Ajmera

13 January 2026 - Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

08 January 2026 - Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Posts by Shubin Zhao

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Silvia Cascianelli

24 April 2026 - Styled Text Image Generation with Eruku on AMD

Posts by Simon Mo

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

Posts by Sonali Singh

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

Posts by Sonya Yang

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

Posts by Sopiko Kurdadze

03 February 2026 - Foundations of Molecular Generation with GP-MoLFormer on AMD Instinct MI300X Accelerators

21 November 2025 - Accelerating AI-Driven Crystalline Materials Design with MatterGen on AMD Instinct MI300X

10 November 2025 - Training AI Weather Forecasting Models on AMD Instinct

24 September 2025 - A Simple Design for Serving Video Generation Models with Distributed Inference

19 August 2025 - Accelerating FastVideo on AMD GPUs with TeaCache

Posts by Soumitra Chatterjee

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Spandan More

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

Posts by Spandan Tiwari

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

11 June 2026 - Low Kruskal-Rank Adaptation

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

19 March 2026 - hipBLASLt Online GEMM Tuning

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Srinivasan Subramanian

21 October 2025 - Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Posts by Sriranjani Ramasubramanian

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Stanislau Fink

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Steve Reinhardt

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

Posts by Steven K. Reinhardt

27 April 2026 - TraceLens: Democratizing AI Performance Analysis

Posts by Stig-Arne Gronroos

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Su Ann Chong

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

Posts by Subhajit Dutta Chowdhury

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

Posts by Sudhanshu Ranjan

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Sujin Philip

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

Posts by Sukriti Choudhary

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

Posts by Sundara Murthy Gurunathan

08 January 2026 - Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Posts by Suyash Tandon

10 April 2026 - Introduction to profiling tools for AMD hardware

09 February 2025 - MI300A - Exploring the APU advantage

26 April 2024 - Application portability with HIP

Posts by Takashi Isobe

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

Posts by Ted Themistokleous

22 May 2026 - From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Teemu Karkkainen

28 November 2025 - VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Teemu Virolainen

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Tero Kemppi

18 March 2026 - Multi-Node Distributed Inference for Diffusion Models with xDiT

Posts by Tharun Adithya Srikrishnan

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

Posts by Theresa Shan

23 March 2026 - Edge-to-Cloud Robotics with AMD ROCm: From Data Collection to Real-Time Inference

Posts by Thiago Crepaldi

11 June 2026 - Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

Posts by Thomas Bergstrom

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Thomas Gibson

01 June 2026 - Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

10 April 2026 - Introduction to profiling tools for AMD hardware

23 October 2025 - Performance Profiling on AMD GPUs - Part 3: Advanced Usage

13 August 2025 - Performance Profiling on AMD GPUs – Part 2: Basic Usage

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

18 July 2023 - Finite difference method - Laplacian part 4

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts by Tiffany Mintz

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Tomas Saaristola

03 June 2026 - Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench

Posts by Tong Shen

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

09 July 2025 - Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

Posts by Treemann Zheng

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Posts by Tres Popp

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

Posts by Tun Jian Tan

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

Posts by Tuukka Sarvi

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Uma Kannikanti

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Umang Pandey

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

Posts by Vara Lakshmi Bayanagari

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

16 April 2024 - PyTorch C++ Extension on AMD GPU

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

Posts by Vasumathi Neralla

07 October 2025 - Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

03 October 2025 - Optimizing Drug Discovery Tools on AMD MI300X Part 2: 3D Molecular Generation with SemlaFlow

Posts by Vicky Tsang

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

01 April 2024 - Scale AI applications with Ray

Posts by Victor Robles

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

Posts by Vidushi Goyal

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Posts by Vikas C Sajjan

16 December 2025 - 3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

03 October 2025 - Elevating 3D Scene Rendering with GSplat

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Posts by Vikram Appia

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

Posts by Vin Huang

17 February 2026 - Unlocking Sparse Acceleration on AMD GPUs with hipSPARSELt

Posts by Vinay Joshi

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Vinayak Gokhale

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

Posts by Vish Vadlamani

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

03 October 2025 - Elevating 3D Scene Rendering with GSplat

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Vittorio Pippi

24 April 2026 - Styled Text Image Generation with Eruku on AMD

Posts by Vivian Cheng

09 February 2026 - Building Robotics Applications with Ryzen AI and ROS 2

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

Posts by Warren Eng

07 August 2025 - Running ComfyUI in Windows with ROCm on WSL

Posts by Wei Cai

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 October 2025 - Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

15 May 2025 - Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

Posts by Wei Luo

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

19 March 2026 - hipBLASLt Online GEMM Tuning

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Wei-Ting Liao

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

Posts by Wen Xie

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 December 2025 - MoE Training Best Practices on AMD GPUs

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Posts by Wensong Chan

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

Posts by Wickey Wang

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

Posts by William Anzen

31 March 2026 - Leveraging AMD AI Workbench and Autoscaling to Scale LLM Inference for Optimal Resource Utilization

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

Posts by Xavier Aguilar Fruto

08 December 2025 - Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

Posts by Xi Zhao

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Xiaobo Chen

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 December 2025 - MoE Training Best Practices on AMD GPUs

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Posts by Xiaodong Yu

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Xiaofeng Zheng

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

Posts by Xiaoming Peng

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

16 December 2025 - MoE Training Best Practices on AMD GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Posts by Ximeng Sun

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Xinjun Niu

20 May 2026 - QuickReduce FP4 Quantization and Benchmarking on MI355

25 March 2026 - Programming Tensor Descriptors in Composable Kernel (CK)

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

19 March 2026 - hipBLASLt Online GEMM Tuning

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

05 November 2025 - Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

26 August 2025 - QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Posts by Xuanwu Yin

11 June 2026 - Low Kruskal-Rank Adaptation

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

Posts by Xun Wang

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

Posts by Yamini Kamisetty

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Yamini Preethi Kamisetty

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

Posts by Yan Sun

13 January 2026 - Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

Posts by YangWen Huang

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

09 October 2025 - GEMM Tuning within hipBLASLt - Part 2

05 September 2025 - GEMM Tuning within hipBLASLt - Part 1

Posts by Yanyuan Qin

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

16 December 2025 - MoE Training Best Practices on AMD GPUs

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

Posts by Yao Fehlis

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

Posts by Yao Fu

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 December 2025 - MoE Training Best Practices on AMD GPUs

02 December 2025 - Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

22 August 2025 - Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

Posts by Yao Liu

11 May 2026 - Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm

07 April 2026 - Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

06 April 2026 - FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Yaoming Mu

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Yayuan Wang

09 March 2026 - Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs

Posts by Yazhini Rajesh

08 January 2026 - Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction

Posts by Ye Hur Cheong

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

Posts by Yi Huang

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

09 March 2026 - Agentic Diagnosis for LLM Training at Scale

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

Posts by Yineng Zhang

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

Posts by Yixing Xu

11 June 2026 - Low Kruskal-Rank Adaptation

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

20 April 2026 - FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Yonatan Dukler

05 May 2026 - Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

Posts by Yosi Hatekar

23 October 2025 - STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

Posts by Yu Geng

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

Posts by Yu Wang

17 November 2025 - AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs

17 November 2025 - AMD Enterprise AI Suite: Open Infrastructure for Production AI

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

Posts by Yu Zhou

19 March 2026 - hipBLASLt Online GEMM Tuning

Posts by Yuankai Chen

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Yuchen Lin

06 April 2026 - Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

30 January 2026 - Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

Posts by Yue Liu

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

Posts by Yuhang Song

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

Posts by Yusheng Su

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Yutong Wu

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Posts by Yuvarani Shankar

08 January 2026 - Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Posts by Yuzhen Zhou

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

Posts by Yuzhou Lu

09 February 2026 - Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs

Posts by Ze Wang

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Zejun Chen

07 May 2026 - vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

Posts by Zeping Li

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

Posts by Zhanghao Wu

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

Posts by Zhao An

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

Posts by Zhao Lin

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

Posts by Zhaodong Bing

01 June 2026 - Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

Posts by Zhaofeng Zhang

17 February 2026 - Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

29 October 2025 - High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Posts by Zhe Li

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

Posts by Zhen Huang

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Zhenhua Liu

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

Posts by Zhenyu Gu

10 June 2026 - Dropless MoE Training in JAX with Primus-Turbo

29 May 2026 - Enabling Speculative Speculative Decoding on MI300X

24 April 2026 - Primus Projection: Estimate Memory and Performance Before You Train

09 March 2026 - Agentic Diagnosis for LLM Training at Scale

02 March 2026 - MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

23 February 2026 - Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

08 February 2026 - Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs

16 December 2025 - MoE Training Best Practices on AMD GPUs

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

Posts by Zhiquan Chen

24 March 2026 - Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

Posts by Zhou Yu

19 March 2026 - hipBLASLt Online GEMM Tuning

Posts by Zhu Shan

18 June 2025 - Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

Posts by Zicheng Liu

24 February 2026 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

Posts by Ziqiong Liu

02 March 2026 - Streamlining Recommendation Model Training on AMD Instinct™ GPUs

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Zongheng Yang

13 November 2025 - Democratizing AI Compute with AMD Using SkyPilot

Posts by AMD Shark Team

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0