Posts by Andy Luo

10 March 2026 - FP8 GEMM Optimization on AMD CDNA™4 Architecture

21 January 2026 - ROCm Becomes a First-Class Platform in the vLLM Ecosystem

02 January 2026 - Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

16 December 2025 - MoE Training Best Practices on AMD GPUs

24 November 2025 - The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

12 November 2025 - Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

04 November 2025 - Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

30 September 2025 - Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

19 September 2025 - An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs

11 September 2025 - Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

28 August 2025 - Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

Posts by Corbin Robeck

13 May 2024 - Reading AMD GPU ISA

Posts by Damon McDougall

08 June 2023 - GPU-aware MPI with ROCm

14 November 2022 - AMD matrix cores

Posts by Daniel Velicka

14 November 2022 - AMD matrix cores

Posts by David Doscher

26 January 2023 - AMD ROCm™ installation

Posts by Dong Li

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Posts by Eliot Li

01 April 2026 - Reproducing the AMD MLPerf Inference v6.0 Submission Result

01 April 2026 - AMD Instinct™ GPUs MLPerf Inference v6.0 Submission

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

14 November 2025 - Plug-and-Play CuPy on ROCm: Data Analytics Acceleration Made Simple

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

12 November 2025 - Technical Dive into AMD MLPerf Training v5.1 Submission

12 November 2025 - Reproducing AMD MLPerf Training v5.1 Submission Result

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

09 September 2025 - Technical Dive into AMD’s MLPerf Inference v5.1 Submission

09 September 2025 - Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

09 September 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

01 April 2024 - Scale AI applications with Ray

Posts by Emad Barsoum

20 February 2026 - FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

05 February 2026 - Micro-World: First AMD Open-Source World Models for Interactive Video Generation

22 January 2026 - Nitro-AR: A Compact AR Transformer for High-Quality Image Generation

12 January 2026 - Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

08 January 2026 - Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms

07 January 2026 - Breaking the Accuracy-Speed Barrier: How MXFP4/6 Quantization Revolutionizes Image and Video Generation

02 January 2026 - SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

23 December 2025 - GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025 - GEAK HIP: Expanding GEAK for HIP Code Optimization

08 December 2025 - Accelerating Autonomous Driving Model Training on AMD ROCm™ Software

06 December 2025 - Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

03 December 2025 - Týr-the-Pruner: Search-based Global Structural Pruning for LLMs

21 November 2025 - LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

24 October 2025 - Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

14 October 2025 - Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

25 September 2025 - Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs

17 September 2025 - AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

22 August 2025 - Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

09 August 2025 - Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

03 August 2025 - AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation

01 August 2025 - GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

31 January 2025 - Enhancing AI Training with AMD ROCm Software

Posts by George Wang

09 March 2026 - Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs

20 January 2026 - Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU

15 January 2026 - Deep Dive into Primus: High-Performance Training for Large Language Models

16 October 2025 - Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

06 October 2025 - Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective

04 September 2025 - Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

25 August 2025 - AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

05 August 2025 - Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware

25 July 2025 - Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

18 June 2025 - Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

16 May 2025 - Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

15 May 2025 - Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

Posts by Jassani Adeem

28 June 2024 - Mamba on AMD GPUs with ROCm

Posts by Jin Zhou

24 February 2026 - PyTorch Offline Tuning with TunableOp

Posts by Lei Shao

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

Posts by Mou Li

16 December 2025 - MoE Training Best Practices on AMD GPUs

Posts by Nicholas Malaya

14 November 2022 - AMD matrix cores

Posts by Noel Chalmers

08 June 2023 - GPU-aware MPI with ROCm

14 November 2022 - AMD matrix cores

Posts by Phani Vaddadi

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

03 October 2025 - Elevating 3D Scene Rendering with GSplat

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

Posts by Rene Van Oostrum

14 November 2022 - AMD matrix cores

Posts by Vish Vadlamani

27 February 2026 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows

13 February 2026 - Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

12 February 2026 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

11 December 2025 - Accelerating llama.cpp on AMD Instinct MI300X

05 December 2025 - DGL in Depth: SE(3)-Transformer on ROCm 7

04 December 2025 - Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration

13 November 2025 - Accelerating Vector Search: hipVS and hipRAFT on AMD

07 October 2025 - Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs

03 October 2025 - Elevating 3D Scene Rendering with GSplat

02 October 2025 - From Ingestion to Inference: RAG Pipelines on AMD GPUs

01 October 2025 - Enabling FlashInfer on ROCm for Accelerated LLM Serving

30 September 2025 - Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers

10 September 2025 - Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

09 September 2025 - Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

20 August 2025 - DGL in the Real World: Running GNNs on Real Use Cases

31 July 2025 - Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

31 July 2025 - Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posts by Yu Zhou

19 March 2026 - hipBLASLt Online GEMM Tuning

Posts by Zhou Yu

19 March 2026 - hipBLASLt Online GEMM Tuning