| Eliot Li | 17 | From Ingestion to Inference: RAG Pipelines on AMD GPUs October 02, 2025
 | Scale AI applications with Ray April 01, 2024
 | 
| Fabricio Flores | 16 | From Ingestion to Inference: RAG Pipelines on AMD GPUs October 02, 2025
 | Building semantic search with SentenceTransformers on AMD April 04, 2024
 | 
| Clint Greene | 16 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | Accelerating XGBoost with Dask using multiple AMD GPUs January 26, 2024
 | 
| George Wang | 15 | Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs October 16, 2025
 | GEMM Kernel Optimization For AMD GPUs February 06, 2025
 | 
| Sean Song | 15 | LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation June 09, 2025
 | Fine-tune Llama model with LoRA: Customizing a large language model for question-answering February 01, 2024
 | 
| Douglas Jia | 14 | Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI's Kubernetes Engine (OKE) October 15, 2024
 | Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs January 24, 2024
 | 
| Emad Barsoum | 13 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | Enhancing AI Training with AMD ROCm Software January 31, 2025
 | 
| Andy Luo | 13 | Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture September 30, 2025
 | Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs January 29, 2025
 | 
| Vish Vadlamani | 13 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | 
| Phillip Dang | 13 | DBRX Instruct on AMD GPUs July 11, 2024
 | Simplifying deep learning: A guide to PyTorch Lightning February 08, 2024
 | 
| Phani Vaddadi | 12 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs March 23, 2025
 | 
| Anshul Gupta | 11 | Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs September 11, 2025
 | GEMM Kernel Optimization For AMD GPUs February 06, 2025
 | 
| Vara Lakshmi Bayanagari | 11 | Distributed fine-tuning of MPT-30B using Composer on AMD GPUs January 28, 2025
 | Pre-training BERT using Hugging Face & PyTorch on an AMD GPU January 26, 2024
 | 
| Yao Liu | 11 | From Ingestion to Inference: RAG Pipelines on AMD GPUs October 02, 2025
 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | 
| Saad Rahim | 11 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver April 11, 2025
 | 
| Gina Sitaraman | 11 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | AMD matrix cores November 14, 2022
 | 
| Justin Chang | 10 | MI300A - Exploring the APU advantage February 09, 2025
 | Finite difference method - Laplacian part 1 November 14, 2022
 | 
| Meena Arunachalam | 9 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024
 | 
| Miro Hodak | 9 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024
 | 
| Thomas Gibson | 9 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | Finite difference method - Laplacian part 1 November 14, 2022
 | 
| Zicheng Liu | 8 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Liz Li | 8 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | AITER: AI Tensor Engine For ROCm March 21, 2025
 | 
| Yusheng Su | 7 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Seungrok Jung | 7 | vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance July 07, 2025
 | Large language model inference optimizations on AMD GPUs March 15, 2024
 | 
| Dong Li | 7 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Ossian O'Reilly | 7 | Seismic stencil codes - part 1 August 29, 2024
 | Finite difference method - Laplacian part 1 November 14, 2022
 | 
| Ximeng Sun | 6 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Ze Wang | 6 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Jiang Liu | 6 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Jialian Wu | 6 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Xiaodong Yu | 6 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Karan Verma | 6 | Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| Marco Grond | 6 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | 
| Gowtham Ramesh | 5 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Peng Sun | 5 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X March 21, 2025
 | 
| Shekhar Pandey | 5 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide March 14, 2025
 | 
| Xuanwu Yin | 5 | Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More October 14, 2025
 | Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning August 22, 2025
 | 
| Liam Berry | 5 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | ROCm Runfile Installer Is Here! May 22, 2025
 | 
| Sean Miller | 5 | Finite difference method - Laplacian part 4 July 18, 2023
 | Finite difference method - Laplacian part 1 November 14, 2022
 | 
| Rajat Arora | 5 | Jacobi Solver with HIP and OpenMP offloading September 15, 2023
 | Finite difference method - Laplacian part 1 November 14, 2022
 | 
| Matt Elliott | 5 | How to Build a vLLM Container for Inference and Benchmarking February 21, 2025
 | Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments in high-security environments and air-gapped networks. September 10, 2024
 | 
| Alessandro Fanfarillo | 5 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | Register pressure in AMD CDNA™2 GPUs May 17, 2023
 | 
| Asitav Mishra | 5 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | Jacobi Solver with HIP and OpenMP offloading September 15, 2023
 | 
| Prakamya Mishra | 4 | Introducing Instella-Math: Fully Open Language Model with Reasoning Capability August 09, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Sudhanshu Ranjan | 4 | Introducing Instella-Math: Fully Open Language Model with Reasoning Capability August 09, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Wei-Ting Liao | 4 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| Poovaiah Palangappa | 4 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | 
| Vikas C Sajjan | 4 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Logan Grado | 4 | Accelerating models on ROCm using PyTorch TunableOp July 03, 2024
 | Automatic mixed precision in PyTorch using AMD GPUs March 29, 2024
 | 
| Yixing Xu | 4 | Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More October 14, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Mohammed Faraaz Mustafa | 4 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | ROCm Revisited: Getting Started with HIP June 06, 2025
 | 
| Jayacharan Kolla | 4 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025
 | 
| Ning Zhang | 3 | Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs September 04, 2025
 | GEMM Kernel Optimization For AMD GPUs February 06, 2025
 | 
| Pratik Prabhanjan Brahma | 3 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Ravi Dwivedula | 3 | Reproduce AMD's MLPerf Training v5.0 Submission Result with Instinct™ GPUs June 04, 2025
 | High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide June 03, 2025
 | 
| Su Ann Chong | 3 | Reproduce AMD's MLPerf Training v5.0 Submission Result with Instinct™ GPUs June 04, 2025
 | High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide June 03, 2025
 | 
| Sarthak Arora | 3 | Reproduce AMD's MLPerf Training v5.0 Submission Result with Instinct™ GPUs June 04, 2025
 | High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide June 03, 2025
 | 
| Sathish Sanjeevi | 3 | Reproduce AMD's MLPerf Training v5.0 Submission Result with Instinct™ GPUs June 04, 2025
 | High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide June 03, 2025
 | 
| Victor Robles | 3 | AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3 March 13, 2025
 | AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1 February 07, 2025
 | 
| Mahdi Ghodsi | 3 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025
 | 
| Vicky Tsang | 3 | Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows September 10, 2025
 | Scale AI applications with Ray April 01, 2024
 | 
| Alex He | 3 | A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU April 22, 2025
 | Navigating vLLM Inference with ROCm and Kubernetes February 13, 2025
 | 
| Charles Yang | 3 | Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective October 06, 2025
 | Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X July 17, 2025
 | 
| Soumitra Chatterjee | 3 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Anik Chaudhuri | 3 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Dominic Widdows | 3 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | Benchmarking Reasoning Models: From Tokens to Answers July 24, 2025
 | 
| Rui Sampaio | 3 | Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection October 07, 2025
 | Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT September 19, 2025
 | 
| Wei Luo | 3 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Spandan Tiwari | 3 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Mukhil Azhagan Mallaiyan Sathiaseelan | 3 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware July 31, 2025
 | 
| Anuya Welling | 3 | From Ingestion to Inference: RAG Pipelines on AMD GPUs October 02, 2025
 | Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware July 31, 2025
 | 
| Hongxia Yang | 3 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm June 28, 2025
 | 
| Ean Garvey | 3 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| Kumar Deepak | 3 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| Fulu Li | 3 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Zhe Li | 3 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Guanchen Li | 3 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Farshad Ghodsian | 3 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | Announcing the AMD GPU Operator and Metrics Exporter January 29, 2025
 | 
| Yao Fu | 3 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | Optimized ROCm Docker for Distributed AI Training March 13, 2025
 | 
| Suyash Tandon | 3 | MI300A - Exploring the APU advantage February 09, 2025
 | Introduction to profiling tools for AMD hardware April 12, 2023
 | 
| Bob Robey | 3 | Application portability with HIP April 26, 2024
 | Affinity part 2 - System topology and controlling affinity April 16, 2024
 | 
| David Li | 3 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs April 15, 2025
 | 
| Karthik Kashyap Thatipamula | 3 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Deeksha Goplani | 3 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Ish Kool | 3 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing July 18, 2025
 | 
| Luka Stanisic | 3 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | Performance Profiling on AMD GPUs – Part 1: Foundations June 26, 2025
 | 
| Giacomo Capodaglio | 3 | Performance Profiling on AMD GPUs - Part 3: Advanced Usage October 23, 2025
 | Performance Profiling on AMD GPUs – Part 1: Foundations June 26, 2025
 | 
| Vikram Appia | 2 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025
 | 
| Hai Xiao | 2 | Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X March 21, 2025
 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | 
| Sonali Singh | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | 
| Karthik Sangaiah | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | 
| Ryan Swann | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | 
| Ganesh Dasika | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | 
| Tiffany Mintz | 2 | Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs July 31, 2025
 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | 
| David Björelind | 2 | Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection October 07, 2025
 | Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT September 19, 2025
 | 
| Lin Sun | 2 | From Ingestion to Inference: RAG Pipelines on AMD GPUs October 02, 2025
 | Coding Agents on AMD GPUs: Fast LLM Pipelines for Developers September 30, 2025
 | 
| Johanna Yang | 2 | Accelerating Audio-Driven Video Generation: WAN2.2-S2V on AMD ROCm September 24, 2025
 | All-in-One Video Editing with VACE on AMD Instinct GPUs August 19, 2025
 | 
| Kristoffer Peyron | 2 | Accelerating Audio-Driven Video Generation: WAN2.2-S2V on AMD ROCm September 24, 2025
 | All-in-One Video Editing with VACE on AMD Instinct GPUs August 19, 2025
 | 
| Wei Cai | 2 | Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs October 16, 2025
 | Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs May 15, 2025
 | 
| Fan Wu | 2 | Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs October 16, 2025
 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025
 | 
| Rishi Madduri | 2 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs March 23, 2025
 | 
| Michael Zhang | 2 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | CTranslate2: Efficient Inference with Transformer Models on AMD GPUs October 24, 2024
 | 
| Haoyang Li | 2 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Xinjun Niu | 2 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Ke Wang | 2 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Ashish Sirasao | 2 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Hao Chen | 2 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs July 15, 2025
 | 
| Tong Shen | 2 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day July 09, 2025
 | 
| Jingai Yu | 2 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day July 09, 2025
 | 
| Sopiko Kurdadze | 2 | A Simple Design for Serving Video Generation Models with Distributed Inference September 24, 2025
 | Accelerating FastVideo on AMD GPUs with TeaCache August 19, 2025
 | 
| Vasumathi Neralla | 2 | Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection October 07, 2025
 | Optimizing Drug Discovery Tools on AMD MI300s Part 2: 3D Molecular Generation with SemlaFlow October 03, 2025
 | 
| Albin Toft | 2 | A Simple Design for Serving Video Generation Models with Distributed Inference September 24, 2025
 | Running ComfyUI on AMD Instinct August 19, 2025
 | 
| Uma Kannikanti | 2 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Neha Mathews | 2 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Bowen Bao | 2 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Danny Guan | 2 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver April 11, 2025
 | 
| Aditya Bhattacharji | 2 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | 
| Zhenyu Gu | 2 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | 
| Maria Ruiz Varela | 2 | Application portability with HIP April 26, 2024
 | AMD Instinct™ MI200 GPU memory space overview March 09, 2023
 | 
| Xiaobo Chen | 2 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | 
| Muhammad Osama | 2 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | Graph analytics on AMD GPUs using Gunrock July 29, 2024
 | 
| Damon McDougall | 2 | GPU-aware MPI with ROCm June 08, 2023
 | AMD matrix cores November 14, 2022
 | 
| Noel Chalmers | 2 | GPU-aware MPI with ROCm June 08, 2023
 | AMD matrix cores November 14, 2022
 | 
| Ammar Elwazir | 2 | Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling March 25, 2025
 | Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling. March 25, 2025
 | 
| Haocong Wang | 2 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile May 21, 2025
 | 
| Ben Sander | 2 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1 February 14, 2025
 | 
| Carlus Huang | 2 | Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture September 30, 2025
 | AITER: AI Tensor Engine For ROCm March 21, 2025
 | 
| YangWen Huang | 2 | GEMM Tuning within hipBLASLt – Part 2 October 09, 2025
 | GEMM Tuning within hipBLASLt - Part 1 September 05, 2025
 | 
| Carson Liao | 2 | GEMM Tuning within hipBLASLt – Part 2 October 09, 2025
 | GEMM Tuning within hipBLASLt - Part 1 September 05, 2025
 | 
| George Markomanolis | 2 | Affinity part 1 - Affinity, placement, and order April 16, 2024
 | Affinity part 2 - System topology and controlling affinity April 16, 2024
 | 
| Chang Liu | 2 | Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs September 11, 2025
 | Speculative Decoding - Deep Dive March 24, 2025
 | 
| Justin Chu | 1 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | 
| Yosi Hatekar | 1 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | 
| Vivian Cheng | 1 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | 
| Hyunji Kim | 1 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | 
| Alex Bogdan | 1 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm October 23, 2025
 | 
| Aditya Kumar Singh | 1 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025
 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025
 | 
| Mehdi Rezagholizadeh | 1 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | 
| Mingyu Yang | 1 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | 
| Guihong Li | 1 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models September 17, 2025
 | 
| Shenrun Zhang | 1 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025
 | 
| Bill He | 1 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025
 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025
 | 
| Kenny Roche | 1 | AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving May 20, 2025
 | AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving May 20, 2025
 | 
| Joe Shajrawi | 1 | AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving May 20, 2025
 | AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving May 20, 2025
 | 
| AMD Quark Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | 
| AMD Brevitas Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | 
| AMD Shark Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025
 | 
| Haohui Mai | 1 | Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective October 06, 2025
 | Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective October 06, 2025
 | 
| Chandan Sharma | 1 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs October 07, 2025
 | 
| Hui Liu | 1 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | 
| Yineng Zhang | 1 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024
 | 
| Jiangyong Ren | 1 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Doug Lehr | 1 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang August 26, 2025
 | 
| Luise Chen | 1 | Inferencing with Grok-1 on AMD GPUs August 09, 2024
 | Inferencing with Grok-1 on AMD GPUs August 09, 2024
 | 
| Lei Shao | 1 | Inferencing with Grok-1 on AMD GPUs August 09, 2024
 | Inferencing with Grok-1 on AMD GPUs August 09, 2024
 | 
| Yuzhen Zhou | 1 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | 
| Jin Pan | 1 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | Day-0 Support for the SGLang-Native RL Framework - slime on AMD Instinct™ GPUs September 25, 2025
 | 
| Dong Zhou | 1 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation October 24, 2025
 | 
| James E. T. Smith | 1 | DGL in the Real World: Running GNNs on Real Use Cases August 20, 2025
 | DGL in the Real World: Running GNNs on Real Use Cases August 20, 2025
 | 
| Elaine Zosa | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Jouni Luoma | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Kai Hakala | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Antti Virtanen | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Mika Koistinen | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Jonathan Burdge | 1 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation June 18, 2025
 | 
| Takashi Isobe | 1 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | 
| Dong Zhou | 1 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | 
| He Cui, Mengmeng Ge, Dong Li, Emad Barsoum | 1 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation August 03, 2025
 | 
| Jorge Parada | 1 | Scale LLM Inference with Multi-Node Infrastructure May 30, 2025
 | Scale LLM Inference with Multi-Node Infrastructure May 30, 2025
 | 
| Jeremy Arnold | 1 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024
 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024
 | 
| Jassani Adeem | 1 | Mamba on AMD GPUs with ROCm June 28, 2024
 | Mamba on AMD GPUs with ROCm June 28, 2024
 | 
| Moskvichev Arseny | 1 | Mamba on AMD GPUs with ROCm June 28, 2024
 | Mamba on AMD GPUs with ROCm June 28, 2024
 | 
| Akash Haridas | 1 | Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day July 09, 2025
 | Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day July 09, 2025
 | 
| Nick Romero | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Jeff Daily | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Jithun Nair | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Pruthvi Madugundu | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Jagadish Krishnamoorthy | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Srinivasan Subramanian | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Eli Uriegas | 1 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring October 21, 2025
 | 
| Giuseppe Franco | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| AMD Quark team | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025
 | 
| Arttu Niemela | 1 | Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU August 19, 2025
 | Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU August 19, 2025
 | 
| Balazs Toth | 1 | Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU August 19, 2025
 | Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU August 19, 2025
 | 
| Joaquin Rives Gambin | 1 | Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection October 07, 2025
 | Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection October 07, 2025
 | 
| Rajesh Poornachandran | 1 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Zhao Lin | 1 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Niels Zhang | 1 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Vinayak Gokhale | 1 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | Technical Dive into AMD's MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Zhenhua Liu | 1 | Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning August 22, 2025
 | Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning August 22, 2025
 | 
| Bruce Xue | 1 | Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang May 16, 2025
 | Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang May 16, 2025
 | 
| Rathnakara Malatesha | 1 | Deploying Serverless AI Inference on AMD GPU Clusters February 25, 2025
 | Deploying Serverless AI Inference on AMD GPU Clusters February 25, 2025
 | 
| Abby O'Neill | 1 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | 
| Sarunas Kalade | 1 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | 
| Ken O'Brien | 1 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | 
| Graham Schelle | 1 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot July 14, 2025
 | 
| Chaitanya Manem | 1 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025
 | 
| Luka Tsabadze | 1 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | 
| Rahul Biswas | 1 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | 
| Pauli Pihajoki | 1 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | 
| Daniel Warna | 1 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | 
| Baiqiang Xia | 1 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | Running SOTA AI-based Weather Forecasting models on AMD Instinct September 18, 2025
 | 
| Ted Themistokleous | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | 
| Brian Pickrell | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025
 | 
| Diptorup Deb | 1 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | 
| Debasis Mandal | 1 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | Enabling FlashInfer on ROCm for Accelerated LLM Serving October 01, 2025
 | 
| Eduardo Alvarez | 1 | Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance March 14, 2025
 | Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance March 14, 2025
 | 
| Yu Wang | 1 | AMD Advances Enterprise AI Through OPEA Integration March 12, 2025
 | AMD Advances Enterprise AI Through OPEA Integration March 12, 2025
 | 
| Yamini Kamisetty | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Chelsea Iluno | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission September 09, 2025
 | 
| Benran Hu | 1 | Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs July 15, 2025
 | Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs July 15, 2025
 | 
| Marilyn Basanta | 1 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity September 16, 2025
 | 
| Ronnie Chatterjee | 1 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025
 | 
| Christophe Paquot | 1 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | 
| Julia Jiang | 1 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | 
| Denny Iriawan | 1 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | HIP 7.0 Is Coming: What You Need to Know to Stay Ahead May 28, 2025
 | 
| Brian Cornille | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | 
| Michael Klemm | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | 
| Johanna Potyka | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024
 | 
| Martin Huarte | 1 | Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X January 14, 2025
 | Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X January 14, 2025
 | 
| Quentin Anthony | 1 | Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators December 10, 2024
 | Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators December 10, 2024
 | 
| Niles Burbank | 1 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | 
| Kailash Gogineni, Xun Wang | 1 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | 
| Yanyuan Qin | 1 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware August 05, 2025
 | 
| Deepan Sekar | 1 | Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration September 09, 2025
 | Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration September 09, 2025
 | 
| Pei Zhang | 1 | Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration September 09, 2025
 | Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration September 09, 2025
 | 
| Hyukjoon Lee | 1 | vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance July 07, 2025
 | vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance July 07, 2025
 | 
| Janet Tseng | 1 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | 
| Scott Todd | 1 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | 
| Chris Sosa | 1 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System October 20, 2025
 | 
| Wen Xie, Yao Fu | 1 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | 
| Xiaoming Peng | 1 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | 
| Vidushi Goyal | 1 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs August 22, 2025
 | 
| Nicholas Curtis | 1 | Register pressure in AMD CDNA™2 GPUs May 17, 2023
 | Register pressure in AMD CDNA™2 GPUs May 17, 2023
 | 
| Lin Zhao | 1 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | 
| Felix Marty | 1 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | 
| Zhaofeng Zhang | 1 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs October 29, 2025
 | 
| Rajneesh Bhardwaj | 1 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | Deep dive into the MI300 compute and memory partition modes February 09, 2025
 | 
| Anton Smirnov | 1 | Programming AMD GPUs with Julia April 16, 2024
 | Programming AMD GPUs with Julia April 16, 2024
 | 
| Matthias Reso | 1 | Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU July 21, 2025
 | Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU July 21, 2025
 | 
| Mohammad Mahdi Kamani | 1 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025
 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025
 | 
| Parsa Fashi | 1 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025
 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025
 | 
| David Doscher | 1 | AMD ROCm™ installation January 26, 2023
 | AMD ROCm™ installation January 26, 2023
 | 
| Rene Van Oostrum | 1 | AMD matrix cores November 14, 2022
 | AMD matrix cores November 14, 2022
 | 
| Nicholas Malaya | 1 | AMD matrix cores November 14, 2022
 | AMD matrix cores November 14, 2022
 | 
| Daniel Huang | 1 | AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs August 25, 2025
 | AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs August 25, 2025
 | 
| Yao Fehlis | 1 | Creating a PyTorch/TensorFlow code environment on AMD GPUs September 11, 2023
 | Creating a PyTorch/TensorFlow code environment on AMD GPUs September 11, 2023
 | 
| Warren Eng | 1 | Running ComfyUI in Windows with ROCm on WSL August 07, 2025
 | Running ComfyUI in Windows with ROCm on WSL August 07, 2025
 | 
| Wen Xie | 1 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | An Introduction to Primus-Turbo: A Library for Accelerating Transformer Models on AMD GPUs September 19, 2025
 | 
| Clement Lin | 1 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | 
| Meng-Hsuan Yang | 1 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | 
| Yu-Chen Lin | 1 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | 
| Bobo Fang | 1 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | 
| Chun-Hung Wang | 1 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework July 25, 2025
 | 
| Noah Wolfe | 1 | Introduction to profiling tools for AMD hardware April 12, 2023
 | Introduction to profiling tools for AMD hardware April 12, 2023
 | 
| Cheng Ling | 1 | SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel May 31, 2024
 | SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel May 31, 2024
 | 
| Pedram Alizadeh | 1 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025
 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025
 | 
| Gilbert Lee | 1 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025
 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025
 | 
| Douglas Hamilton | 1 | ROCm Runfile Installer Is Here! May 22, 2025
 | ROCm Runfile Installer Is Here! May 22, 2025
 | 
| Lei Zhang | 1 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025
 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025
 | 
| Kyle Wang | 1 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025
 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025
 | 
| Lingpeng Jin | 1 | AITER: AI Tensor Engine For ROCm March 21, 2025
 | AITER: AI Tensor Engine For ROCm March 21, 2025
 | 
| Bill He, Andy Luo | 1 | Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang August 28, 2025
 | Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang August 28, 2025
 | 
| Chia Hung | 1 | GEMM Tuning within hipBLASLt – Part 2 October 09, 2025
 | GEMM Tuning within hipBLASLt – Part 2 October 09, 2025
 | 
| Kevin Chang | 1 | From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile May 21, 2025
 | From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile May 21, 2025
 | 
| Garrett Byrd | 1 | Installing ROCm from source with Spack April 14, 2025
 | Installing ROCm from source with Spack April 14, 2025
 | 
| Joseph Schoonover | 1 | Installing ROCm from source with Spack April 14, 2025
 | Installing ROCm from source with Spack April 14, 2025
 | 
| Mark Granroth-Wilding | 1 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | 
| Pier Luigi Dovesi | 1 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | 
| Shaghayegh Roohi | 1 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | Elevating 3D Scene Rendering with GSplat October 03, 2025
 | 
| Amanzhol Salykov | 1 | Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture September 30, 2025
 | Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture September 30, 2025
 | 
| Jinze Li | 1 | Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More October 14, 2025
 | Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More October 14, 2025
 | 
| Abhishek Patil | 1 | Unlocking GPU-Accelerated Containers with the AMD Container Toolkit July 03, 2025
 | Unlocking GPU-Accelerated Containers with the AMD Container Toolkit July 03, 2025
 | 
| Mahdieh Ghazimirsaeed | 1 | GPU-aware MPI with ROCm June 08, 2023
 | GPU-aware MPI with ROCm June 08, 2023
 | 
| Evan Masters | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | 
| Babak Poursartip | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | 
| Henry Ho | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025
 | 
| Jianghui Wang | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Vinay Joshi | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Saptarshi Majumder | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Chao Xu | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Bin Ding | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Ziqiong Liu | 1 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks August 01, 2025
 | 
| Alireza Sariaslani | 1 | GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator October 01, 2025
 | GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator October 01, 2025
 | 
| Zhu Shan | 1 | Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm June 18, 2025
 | Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm June 18, 2025
 | 
| Corbin Robeck | 1 | Reading AMD GPU ISA May 13, 2024
 | Reading AMD GPU ISA May 13, 2024
 | 
| Tun Jian Tan | 1 | Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm June 28, 2025
 | Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm June 28, 2025
 | 
| Pin Siang Tan | 1 | Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm June 28, 2025
 | Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm June 28, 2025
 | 
| Alex Voicu | 1 | C++17 parallel algorithms and HIPSTDPAR April 18, 2024
 | C++17 parallel algorithms and HIPSTDPAR April 18, 2024
 | 
| Paul Mullowney | 1 | Sparse matrix vector multiplication - part 1 November 03, 2023
 | Sparse matrix vector multiplication - part 1 November 03, 2023
 |