Author | Posts | Most recent post | First post |
--- | --- | --- | --- |
Douglas Jia | 14 | Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI's Kubernetes Engine (OKE) October 15, 2024 | Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs January 24, 2024 |
Sean Song | 14 | PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm February 09, 2025 | Fine-tune Llama model with LoRA: Customizing a large language model for question-answering February 01, 2024 |
Fabricio Flores | 13 | CuPy and hipDF on AMD: The Basics and Beyond May 06, 2025 | Building semantic search with SentenceTransformers on AMD April 04, 2024 |
Phillip Dang | 13 | DBRX Instruct on AMD GPUs July 11, 2024 | Simplifying deep learning: A guide to PyTorch Lightning February 08, 2024 |
Vara Lakshmi Bayanagari | 11 | Distributed fine-tuning of MPT-30B using Composer on AMD GPUs January 28, 2025 | Pre-training BERT using Hugging Face & PyTorch on an AMD GPU January 26, 2024 |
Clint Greene | 10 | Enhancing vLLM Inference on AMD GPUs October 11, 2024 | Accelerating XGBoost with Dask using multiple AMD GPUs January 26, 2024 |
Justin Chang | 10 | MI300A - Exploring the APU advantage February 09, 2025 | Finite difference method - Laplacian part 1 November 14, 2022 |
Eliot Li | 8 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | Scale AI applications with Ray April 01, 2024 |
Andy Luo | 7 | Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools May 01, 2025 | Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs January 29, 2025 |
Gina Sitaraman | 7 | Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling March 25, 2025 | AMD matrix cores November 14, 2022 |
Ossian O'Reilly | 7 | Seismic stencil codes - part 1 August 29, 2024 | Finite difference method - Laplacian part 1 November 14, 2022 |
Seungrok Jung | 6 | Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools May 01, 2025 | Large language model inference optimizations on AMD GPUs March 15, 2024 |
Liz Li | 6 | Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools May 01, 2025 | AITER: AI Tensor Engine For ROCm March 21, 2025 |
Thomas Gibson | 6 | Graph analytics on AMD GPUs using Gunrock July 29, 2024 | Finite difference method - Laplacian part 1 November 14, 2022 |
Anshul Gupta | 5 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | GEMM Kernel Optimization For AMD GPUs February 06, 2025 |
Sean Miller | 5 | Finite difference method - Laplacian part 4 July 18, 2023 | Finite difference method - Laplacian part 1 November 14, 2022 |
Rajat Arora | 5 | Jacobi Solver with HIP and OpenMP offloading September 15, 2023 | Finite difference method - Laplacian part 1 November 14, 2022 |
Matt Elliott | 5 | How to Build a vLLM Container for Inference and Benchmarking February 21, 2025 | Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments in high-security environments and air-gapped networks. September 10, 2024 |
George Wang | 4 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | GEMM Kernel Optimization For AMD GPUs February 06, 2025 |
Emad Barsoum | 4 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 | Enhancing AI Training with AMD ROCm Software January 31, 2025 |
Logan Grado | 4 | Accelerating models on ROCm using PyTorch TunableOp July 03, 2024 | Automatic mixed precision in PyTorch using AMD GPUs March 29, 2024 |
Shekhar Pandey | 4 | Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools May 01, 2025 | Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide March 14, 2025 |
Zicheng Liu | 3 | Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration April 24, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Yusheng Su | 3 | Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration April 24, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Victor Robles | 3 | AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3 March 13, 2025 | AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1 February 07, 2025 |
Alex He | 3 | A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU April 22, 2025 | Navigating vLLM Inference with ROCm and Kubernetes February 13, 2025 |
Meena Arunachalam | 3 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024 |
Miro Hodak | 3 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024 |
Yao Liu | 3 | Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration April 24, 2025 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 |
Jayacharan Kolla | 3 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025 |
Farshad Ghodsian | 3 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | Announcing the AMD GPU Operator and Metrics Exporter January 29, 2025 |
ROCm Blogs Team | 3 | Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators June 10, 2024 | Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+ May 16, 2024 |
Suyash Tandon | 3 | MI300A - Exploring the APU advantage February 09, 2025 | Introduction to profiling tools for AMD hardware April 12, 2023 |
Bob Robey | 3 | Application portability with HIP April 26, 2024 | Affinity part 2 - System topology and controlling affinity April 16, 2024 |
Ning Zhang | 2 | Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations April 10, 2025 | GEMM Kernel Optimization For AMD GPUs February 06, 2025 |
Ximeng Sun | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Gowtham Ramesh | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Pratik Prabhanjan Brahma | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Ze Wang | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Jiang Liu | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Jialian Wu | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Prakamya Mishra | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Xiaodong Yu | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Sudhanshu Ranjan | 2 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Peng Sun | 2 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X March 21, 2025 |
Hai Xiao | 2 | Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X March 21, 2025 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 |
Sonali Singh | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 |
Karthik Sangaiah | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 |
Ryan Swann | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 |
Ganesh Dasika | 2 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 |
Wei-Ting Liao | 2 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 |
Michael Zhang | 2 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 | CTranslate2: Efficient Inference with Transformer Models on AMD GPUs October 24, 2024 |
Vicky Tsang | 2 | Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration April 24, 2025 | Scale AI applications with Ray April 01, 2024 |
Saad Rahim | 2 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver April 11, 2025 |
Maria Ruiz Varela | 2 | Application portability with HIP April 26, 2024 | AMD Instinct™ MI200 GPU memory space overview March 09, 2023 |
Alessandro Fanfarillo | 2 | C++17 parallel algorithms and HIPSTDPAR April 18, 2024 | Register pressure in AMD CDNA™2 GPUs May 17, 2023 |
Muhammad Osama | 2 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 | Graph analytics on AMD GPUs using Gunrock July 29, 2024 |
Damon McDougall | 2 | GPU-aware MPI with ROCm June 08, 2023 | AMD matrix cores November 14, 2022 |
Noel Chalmers | 2 | GPU-aware MPI with ROCm June 08, 2023 | AMD matrix cores November 14, 2022 |
Ben Sander | 2 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 | Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1 February 14, 2025 |
George Markomanolis | 2 | Affinity part 1 - Affinity, placement, and order April 16, 2024 | Affinity part 2 - System topology and controlling affinity April 16, 2024 |
Asitav Mishra | 2 | Reading AMD GPU ISA May 13, 2024 | Jacobi Solver with HIP and OpenMP offloading September 15, 2023 |
Aditya Kumar Singh | 1 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 | Instella-VL-1B: First AMD Vision Language Model March 07, 2025 |
Shenrun Zhang | 1 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 | Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding March 27, 2025 |
Bill He | 1 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025 |
Mahdi Ghodsi | 1 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025 | Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart April 28, 2025 |
Poovaiah Palangappa | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 |
AMD Quark Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 |
AMD Brevitas Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 |
AMD Shark Team | 1 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 | AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0 April 02, 2025 |
Rishi Madduri | 1 | Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs March 23, 2025 | Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs March 23, 2025 |
Hui Liu | 1 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 |
Yineng Zhang | 1 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 | SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs November 13, 2024 |
Luise Chen | 1 | Inferencing with Grok-1 on AMD GPUs August 09, 2024 | Inferencing with Grok-1 on AMD GPUs August 09, 2024 |
Lei Shao | 1 | Inferencing with Grok-1 on AMD GPUs August 09, 2024 | Inferencing with Grok-1 on AMD GPUs August 09, 2024 |
Jeremy Arnold | 1 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024 | Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission August 28, 2024 |
Adeem Jassani | 1 | Mamba on AMD GPUs with ROCm June 28, 2024 | Mamba on AMD GPUs with ROCm June 28, 2024 |
Arseny Moskvichev | 1 | Mamba on AMD GPUs with ROCm June 28, 2024 | Mamba on AMD GPUs with ROCm June 28, 2024 |
Karan Verma | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 |
Ean Garvey | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 |
Kumar Deepak | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 |
AMD Quark team | 1 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 | Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission April 02, 2025 |
Rathnakara Malatesha | 1 | Deploying Serverless AI Inference on AMD GPU Clusters February 25, 2025 | Deploying Serverless AI Inference on AMD GPU Clusters February 25, 2025 |
Chaitanya Manem | 1 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 | Introducing Instella: New State-of-the-art Fully Open 3B Language Models March 05, 2025 |
Tiffany Mintz | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 |
Ted Themistokleous | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 |
Brian Pickrell | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 |
Vish Vadlamani | 1 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 | Triton Inference Server with vLLM on AMD GPUs January 08, 2025 |
Eduardo Alvarez | 1 | Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance March 14, 2025 | Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance March 14, 2025 |
Yu Wang | 1 | AMD Advances Enterprise AI Through OPEA Integration March 12, 2025 | AMD Advances Enterprise AI Through OPEA Integration March 12, 2025 |
Aditya Bhattacharji | 1 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 |
Marco Grond | 1 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 |
Ronnie Chatterjee | 1 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 | ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software April 11, 2025 |
Brian Cornille | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 |
Michael Klemm | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 |
Johanna Potyka | 1 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 | Introducing AMD's Next-Gen Fortran Compiler November 13, 2024 |
Martin Huarte | 1 | Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X January 14, 2025 | Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X January 14, 2025 |
Quentin Anthony | 1 | Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators December 10, 2024 | Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators December 10, 2024 |
Danny Guan | 1 | ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver April 11, 2025 | ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver April 11, 2025 |
Nicholas Curtis | 1 | Register pressure in AMD CDNA™2 GPUs May 17, 2023 | Register pressure in AMD CDNA™2 GPUs May 17, 2023 |
Rajneesh Bhardwaj | 1 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 | Deep dive into the MI300 compute and memory partition modes February 09, 2025 |
Anton Smirnov | 1 | Programming AMD GPUs with Julia April 16, 2024 | Programming AMD GPUs with Julia April 16, 2024 |
Mohammad Mahdi Kamani | 1 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 |
Parsa Fashi | 1 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 |
Vikram Appia | 1 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 | Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs April 28, 2025 |
David Doscher | 1 | AMD ROCm™ installation January 26, 2023 | AMD ROCm™ installation January 26, 2023 |
Rene Van Oostrum | 1 | AMD matrix cores November 14, 2022 | AMD matrix cores November 14, 2022 |
Nicholas Malaya | 1 | AMD matrix cores November 14, 2022 | AMD matrix cores November 14, 2022 |
David Li | 1 | Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs April 15, 2025 | Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs April 15, 2025 |
Yao Fehlis | 1 | Creating a PyTorch/TensorFlow code environment on AMD GPUs September 11, 2023 | Creating a PyTorch/TensorFlow code environment on AMD GPUs September 11, 2023 |
Ammar Elwazir | 1 | Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling March 25, 2025 | Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling March 25, 2025 |
Noah Wolfe | 1 | Introduction to profiling tools for AMD hardware April 12, 2023 | Introduction to profiling tools for AMD hardware April 12, 2023 |
Cheng Ling | 1 | SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel May 31, 2024 | SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel May 31, 2024 |
Pedram Alizadeh | 1 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025 |
Gilbert Lee | 1 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025 | Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X March 02, 2025 |
Lei Zhang | 1 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 |
Fan Wu | 1 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 |
Kyle Wang | 1 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 | Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed May 06, 2025 |
Carlus Huang | 1 | AITER: AI Tensor Engine For ROCm March 21, 2025 | AITER: AI Tensor Engine For ROCm March 21, 2025 |
Lingpeng Jin | 1 | AITER: AI Tensor Engine For ROCm March 21, 2025 | AITER: AI Tensor Engine For ROCm March 21, 2025 |
Yao Fu | 1 | Optimized ROCm Docker for Distributed AI Training March 13, 2025 | Optimized ROCm Docker for Distributed AI Training March 13, 2025 |
Chang Liu | 1 | Speculative Decoding - Deep Dive March 24, 2025 | Speculative Decoding - Deep Dive March 24, 2025 |
Garrett Byrd | 1 | Installing ROCm from source with Spack April 14, 2025 | Installing ROCm from source with Spack April 14, 2025 |
Joseph Schoonover | 1 | Installing ROCm from source with Spack April 14, 2025 | Installing ROCm from source with Spack April 14, 2025 |
Mahdieh Ghazimirsaeed | 1 | GPU-aware MPI with ROCm June 08, 2023 | GPU-aware MPI with ROCm June 08, 2023 |
Evan Masters | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 |
Babak Poursartip | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 |
Henry Ho | 1 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 | Measuring Max-Achievable FLOPs – Part 2 February 28, 2025 |
Corbin Robeck | 1 | Reading AMD GPU ISA May 13, 2024 | Reading AMD GPU ISA May 13, 2024 |
Alex Voicu | 1 | C++17 parallel algorithms and HIPSTDPAR April 18, 2024 | C++17 parallel algorithms and HIPSTDPAR April 18, 2024 |
Paul Mullowney | 1 | Sparse matrix vector multiplication - part 1 November 03, 2023 | Sparse matrix vector multiplication - part 1 November 03, 2023 |