Data Science - Software Tools & Optimizations

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH
May 20, 2025 by Marco Grond, Saad Rahim

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools
Dive into kernel-level profiling of DeepseekV3 on SGLang—identify GPU bottlenecks and boost large language model performance using ROCm

Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.
April 14, 2025 by Garrett Byrd, Joseph Schoonover

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X
The blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD MI300X
March 02, 2025 by Jayacharan Kolla, Pedram Alizadeh, Gilbert Lee

SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel
May 31, 2024 by Cheng Ling

Register pressure in AMD CDNA™2 GPUs
Register pressure
May 17, 2023 by Alessandro Fanfarillo, Nicholas Curtis

AMD matrix cores
Matrix cores
November 14, 2022 by Gina Sitaraman, Damon McDougall, Rene Van Oostrum, Nicholas Malaya, Noel Chalmers, Ossian O'Reilly