Developers - Software Tools & Optimizations

ROCm Runfile Installer Is Here!
An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which provides a single, complete package for installing the driver and ROCm without internet connectivity
May 22, 2025 by Douglas Hamilton, Saad Rahim, Liam Berry

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs
May 21, 2025 by Haocong Wang, Kevin Chang, David Li, George Wang

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH
May 20, 2025 by Marco Grond, Saad Rahim

Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs—write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility

C++17 parallel algorithms and HIPSTDPAR
April 18, 2024 by Alessandro Fanfarillo, Alex Voicu

Register pressure in AMD CDNA™2 GPUs
May 17, 2023 by Alessandro Fanfarillo, Nicholas Curtis

AMD matrix cores
November 14, 2022 by Gina Sitaraman, Damon McDougall, Rene Van Oostrum, Nicholas Malaya, Noel Chalmers, Ossian O'Reilly