Developers - Software Tools & Optimizations - Page 2#
GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.
ROCm Runfile Installer Is Here!
Overview of ROCm Runfile Installer introduced in ROCm 6.4, allowing a complete single package for driver and ROCm installation without internet connectivity
From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs
Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH
Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs—write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility
C++17 parallel algorithms and HIPSTDPAR #
C++17 parallel algorithms and HIPSTDPAR
Register pressure in AMD CDNA™2 GPUs
Register pressure
AMD matrix cores
Matrix cores