HPC - Software Tools & Optimizations

July 20, 2026

Understanding Attention Algorithms and Their Backends for Image and Video Generation

A practical guide to attention backends in ComfyUI on AMD: how to optimize performance, memory, and stability with the right configuration.

./software-tools-optimization/comfyui-fa-backends/README.html

July 20, 2026

SPIR-V on ROCm: A Portable IR for AMD GPUs

Learn how SPIR-V brings compile-once, specialize-on-device portability to AMD GPUs — with a reproducible HIP benchmark, trade-off analysis, and quick-start guide.

./software-tools-optimization/spir-v-rocm/README.html

July 16, 2026

Performance Profiling on AMD GPUs – Part 5: Profiling-Driven Kernel Optimization with an AI Code-Assist Tool

Ready to slash HIP kernel runtimes? See how ROCm profiling + an AI code-assist agent delivered a 28.3× speedup on AMD Instinct MI250.

./software-tools-optimization/profiling-guide/ai-assist-optimization/README.html

July 09, 2026

Porting High-Performance HIP Kernels to FlyDSL

This blog post shows how to port HIP C++ GPU kernels to FlyDSL, AMD's new Python DSL, matching hand-tuned C++ performance with less code.

./software-tools-optimization/porting-hip-flydsl/README.html

June 29, 2026

OpenXLA and JAX - ROCm Support and the State of CI

Learn how OpenXLA and JAX run on AMD ROCm: what landed this year, how every PR is gated on real Instinct hardware, and how to get started.

./software-tools-optimization/openxla-jax-rocm/README.html

June 01, 2026

Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition

Guides developers through profiling and optimizing Fortran OpenMP GPU offload applications using ROCm tools.

./software-tools-optimization/profiling-guide/fortran_openmp/README.html

May 27, 2026

Deep Dive Into 4-Wave Interleave FP8 GEMM

Learn how to build faster FP8 GEMM kernels on AMD CDNA™4 using 4-wave interleaving to hide memory latency and maximize Matrix Core utilization.

./software-tools-optimization/4wave-fp8gemm/README.html

May 22, 2026

From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon

Learn how a Gluon GEMM tutorial teaches profiling-driven AMD GPU optimization from FP16 baseline to BF8 and MXFP4 kernels.

./software-tools-optimization/gluon-gemm-tutorial/README.html

April 27, 2026

TraceLens: Democratizing AI Performance Analysis

Explore how TraceLens automates profiler trace analysis to pinpoint bottlenecks and optimize AI workloads.

./software-tools-optimization/tracelens/README.html

April 20, 2026

Getting Started with FlyDSL Nightly Wheels on ROCm

A practical guide to installing and using FlyDSL nightly wheels on ROCm for fast, Python-native GPU kernel development.

./software-tools-optimization/flydsl-nightly-wheel/README.html

April 10, 2026

Introduction to profiling tools for AMD hardware

Profiling tools

./software-tools-optimization/profilers/README.html

March 10, 2026

FP8 GEMM Optimization on AMD CDNA™4 Architecture

Learn how to build high-performance FP8 GEMM kernels on AMD CDNA™4 GPUs using MFMA, LDS swizzling, and double-buffering.

./software-tools-optimization/cdna4-gemm-kernels/README.html
