HPC Blogs
Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition
Guides developers through profiling and optimizing Fortran OpenMP GPU offload applications using ROCm tools
Running Variational Quantum Eigensolver with Qiskit Aer on AMD Instinct
A step-by-step guide to running GPU-accelerated VQE for quantum chemistry with Qiskit Aer on AMD Instinct using ROCm.
Deep Dive Into 4-Wave Interleave FP8 GEMM
Learn how to build faster FP8 GEMM kernels on AMD CDNA™4 using 4-wave interleaving to hide memory latency and maximize Matrix Core utilization.
From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon
Learn how a Gluon GEMM tutorial teaches profiling-driven AMD GPU optimization, from an FP16 baseline to BF8 and MXFP4 kernels.
Styled Text Image Generation with Eruku on AMD
A hands-on, reproducible guide to training and running Eruku on the LUMI supercomputer, powered by AMD Instinct MI250X GPUs.
Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.
Programming Tensor Descriptors in Composable Kernel (CK)
Learn how to use TensorDescriptor in Composable Kernel (CK) to manage multi-dimensional data layouts and write efficient GPU kernels on AMD GPUs.
GROMACS on AMD Instinct GPUs: A Complete Build Guide
Build GROMACS with HIP, UCX, and OpenMPI on AMD MI300X/MI355X — covering bare metal, Apptainer, and Docker deployments.
GROMACS Performance on AMD Instinct MI355X
Explore GROMACS molecular dynamics performance benchmarks on AMD Instinct MI355X GPUs with HIP acceleration.
HPC Coding Agent - Part 3: MCP Tool for Profiling
Build an AI agent specialized in optimizing HPC workloads by connecting a Cline agent to expert-level AMD profiling tools via a custom MCP server.
TraceLens: Democratizing AI Performance Analysis
Explore how TraceLens automates profiler trace analysis to pinpoint bottlenecks and optimize AI workloads.
Getting Started with FlyDSL Nightly Wheels on ROCm
A practical guide to installing and using FlyDSL nightly wheels on ROCm for fast, Python-native GPU kernel development.
FP8 GEMM Optimization on AMD CDNA™4 Architecture
Learn how to build high-performance FP8 GEMM kernels on AMD CDNA™4 GPUs using MFMA, LDS swizzling, and double-buffering.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics