HPC - Software Tools & Optimizations
FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs
FlyDSL is a Python-first, MLIR-native DSL for expert GPU kernel development and tuning on AMD GPUs.
Introducing hipThreads: A C++-Style Concurrency Library for AMD GPUs
Discover how hipThreads lets you write hip::thread just like std::thread and unlock GPU acceleration with minimal code changes.
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.
Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms
Introducing the AMD Network Operator for automating high-performance AI NIC networking in Kubernetes for AI and HPC workloads
Performance Profiling on AMD GPUs – Part 3: Advanced Usage
Part 3 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools
ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System
Introducing the ROCm Core SDK: learn to install and build ROCm components easily using TheRock.
GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator
What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features
Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture
This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.
Performance Profiling on AMD GPUs – Part 2: Basic Usage
Part 2 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools
Performance Profiling on AMD GPUs – Part 1: Foundations
Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.
Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH
Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.