HPC - Software Tools & Optimizations - Page 2

March 09, 2026

Agentic Diagnosis for LLM Training at Scale

Explore how AI agents diagnose LLM training incidents — from RCCL hangs to throughput regressions — in one prompt with MaxText-Slurm.

./software-tools-optimization/maxtext-slurm-agentic-diagnosis/README.html

March 02, 2026

MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

MaxText-Slurm: A unified launch system for production-grade LLM training with observability on AMD GPU clusters.

./software-tools-optimization/maxtext-slurm/README.html

February 20, 2026

FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs

FlyDSL is a Python-first, MLIR-native DSL for expert GPU kernel development and tuning on AMD GPUs.

./software-tools-optimization/flydsl-python-native/README.html

February 19, 2026

Introducing hipThreads: A C++-Style Concurrency Library for AMD GPUs

Discover how hipThreads lets you write hip::thread just like std::thread and unlock GPU acceleration with minimal code changes.

./software-tools-optimization/hipthreads-introduction/README.html

January 22, 2026

ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads

We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.

./software-tools-optimization/rocm7.2/README.html

January 08, 2026

Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Introducing the AMD Network Operator for automating high-performance AI NIC networking in Kubernetes for AI and HPC workloads.

./software-tools-optimization/amd-network-operator/README.html

October 23, 2025

Performance Profiling on AMD GPUs – Part 3: Advanced Usage

Part 3 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools.

./software-tools-optimization/profiling-guide/advanced/README.html

October 20, 2025

ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Introduces the ROCm Core SDK and shows how to install and build ROCm components easily using TheRock.

./software-tools-optimization/therock/README.html

October 01, 2025

GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator

What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features

./software-tools-optimization/gpu-operator-partitioning/README.html

September 30, 2025

Matrix Core Programming on AMD CDNA™3 and CDNA™4 Architectures

This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.

./software-tools-optimization/matrix-cores-cdna/README.html

August 13, 2025

Performance Profiling on AMD GPUs – Part 2: Basic Usage

Part 2 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools.

./software-tools-optimization/profiling-guide/novice/README.html

June 26, 2025

Performance Profiling on AMD GPUs – Part 1: Foundations

Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.

./software-tools-optimization/profiling-guide/intro/README.html
