Developers Blogs#
Primus Projection: Estimate Memory and Performance Before You Train
Learn how to use the Primus projection tool to estimate memory and performance for large-scale LLM training on AMD Instinct™ accelerator platforms.
Getting Started with FlyDSL Nightly Wheels on ROCm
A practical guide to installing and using FlyDSL nightly wheels on ROCm for fast, Python-native GPU kernel development
Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide
Master hipBLASLt TensileLite Tuning. Learn to build custom kernels that deliver 150%-250% faster GEMM performance on AMD Instinct™ MI300X GPUs
Training a Robotic Arm Using MuJoCo and JAX on AMD Hardware with ROCm™
A complete guide to training an RL-based pick-and-lift robotic arm in MuJoCo with JAX, running on AMD hardware via ROCm.
Building Robotics Applications with Ryzen AI and ROS 2
This blog post gives a walkthrough of how to deploy a robotics application on the AI PC integrated with ROS - the robot operating system. We showcase Ryzen AI CVML Library to do perception tasks like depth estimation and develop a custom ROS 2 node which allows easy integration with the ROS ecosystem and standard components.
Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools
Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem from our flagship AMD InstinctTM MI355X and MI300X GPUs, AMD Radeon™ AI PRO R700 GPUs and AMD Ryzen™ AI Processors
Edge-to-Cloud Robotics with AMD ROCm: From Data Collection to Real-Time Inference
This blog demonstrates a comprehensive Edge-to-Cloud robotics AI solution powered by the AMD ecosystem and the Hugging Face LeRobot framework.
hipBLASLt Online GEMM Tuning
Learn how to improve model performance with hipBLASLt online tuning merged into LLM framework
GROMACS Performance on AMD Instinct MI355X
Explore GROMACS molecular dynamics performance benchmarks on AMD Instinct MI355X GPUs with HIP acceleration.
Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs
Learn how to set up and optimize ComfyUI on AMD Radeon RX 9000 GPUs with ROCm 7.1 — solve common issues and start generating.
Agentic Diagnosis for LLM Training at Scale
Explore how AI agents diagnose LLM training incidents — from RCCL hangs to throughput regressions — in one prompt with MaxText-Slurm.
MaxText-Slurm: Production-Grade LLM Training with Built-In Observability
MaxText-Slurm: A unified launch system for production-grade LLM training with observability on AMD GPU clusters.
Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation
Learn how to use our flexible and scalable pipeline parallelism framework with Primus backend and AMD hardware.
FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs
FlyDSL is a Python-first, MLIR-native DSL for expert GPU kernel development and tuning on AMD GPUs.
Stay informed
- Subscribe to our RSS feed (Requires an RSS reader available as browser plugins.)
- Signup for the ROCm newsletter
- View our blog statistics