Featured Posts
Reproducing the AMD MLPerf Inference v6.0 Submission Result
Provide instructions to potential customers and partners to verify our MLPerf Inference v6.0 submission result.
Utilizing AMD Instinct GPU Accelerators for Weather and Precipitation Forecasting with NeuralGCM
A showcase of how to run NeuralGCM, a hybrid GCM model, on AMD Instinct hardware, including an introduction, installation, inference, and plotting.
FP8 GEMM Optimization on AMD CDNA™4 Architecture
Learn how to build high-performance FP8 GEMM kernels on AMD CDNA™4 GPUs using MFMA, LDS swizzling, and double-buffering.
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
we highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance
vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem
Use ATOM as an out-of-tree vLLM plugin to keep vLLM compatibility while enabling AMD-optimized attention, model execution, and multi-model support including Kimi-K2.5.
AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes
Run Street Gaussians on AMD Instinct MI300: migrate to latest gsplat, install on ROCm, and render dynamic street scenes.
Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models
Explore a new MoE architecture designed for native computation-communication overlap, enabling efficient distributed execution.
TraceLens: Democratizing AI Performance Analysis
Explore how TraceLens automates profiler trace analysis to pinpoint bottlenecks and optimize AI workloads.
Styled Text Image Generation with Eruku on AMD
Hands-on, reproducible guide to train and run Eruku on LUMI supercomputer, powered by AMD Instinct MI250X GPUs.
Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot
Learn how to use multi-node and multi-cluster autoscaling in the Ray framework on ROCm 7.0.0 with SkyPilot
Building Robotics Applications with Ryzen AI and ROS 2
This blog post gives a walkthrough of how to deploy a robotics application on the AI PC integrated with ROS - the robot operating system. We showcase Ryzen AI CVML Library to do perception tasks like depth estimation and develop a custom ROS 2 node which allows easy integration with the ROS ecosystem and standard components.
Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU
Learn how to leverage TileLang to develop your own kernel. Explore the power to fully utilize AMD GPUs
FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match
This blog explores a new training-free loosely speculative decoding method, that can accept mismatches that are semantically valid and speedup original SPD method.
Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend
Learn how to deploy AI models on AMD GPUs with Triton Inference Server, now supporting ONNX Runtime and Python backends, and see performance benchmarks.
FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER
FlashInfer is an open-source library for accelerating LLM serving that is now supported by ROCm.
Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide
Master hipBLASLt TensileLite Tuning. Learn to build custom kernels that deliver 150%-250% faster GEMM performance on AMD Instinct™ MI300X GPUs
Primus Projection: Estimate Memory and Performance Before You Train
Learn how to use the Primus projection tool to estimate memory and performance for large-scale LLM training on AMD Instinct™ accelerator platforms.
Getting Started with FlyDSL Nightly Wheels on ROCm
A practical guide to installing and using FlyDSL nightly wheels on ROCm for fast, Python-native GPU kernel development
Leveraging AMD AI Workbench to Scale LLM Inference for Optimal Resource Utilization
Learn how to use the AMD AI Workbench GUI and AIM Engine CLI capabilities to enable and configure autoscaling for your AI workloads.
Stay informed
- Subscribe to our RSS feed (Requires an RSS reader available as browser plugins.)
- Signup for the ROCm newsletter
- View our blog statistics
- View the ROCm Developer Hub
- Report an issue or request a feature
- We are eager to learn from our community! If you would like to contribute to the ROCm Blogs, please submit your technical blog for review at our GitHub. Blog creation can be started through our GitHub user guide.