Featured Posts
Micro-World: First AMD Open-Source World Models for Interactive Video Generation
Micro-World is an action-controlled interactive world model designed to generate high-quality, open-domain scenes.
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.
Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms
Learn how to use the Hummingbird-XT and Hummingbird-XTX models to generate videos. Explore the video diffusion model acceleration solution, including the DiT distillation method and the light VAE model.
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs
Explore how MI355X performs against B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B and Llama-3.3-70B.
Unlocking Sparse Acceleration on AMD GPUs with hipSPARSELt
This blog post introduces semi-structured sparsity technology supported on AMD systems and explains how to use the corresponding library to leverage its benefit.
Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression
Showcase advanced algorithms available in AMD Quark for efficient MXFP4 quantization on AMD Instinct accelerators with high accuracy retention.
Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs
Explore adaptive Top-K on MI300X! See how auto-selection and hardware optimizations like DPP and double buffering drive peak efficiency.
Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot
Learn how to use multi-node and multi-cluster autoscaling in the Ray framework on ROCm 7.0.0 with SkyPilot.
Building Robotics Applications with Ryzen AI and ROS 2
This blog post walks through deploying a robotics application on an AI PC integrated with ROS, the Robot Operating System. We showcase the Ryzen AI CVML Library for perception tasks such as depth estimation and develop a custom ROS 2 node that allows easy integration with the ROS ecosystem and standard components.
Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU
Learn how to leverage TileLang to develop your own kernel and fully utilize AMD GPUs.
Accelerating llama.cpp on AMD Instinct MI300X
Learn more about the superior performance of llama.cpp on Instinct platforms.
Democratizing AI Compute with AMD Using SkyPilot
Learn how SkyPilot integrates with the AMD open AI stack to enable seamless multi-cloud deployment and simplify NVIDIA-to-AMD GPU migration.
Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and strong throughput and convergence results.
Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI
This blog presents AIMs Solution Blueprints and demonstrates modular, Helm‑based deployment patterns.
Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs
Explore how Ryzen AI MAX enables robotic simulation on a single AI PC and take your first step into digital twins.
Resilient Large-Scale Training: Integrating TorchFT with TorchTitan on AMD GPUs
Achieve resilient, checkpoint-less distributed training on AMD GPUs by integrating TorchFT with TorchTitan on Primus-SaFE.
Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story
Learn GPU kernel debugging with rocgdb through a real case: tracing NaN outputs to a one-character typo in CK Tile GEMM.
LLM Inference Optimization Using AMD GPU Partitioning
Demonstrate how to leverage compute and memory partitioning features in ROCm to scale model serving.
ROCm Becomes a First-Class Platform in the vLLM Ecosystem
ROCm is now a first-class vLLM platform: official wheels + Docker, stronger CI, and faster LLM & multimodal inference on AMD Instinct GPUs.
Deep Dive into Primus: High-Performance Training for Large Language Models
Learn how to achieve peak dense LLM training performance on AMD Instinct™ GPUs using Primus’s unified CLI and optimized backend presets.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics
- View the ROCm Developer Hub
- Report an issue or request a feature
- We are eager to learn from our community! If you would like to contribute to the ROCm Blogs, please submit your technical blog for review at our GitHub. Blog creation can be started through our GitHub user guide.