Developer Blogs
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs
Explore how MI355X performs against B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B and Llama-3.3-70B.
The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism
Learn how to combine TP, DP, PP, and EP for MoE models. Discover proven strategies to maximize performance on your vLLM deployments.
Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script
Learn how to improve model performance with hipBLASLt offline tuning, an easy-to-use Day 0 tool that helps developers optimize GEMM efficiency.
Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.
Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R9700 GPUs and AMD Ryzen™ AI processors.
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.
ROCm Revisited: Getting Started with HIP
New to HIP? This blog introduces the HIP runtime API, its key concepts, installation, and practical code examples that showcase its functionality.
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation
Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M parameters.
STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm
STX-B0T explores the potential of Ryzen AI PCs to power robotics applications across the NPU and GPU. This blog demonstrates how our hardware and software interoperate to unlock real-time perception.
Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring
Production ROCm support for N-1 to N+1 PyTorch releases is in progress. The AI Software Head-Up Dashboard shows the status of PyTorch on ROCm.
Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection
Learn how to set up, run, and optimize SwinUNETR on AMD MI300X GPUs for fast 3D medical image segmentation of tumors using large ROIs.
Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.
ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System
Introducing the ROCm Core SDK; learn to install and build ROCm components easily using TheRock.
GEMM Tuning within hipBLASLt – Part 2
Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.
Elevating 3D Scene Rendering with GSplat
A ROCm port of GSplat: GPU-accelerated rasterization of Gaussian splatting.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics