AI Blogs
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation
Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M parameters.
STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm
STX-B0T explores the potential of RyzenAI PCs to power robotics applications on NPU+GPU. This blog demonstrates how our hardware and software interoperate to unlock real-time perception.
Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring
At AMD, the PyTorch ecosystem team is committed to delivering an exceptional out-of-the-box experience for developers. Over the past year, the team has made significant progress in expanding PyTorch ecosystem support, improving CI test coverage across a wider range of GPU architectures, enhancing training and inference capabilities, streamlining the developer experience, introducing new functionality and performance optimizations, and strengthening quality monitoring. This blog showcases our ongoing efforts to build a robust PyTorch ecosystem on AMD ROCm™ Software, including the production-readiness of PyTorch across N-1, N, and N+1 releases aligned with ROCm versions. We also introduce the AI SoftWare Heads-Up Dashboard (AISWHUD), a powerful new tool that provides deep insights into the health and performance of the PyTorch ecosystem on ROCm, empowering developers with greater visibility and control.
ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System
Meet the ROCm Core SDK, and learn to install and build ROCm components easily using TheRock.
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.
Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
Performance optimizations for llama.cpp on AMD Instinct GPUs.
Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs, AMD Radeon™ AI PRO R700 GPUs, and AMD Ryzen™ AI Processors.
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.
Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs
Learn how AMD Instinct MI355 Series GPUs deliver competitive Kimi-K2 inference with faster TTFT, lower latency, and strong throughput.
Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs
Learn how to use Medical Open Network for Artificial Intelligence (MONAI) 1.0 on ROCm, with examples and demonstrations.
Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection
Learn how to set up, run, and optimize SwinUNETR on AMD MI300X GPUs for fast 3D medical-image segmentation of tumors using large ROIs.
Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective
Learn how FP4 mixed-precision on AMD GPUs boosts inference speed and integrates seamlessly with SGLang.
Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More
Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.
GEMM Tuning within hipBLASLt – Part 2
Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.
GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator
What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features
Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture
This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics