AI Blogs
Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench
Learn how to adapt and fine-tune an AIM LLM in the AMD AI Workbench GUI for specialized use cases.
Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale
Learn how to run Alibaba's ROLL RL framework out-of-the-box on AMD Instinct™ GPUs with ROCm.
Enabling Speculative Speculative Decoding on MI300X
This post introduces the speculative speculative decoding method, enables it on AMD Instinct MI300X GPUs, and reports the results.
AI Inference on AMD Ryzen™ AI Max Processor
Hands-on: run Qwen3.5 9B–122B on Ryzen™ AI Max+ with 128GB UMA and Ollama, with generation benchmarks and a clear UMA setup path on Ubuntu/ROCm.
Styled Text Image Generation with Eruku on AMD
A hands-on, reproducible guide to training and running Eruku on the LUMI supercomputer, powered by AMD Instinct MI250X GPUs.
Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot
Learn how to use multi-node and multi-cluster autoscaling in the Ray framework on ROCm 7.0.0 with SkyPilot.
Building Robotics Applications with Ryzen AI and ROS 2
This blog post walks through deploying a robotics application on an AI PC integrated with ROS (the Robot Operating System). We showcase the Ryzen AI CVML Library for perception tasks such as depth estimation and develop a custom ROS 2 node that integrates easily with the ROS ecosystem and its standard components.
Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU
Learn how to use TileLang to develop your own Flash Attention kernel and fully utilize AMD GPUs.
Diffusion-based Atmospheric Downscaling on AMD Instinct GPUs
Learn the theory behind diffusion-based downscaling models and how to run one such model, CorrDiff, on AMD GPUs.
QuickReduce FP4 Quantization and Benchmarking on MI355
Learn how QuickReduce uses FP4 quantization to accelerate all-reduce communication and evaluate its performance on AMD Instinct MI355 GPUs.
Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models
Learn how to semantically split vision datasets using foundation model embeddings on AMD GPUs to reduce leakage and improve evaluation.
Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark
Quantize Kimi-K2.5 to W4A8 and W8A8 using AMD Quark and serve on MI325X with FlyDSL and AITER for further inference acceleration.
From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs
A step-by-step guide to building, deploying, and benchmarking ONNX models with Triton Inference Server and MIGraphX on AMD GPUs.
vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem
Use ATOM as an out-of-tree vLLM plugin to keep vLLM compatibility while enabling AMD-optimized attention, model execution, and multi-model support including Kimi-K2.5.
TraceLens: Democratizing AI Performance Analysis
Explore how TraceLens automates profiler trace analysis to pinpoint bottlenecks and optimize AI workloads.
Primus Projection: Estimate Memory and Performance Before You Train
Learn how to use the Primus projection tool to estimate memory and performance for large-scale LLM training on AMD Instinct™ accelerator platforms.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics