Applications & models
Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.
MXFP6 and MXFP4 Mixed Precision for Accelerating Dense LLMs on AMD Instinct MI355X
W_MXFP4_A_MXFP6 quantization on AMD Instinct MI355X improves LLM throughput and latency while recovering accuracy versus MXFP4.
Faster Kimi-K2.5-W4A8 Decoding with EAGLE3 on AMD Instinct™ MI325X
Add EAGLE3 speculative decoding and three MoE/FMHA kernel-tuning patches to Kimi-K2.5-W4A8 inference on AMD Instinct™ MI325X with SGLang, AITER, and FlyDSL.
A Practical Guide to Running LLMs on AMD Radeon™ GPUs
This guide describes how to run LLMs on AMD Radeon™ GPUs using a range of partner frameworks, tools, and runtimes, with step-by-step setup instructions and performance optimization tips.
Comparative Analysis of Scale-Out RoCE Network Traffic Patterns and Loads in Training Large Language Models
Compares RoCE network traffic patterns and loads across GPT-4, Llama 3, DeepSeek-V2, and Grok 4.0 LLM training to guide AI infrastructure design.
Efficient and Portable 3D Explorable World Generation on AMD GPUs
Learn how to run Matrix3D world generation on AMD GPUs more smoothly and efficiently.
Utilizing AMD Schola and UnrealRoboticsLab with AMD ROCm™ Software to Train a Robotic Arm
Learn how to combine MuJoCo physics, Unreal Engine, and Schola to train a 6-DOF robot arm with reinforcement learning on AMD hardware.
Technical Dive into AMD's MLPerf Training v6.0 Submission
In this blog, we share the technical details of how we achieved the results in our MLPerf Training v6.0 submission.
Reproducing AMD MLPerf Training v6.0 Submission Result
Learn how to reproduce AMD's MLPerf Training v6.0 submission result.
Low Kruskal-Rank Adaptation
Learn how Kruskal rank can enhance LoRA by replacing the conventional matrix-rank formulation for more efficient training.
Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference
Productionized TurboQuant 4-bit KV-cache quantization on AMD GPUs via vLLM, with custom kernels and accuracy analysis on agentic workloads.
ORBIT-2 based Weather and Climate Downscaling and Downscaled Global Forecasts on AMD Instinct
A showcase of how to run GenCast’s weather prediction with ORBIT-2’s high-resolution downscaling on AMD Instinct hardware.
Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale
Learn how to run Alibaba's ROLL RL framework out-of-the-box on AMD Instinct™ GPUs with ROCm.