AI Blogs - Page 4#
High-Resolution Weather Forecasting with StormCast on AMD Instinct GPU Accelerators
A showcase for how to run high-resolution weather prediction models such as StormCast on AMD Instinct hardware.
ROCm Fork of MaxText: Structure and Strategy
Learn how the ROCm fork of MaxText mirrors upstream while enabling offline testing, minimal datasets, and platform-agnostic, decoupled workflows.
ROCm MaxText Testing — Decoupled (Offline) and Cloud-Integrated Modes
Learn how to run MaxText unit tests on AMD ROCm GPUs in offline and cloud modes for fast validation, clear reports, and reproducible workflows.
Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models
Learn how to optimize multimodal model inference with batch-level data parallelism for vision encoders in vLLM, achieving up to 45% throughput gains on AMD MI300X.
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
In this blog we will discuss SparK, a training-free, plug-and-play method for KV cache compression in large language models (LLMs).
GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs
Introducing GEAK Family - AI-driven agents that automate GPU kernel optimization for AMD Instinct GPUs with hardware-aware feedback
Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads
Learn how to deploy and manage AI workloads with AMD AI Workbench, a low-code interface for developers to manage AI inference deployments
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs
Learn how to train LLMs across decentralized clusters on AMD Instinct MI300 GPUs with DiLoCo and Prime—scale beyond one datacenter.
MoE Training Best Practices on AMD GPUs
Learn how to optimize Mixture-of-Experts (MoE) model training on AMD Instinct GPUs with ROCm. Maximize your AI training performance now!
Accelerating llama.cpp on AMD Instinct MI300X
Learn more about the superior performance of llama.cpp on Instinct platforms.
Medical Imaging on MI300X: SwinUNETR Inference Optimization
A practical guide to optimizing SwinUNETR inference on AMD Instinct™ MI300X GPUs for fast 3D segmentation of tumors in medical imaging.
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software
Learn how to deploy AMD GPUs for high-performance autonomous driving related model training with ROCm optimization.