AI - Applications & Models - Page 4
ROCm MaxText Testing — Decoupled (Offline) and Cloud-Integrated Modes
Learn how to run MaxText unit tests on AMD ROCm GPUs in offline and cloud modes for fast validation, clear reports, and reproducible workflows.
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
In this blog we will discuss SparK, a training-free, plug-and-play method for KV cache compression in large language models (LLMs).
GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs
Introducing the GEAK family: AI-driven agents that automate GPU kernel optimization for AMD Instinct GPUs with hardware-aware feedback.
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs
Learn how to train LLMs across decentralized clusters on AMD Instinct MI300 GPUs with DiLoCo and Prime—scale beyond one datacenter.
Medical Imaging on MI300X: SwinUNETR Inference Optimization
A practical guide to optimizing SwinUNETR inference on AMD Instinct™ MI300X GPUs for fast 3D segmentation of tumors in medical imaging.
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software
Learn how to deploy AMD GPUs for high-performance autonomous driving related model training with ROCm optimization.
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs
Explore how MI355X performs against B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B and Llama-3.3-70B.
Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs
Learn how to build a state-of-the-art reasoning model that beats Qwen3-32B using only synthetic data and SFT on AMD Instinct™ GPUs: fast, simple, and scalable.
DGL in Depth: SE(3)-Transformer on ROCm 7
Learn how to run the SE(3)-Transformer with DGL on AMD Instinct platforms.
Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration
Power your next AI application or graphics simulation with high-performance GPU/CPU computing in Python with Taichi Lang.
Týr-the-Pruner: Search-based Global Structural Pruning for LLMs
This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).
HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP
Build an HPC RAG agent on AMD Instinct GPUs using GLM-4.6, Cline and ChromaDB.