AI Blogs - Page 3

Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

Ray, combined with ROCm, provides a powerful platform for scaling AI applications, particularly for training and inference workloads.

September 10, 2025 by Vicky Tsang, Yao Liu, Phani Vaddadi, Vish Vadlamani

Technical Dive into AMD's MLPerf Inference v5.1 Submission

In this blog, we share the technical details of how we achieved the results in our MLPerf Inference v5.1 submission.

September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Neha Mathews, Rajesh Poornachandran, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Zhao Lin, Wei Luo, Bowen Bao, Spandan Tiwari, Niels Zhang, Vinayak Gokhale, Clint Greene, Eliot Li

Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

This blog describes the technical details of how we pruned and fine-tuned the Llama 3.1 405B model for our MLPerf Inference v5.1 submission.

September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Fulu Li, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Karan Verma, Clint Greene, Eliot Li

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

In this blog, we provide step-by-step instructions for reproducing AMD's MLPerf Inference v5.1 submission.

September 09, 2025 by Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Karan Verma, Neha Mathews, Yamini Kamisetty, Chelsea Iluno, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Clint Greene, Eliot Li

Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

Performance optimizations for llama.cpp on AMD Instinct GPUs.

September 09, 2025 by Deepan Sekar, Pei Zhang, Eliot Li, Yao Liu, Phani Vaddadi, Vish Vadlamani

GEMM Tuning within hipBLASLt - Part 1

We introduce a hipBLASLt tuning tool that lets developers optimize GEMM problem sizes and integrate them into the library.

September 05, 2025 by YangWen Huang, Carson Liao

Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

Learn how to deploy Step-3, a 321B-parameter VLM with MFA & AFD, on AMD Instinct™ GPUs to cut decoding costs and boost long-context reasoning.

September 04, 2025 by George Wang, Ning Zhang

Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

Learn how prefill–decode disaggregation improves LLM inference by reducing latency, enhancing throughput, and optimizing resource usage.

August 28, 2025 by Bill He, Andy Luo

QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

Quick Reduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting communication overhead by up to 3×.

August 26, 2025 by Haoyang Li, Wei Luo, Xinjun Niu, Spandan Tiwari, Ke Wang, Jiangyong Ren, Ashish Sirasao, Doug Lehr

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

August 25, 2025 by Daniel Huang, George Wang

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

August 22, 2025 by Zhenhua Liu, Xuanwu Yin, Dong Li, Emad Barsoum

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

August 22, 2025 by Wen Xie, Yao Fu, Xiaoming Peng, Xiaobo Chen, Liz Li, Vidushi Goyal, Anshul Gupta
