AI Blogs - Page 3
Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows
Ray, combined with ROCm, provides a powerful platform for scaling AI applications, particularly for training and inference workloads.
Technical Dive into AMD's MLPerf Inference v5.1 Submission
In this blog, we share the technical details of how we achieved the results in our MLPerf Inference v5.1 submission.
Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance
This blog describes the technical details of how we prune and fine-tune the Llama 3.1 405B model in our MLPerf Inference v5.1 submission.
Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission
In this blog, we provide step-by-step instructions on how to reproduce AMD's MLPerf Inference v5.1 submission.
Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
Performance optimizations for llama.cpp on AMD Instinct GPUs.
GEMM Tuning within hipBLASLt - Part 1
We introduce a hipBLASLt tuning tool that lets developers optimize GEMM problem sizes and integrate them into the library.
Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs
Learn how to deploy Step-3, a 321B-parameter VLM with MFA & AFD, on AMD Instinct™ GPUs to cut decoding costs and boost long-context reasoning.
Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang
Learn how prefill–decode disaggregation improves LLM inference by reducing latency, enhancing throughput, and optimizing resource usage.
QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang
QuickReduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting communication overhead by up to 3×.
AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs
AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.
Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.
Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.