AI Blogs - Page 2

Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs
Learn how to boost your Llama 4 inference performance on AMD MI300X GPUs using AITER-optimized kernels and advanced vLLM techniques

Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs
This blog shows you how to speed up your multimodal models with AMD’s open-source PyTorch tools for speculative decoding on MI300X GPUs

Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and impressive throughput-convergence results

A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU
Learn how to use Meta’s Llama Stack with AMD ROCm and vLLM to scale inference, integrate APIs, and streamline production-ready AI workflows on AMD Instinct™ GPUs

Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs
Build high-performance GEMM kernels using CK-Tile on AMD Instinct GPUs with vendor-optimized pipelines and policies for AI and HPC workloads

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
Explore ROCm 6.4's key advancements: AI/HPC performance boosts, enhanced profiling tools, better Kubernetes support and modular drivers, accelerating AI and HPC workloads on AMD GPUs.

Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations
Learn how Triton compiles and optimizes AI kernels on AMD GPUs, with deep dives into IR flows, hardware-specific passes, and performance tuning tips

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel
Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use, shrinking model size, and enabling fast inference

Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart
Explore the power of Meta’s Llama 4 multimodal models on AMD Instinct™ MI300X and MI325X GPUs - available from Day 0 with seamless vLLM integration

AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0
We showcase MI325X GPU optimizations that power our MLPerf v5.0 results on Llama 2 70B, highlighting performance tuning, quantization, and vLLM advancements.

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission
A step-by-step guide to reproducing AMD’s MLPerf v5.0 results for Llama 2 70B & SDXL using ROCm on MI325X

Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers
This blog walks you through the FLUX text-to-image diffusion model architecture and shows you how to run and optimize it on MI300X.