AI Blogs - Page 11
Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission
A step-by-step guide to reproducing AMD’s MLPerf v5.0 results for Llama 2 70B & SDXL using ROCm on MI325X
Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers
This blog walks you through the FLUX text-to-image diffusion model architecture and shows you how to run and optimize it on MI300X. A minimal sketch of what that looks like follows.
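As a taste of what the post covers, here is a minimal sketch of running FLUX.1 [dev] through Diffusers' `FluxPipeline`; the prompt and generation parameters are illustrative defaults, not the blog's tuned settings.

```python
# Minimal FLUX text-to-image run with Hugging Face Diffusers
# (illustrative defaults; the blog's tuned settings may differ).
import torch
from diffusers import FluxPipeline

# bfloat16 keeps the ~12B-parameter FLUX.1 [dev] within a single GPU's memory.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # ROCm builds of PyTorch expose AMD GPUs via the "cuda" device

image = pipe(
    "A photorealistic astronaut relaxing on a tropical beach",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_mi300x.png")
```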
Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding
This blog demonstrates the out-of-the-box performance improvements speculative decoding delivers for LLM inference on MI300X.
Speculative Decoding - Deep Dive
This blog shows the performance improvements achieved by applying speculative decoding to Llama models on AMD MI300X GPUs, tested across model sizes, input lengths, and datasets. A minimal sketch of the setup follows.
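For readers who want to try it immediately, here is a hedged vLLM sketch pairing a large target model with a small draft model. The keyword arguments shown (`speculative_model`, `num_speculative_tokens`) match older vLLM releases and should be treated as assumptions; newer versions move them into a `speculative_config` dict.

```python
# Sketch: target + draft model speculative decoding in vLLM.
# Keyword names follow older vLLM releases; newer versions use speculative_config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",             # large target model
    speculative_model="meta-llama/Llama-2-7b-chat-hf",  # small draft model
    num_speculative_tokens=5,  # draft tokens proposed per verification step
    tensor_parallel_size=8,
)
outputs = llm.generate(
    ["Explain speculative decoding in two sentences."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

The draft model proposes several tokens cheaply; the target model verifies them in a single forward pass, so accepted drafts amortize the cost of the large model without changing its output distribution.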
Efficient MoE training on AMD ROCm: How to use MegaBlocks on AMD GPUs
Learn how to use MegaBlocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models efficiently on AMD GPUs using ROCm
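As a flavor of the API, here is a hedged sketch of a standalone MegaBlocks dropless-MoE layer; the `Arguments` fields and the forward-pass return signature are assumptions based on the `megablocks.layers` modules and may differ across releases.

```python
# Hedged sketch: one dropless-MoE (dMoE) layer from MegaBlocks.
# Arguments fields and the return signature are assumptions; check your release.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,  # experts in this layer
    moe_top_k=2,        # route each token to its top-2 experts
)
layer = dMoE(args).to(device="cuda", dtype=torch.bfloat16)

tokens = torch.randn(8, 512, 1024, device="cuda", dtype=torch.bfloat16)
out = layer(tokens)  # some releases also return an auxiliary load-balancing loss
```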
Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
Learn how to optimize DeepSeek-R1 on AMD MI300X with SGLang, AITER kernels, and hyperparameter tuning for up to 5× higher throughput and 60% lower latency than Nvidia H200
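A minimal offline-inference sketch with SGLang's Python `Engine` follows; the `tp_size` value and sampling parameters are illustrative assumptions, and the blog itself covers the full AITER and tuning recipe.

```python
# Sketch: offline DeepSeek-R1 inference with SGLang's Engine API
# (tp_size and sampling params are illustrative, not the blog's tuning).
import sglang as sgl

engine = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-R1",
    tp_size=8,  # shard the 671B-parameter MoE across 8 MI300X GPUs
)
out = engine.generate(
    "Prove that the square root of 2 is irrational.",
    {"temperature": 0.6, "max_new_tokens": 512},
)
print(out["text"])
engine.shutdown()
```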
AITER: AI Tensor Engine For ROCm
We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized repository of high-performance AI operators, designed to significantly accelerate AI workloads on AMD GPUs
Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs
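A minimal offline-inference sketch with vLLM's Python API follows; the model id is Google's instruction-tuned Hugging Face release, and nothing here is specific to the blog's configuration (Gemma 3 support requires a recent vLLM build).

```python
# Sketch: offline Gemma 3 inference with vLLM on a single MI300X.
from vllm import LLM, SamplingParams

# The 27B instruction-tuned checkpoint fits in one MI300X's 192 GB at bf16.
llm = LLM(model="google/gemma-3-27b-it")
outputs = llm.generate(
    ["Summarize the key features of the Gemma 3 model family."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```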
Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance
This blog analyzes how tensor parallelism configurations impact total cost of ownership (TCO) and scalability for production LLM deployments; a sketch of such a sweep follows.
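To make the trade-off concrete, here is a hypothetical harness that sweeps tensor-parallel sizes and reports aggregate versus per-GPU throughput; it is not the blog's benchmark, and the model id is a placeholder.

```python
# Hypothetical harness: compare aggregate vs. per-GPU throughput across
# tensor-parallel sizes (not the blog's benchmark; in practice run each
# TP size in a fresh process so GPU memory is fully released).
import time
from vllm import LLM, SamplingParams

PROMPTS = ["Summarize the trade-offs of tensor parallelism."] * 64
PARAMS = SamplingParams(temperature=0.0, max_tokens=128)

def measure(tp_size: int) -> None:
    llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
              tensor_parallel_size=tp_size)
    start = time.perf_counter()
    llm.generate(PROMPTS, PARAMS)
    elapsed = time.perf_counter() - start
    total = len(PROMPTS) / elapsed
    print(f"TP={tp_size}: {total:.2f} req/s total, "
          f"{total / tp_size:.2f} req/s per GPU")

if __name__ == "__main__":
    measure(8)  # repeat with tp_size 1, 2, 4 in separate runs
```

Larger TP sizes cut per-request latency but add inter-GPU communication, so per-GPU throughput (and hence TCO) often favors the smallest TP size that fits the model.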
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3
This blog is part 3 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform
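As a flavor of what the series builds toward, here is a hedged sketch that schedules a pod onto an Instinct node by requesting the `amd.com/gpu` resource the AMD GPU Operator advertises, using the official Kubernetes Python client; the image and pod names are placeholders, not from the blog.

```python
# Hedged sketch: request one AMD GPU via the amd.com/gpu resource that the
# AMD GPU Operator exposes. Image and names are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="rocm-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="rocm",
            image="rocm/pytorch:latest",  # placeholder image
            command=["rocm-smi"],         # print GPU status and exit
            resources=client.V1ResourceRequirements(
                limits={"amd.com/gpu": "1"}  # one Instinct GPU
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```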
Optimized ROCm Docker for Distributed AI Training
AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, a single-node performance boost, bug fixes, and updated benchmarking for stable, efficient distributed training
AMD Advances Enterprise AI Through OPEA Integration
We announce AMD's support for the Open Platform for Enterprise AI (OPEA), integrating OPEA's enterprise GenAI framework with AMD's compute hardware and ROCm software