AI Blogs - Page 5

AI Blogs - Page 5#

Speculative Decoding - Deep Dive

This blog shows the performance improvement achieved by applying speculative decoding with Llama models on AMD MI300X GPUs, tested across models, input sizes, and datasets.

March 24, 2025 by Chang Liu

Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs

Learn how to use Megablocks to pre-train GPT2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectiveness on AMD GPUs using ROCm

March 23, 2025 by Fabricio Flores, Rishi Madduri, Yao Liu, Phani Vaddadi, Vish Vadlamani

Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

Learn how to optimize DeepSeek-R1 on AMD MI300X with SGLang, AITER kernels and hyperparameter tuning for up to 5× throughput and 60% lower latency over Nvidia H200

March 21, 2025 by Peng Sun, Andy Luo, Seungrok Jung, Liz Li, Hai Xiao

AITER: AI Tensor Engine For ROCm

We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized high performance AI operators repository, designed to significantly accelerate AI workloads on AMD GPUs

March 21, 2025 by Shekhar Pandey, Liz Li, Carlus Huang, Lingpeng Jin, Anshul Gupta

Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs

March 14, 2025 by Shekhar Pandey, Anshul Gupta

Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

This blog analyzes how tensor parallelism impacts TCO and Scale for LLM deployments in production.

March 14, 2025 by Eduardo Alvarez

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

This blog is part 3 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform

March 13, 2025 by Victor Robles

Optimized ROCm Docker for Distributed AI Training

AMD updated Docker images incorporate torchtune finetuning, FP8 support, single node performance boost, bug fixes & updated benchmarking for stable, efficient distributed training

March 13, 2025 by Yao Fu, Anshul Gupta

AMD Advances Enterprise AI Through OPEA Integration

We announce AMD’s support of Open Platform for Enterprise AI (OPEA), integrating OPEA’s enterprise GenAI framework with AMD’s computing hardware and ROCm software

March 12, 2025 by Yu Wang, Alex He

Instella-VL-1B: First AMD Vision Language Model

We introduce Instella-VL-1B, the first AMD vision language model for image understanding trained on MI300X GPUs, outperforming fully open-source models and matching or exceeding many open-weight counterparts in general multimodal benchmarks and OCR-related tasks.

March 07, 2025 by Ximeng Sun, Aditya Kumar Singh, Gowtham Ramesh, Zicheng Liu, Pratik Prabhanjan Brahma, Ze Wang, Jiang Liu, Jialian Wu, Prakamya Mishra, Xiaodong Yu, Yusheng Su, Sudhanshu Ranjan, Emad Barsoum

Introducing Instella: New State-of-the-art Fully Open 3B Language Models

AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs). , In this blog we explain how the Instella models were trained, and how to access them.

March 05, 2025 by Jiang Liu, Jialian Wu, Xiaodong Yu, Prakamya Mishra, Sudhanshu Ranjan, Zicheng Liu, Chaitanya Manem, Yusheng Su, Pratik Prabhanjan Brahma, Gowtham Ramesh, Ximeng Sun, Ze Wang, Emad Barsoum

Measuring Max-Achievable FLOPs – Part 2

AMD measures Max-Achievable FLOPS through controlled benchmarking: real-world data patterns, thermally stable devices, and cold cache testing—revealing how actual performance differs from theoretical peaks.

February 28, 2025 by Ben Sander, Evan Masters, Babak Poursartip, Henry Ho

Prev Page 5 of 14 Next