Applications & Models
Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel
Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use, shrinking model size, and enabling fast inference.
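To give a flavor of what INT4 quantization does, here is a toy, pure-Python sketch of group-wise round-to-nearest INT4 quantization. It is a conceptual illustration only: the GPTQModel library covered in the post uses the GPTQ algorithm, and the group size and weight values below are illustrative assumptions.

```python
# Toy group-wise INT4 quantization (conceptual sketch only; the real
# GPTQModel library uses the GPTQ algorithm, not round-to-nearest).
import random

GROUP_SIZE = 4  # GPTQ setups commonly use 128; kept small here for clarity

def quantize_int4(weights):
    """Quantize a flat list of floats to INT4 values in [-8, 7],
    keeping one float scale per group."""
    qvals, scales = [], []
    for start in range(0, len(weights), GROUP_SIZE):
        group = weights[start:start + GROUP_SIZE]
        scale = max(abs(v) for v in group) / 7 or 1.0  # map absmax to 7
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(v / scale))) for v in group)
    return qvals, scales

def dequantize_int4(qvals, scales):
    return [q * scales[i // GROUP_SIZE] for i, q in enumerate(qvals)]

random.seed(0)
w = [random.gauss(0, 0.02) for _ in range(16)]
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"max abs reconstruction error: {max_err:.5f}")
# 4 bits per weight vs. 16 for FP16 is roughly a 4x size reduction,
# minus the small overhead of one scale value per group.
```

The memory saving is why INT4 models fit on fewer GPUs and serve faster; the per-group scales keep the reconstruction error small despite the 4-bit budget.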

Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart
Explore the power of Meta’s Llama 4 multimodal models on AMD Instinct™ MI300X and MI325X GPUs, available from Day 0 with seamless vLLM integration.

AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0
We showcase MI325X GPU optimizations that power our MLPerf v5.0 results on Llama 2 70B, highlighting performance tuning, quantization, and vLLM advancements.

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission
A step-by-step guide to reproducing AMD’s MLPerf v5.0 results for Llama 2 70B and SDXL using ROCm on MI325X GPUs.

Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers
This blog walks you through the FLUX text-to-image diffusion model architecture and shows you how to run and optimize it on MI300X.

Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding
This blog demonstrates out-of-the-box performance improvements in LLM inference using speculative decoding on MI300X.

Efficient MoE Training on AMD ROCm: How to Use Megablocks on AMD GPUs
Learn how to use Megablocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.

Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.

Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance
This blog analyzes how tensor parallelism configurations impact total cost of ownership (TCO) and scalability for LLM deployments in production.
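As a back-of-envelope illustration of the trade-off (a sketch with assumed numbers, not figures from the post): sharding a model's weights across `tp` GPUs cuts per-GPU weight memory roughly by `tp`, freeing HBM for KV cache and larger batches, at the cost of more GPUs per replica and added inter-GPU communication.

```python
# Back-of-envelope per-GPU memory under tensor parallelism.
# All numbers are illustrative assumptions, not measurements.

def per_gpu_weight_gb(n_params_b, bytes_per_param=2, tp=1):
    """Approximate per-GPU weight memory (GB) when weights are sharded
    evenly across tp GPUs (FP16/BF16 -> 2 bytes per parameter)."""
    return n_params_b * 1e9 * bytes_per_param / tp / 1e9

HBM_GB = 256  # assumed per-GPU HBM capacity for this sketch
for tp in (1, 2, 4, 8):
    weights = per_gpu_weight_gb(70, tp=tp)   # a 70B-parameter model
    headroom = HBM_GB - weights
    print(f"tp={tp}: weights/GPU = {weights:5.1f} GB, "
          f"headroom for KV cache/activations = {headroom:5.1f} GB")
```

Higher `tp` buys KV-cache headroom and lower latency per token but raises the per-replica GPU count and communication overhead, so the throughput-per-dollar sweet spot depends on batch size and interconnect; that trade-off is what the post analyzes.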

Instella-VL-1B: First AMD Vision Language Model
We introduce Instella-VL-1B, the first AMD vision language model for image understanding, trained on MI300X GPUs. It outperforms fully open-source models and matches or exceeds many open-weight counterparts on general multimodal benchmarks and OCR-related tasks.

Introducing Instella: New State-of-the-art Fully Open 3B Language Models
AMD is excited to announce Instella, a family of fully open, state-of-the-art 3-billion-parameter language models (LMs). In this blog we explain how the Instella models were trained and how to access them.

Deploying Serverless AI Inference on AMD GPU Clusters
This blog shows you how to set up serverless AI inference in a Kubernetes cluster with AMD accelerators, providing a comprehensive guide for deploying and scaling AI inference workloads on serverless infrastructure.