Recent Posts

vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance
vLLM V1 on AMD ROCm boosts LLM serving with faster TTFT, higher throughput, and optimized multimodal support—ready out of the box.

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm
vLLM v0.9.x is here with major ROCm™ optimizations—boosting LLM performance, reducing latency, and expanding model support on AMD Instinct™ GPUs.

Performance Profiling on AMD GPUs – Part 1: Foundations
Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.

Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs
Learn how to leverage Model Context Protocol (MCP) servers to provide real-time context to LLMs, illustrated with a chatbot example on AMD GPUs.

Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation
A step-by-step guide to adapting LLMs to new languages via continued pretraining, with Poro 2 boosting Finnish performance using Llama 3.1 and AMD GPUs.

Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm
Fine-tune LLMs with GRPO on AMD MI300X—leverage ROCm, Hugging Face TRL, and vLLM for efficient reasoning and scalable RLHF.

Aligning Mixtral 8x7B with TRL on AMD GPUs
This blog demonstrates how to fine-tune and align Mixtral 8x7B with TRL using DPO and evaluate it on AMD GPUs.

Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability
Learn about Instella-Long: AMD’s open 3B language model supporting 128K context, trained on MI300X GPUs, outperforming peers on long-context benchmarks.

AMD ROCm: Powering the World's Fastest Supercomputers
Discover how ROCm drives the world’s top supercomputers, from El Capitan to Frontier, and why it’s shaping the future of scalable, open, and sustainable HPC.

LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation
Learn how to use Quark to apply FP8 quantization to LLMs on AMD GPUs, and evaluate accuracy and performance using vLLM and SGLang on AMD MI300X GPUs.

The ROCm Revisited Series
We present our ROCm Revisited Series. Discover ROCm’s role in leading-edge supercomputing and its growing ecosystem, from HIP to developer tools, powering AI, HPC, and data science across multi-GPU and cluster systems.