AI Blogs

Reproduce AMD's MLPerf Training v5.0 Submission Result with Instinct™ GPUs
Follow this step-by-step guide to reproduce AMD's MLPerf Training 5.0 submission with Instinct GPUs using ROCm

AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs
Explore the techniques we used to improve the training performance on MI300X and MI325X in our MLPerf Training 5.0 submission.

High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide
Learn how to optimize BERT-L training with mixed precision and Flash Attention v2 on AMD Instinct GPUs — follow our tested MLPerf-compliant step-by-step guide.

Scale LLM Inference with Multi-Node Infrastructure
Learn how to horizontally scale LLM inference using open-source tools on MI300X, with vLLM, nginx, Prometheus, and Grafana.

A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU
Learn how to use Meta’s Llama Stack with AMD ROCm and vLLM to scale inference, integrate APIs, and streamline production-ready AI workflows on AMD Instinct™ GPUs

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
Explore ROCm 6.4's key advancements: AI/HPC performance boosts, enhanced profiling tools, better Kubernetes support and modular drivers, accelerating AI and HPC workloads on AMD GPUs.

AMD Advances Enterprise AI Through OPEA Integration
We announce AMD’s support of the Open Platform for Enterprise AI (OPEA), integrating OPEA’s enterprise GenAI framework with AMD’s computing hardware and ROCm software

Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators
This blog shows Zyphra's new training kernels for transformers and hybrid models on AMD Instinct MI300X accelerators, surpassing the H100's performance

AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs
Learn how to accelerate text-to-video generation using Step-Video-T2V, a 30B-parameter T2V model, on AMD MI300X GPUs with ROCm, enabling scalable, high-fidelity video generation from text

DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs
This blog post demonstrates how hipDF significantly enhances and accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.

CuPy and hipDF on AMD: The Basics and Beyond
Learn how to deploy CuPy and hipDF on AMD GPUs. See their high-performance computing advantages, and use CuPy and hipDF in a detailed example of an investment portfolio allocation optimization using the Markowitz model.

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs

Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs
Learn how to boost your Llama 4 inference performance on AMD MI300X GPUs using AITER-optimized kernels and advanced vLLM techniques

Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs
This blog shows you how to speed up your multimodal models with AMD’s open-source PyTorch tools for speculative decoding on MI300X GPUs

Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs
Build high-performance GEMM kernels using CK-Tile on AMD Instinct GPUs with vendor-optimized pipelines and policies for AI and HPC workloads