Posted in 2025

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

22 April 2025 - A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

14 April 2025 - Installing ROCm from source with Spack

11 April 2025 - ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 March 2025 - What’s New in the AMD GPU Operator v1.2.0 Release

28 March 2025 - Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

24 March 2025 - Speculative Decoding - Deep Dive

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 March 2025 - AITER: AI Tensor Engine For ROCm

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

14 February 2025 - Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

13 February 2025 - Navigating vLLM Inference with ROCm and Kubernetes

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

09 February 2025 - MI300A - Exploring the APU advantage

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

31 January 2025 - Enhancing AI Training with AMD ROCm Software

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

29 January 2025 - Announcing the AMD GPU Operator and Metrics Exporter

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

24 January 2025 - Vision Mamba on AMD GPU with ROCm

16 January 2025 - Getting started with AMD ROCm containers: from base images to custom solutions

14 January 2025 - Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

Posted in 2024

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

11 October 2024 - Enhancing vLLM Inference on AMD GPUs

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

10 September 2024 - Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

11 July 2024 - DBRX Instruct on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

28 June 2024 - Mamba on AMD GPUs with ROCm

28 June 2024 - Deep Learning Recommendation Models on AMD GPUs

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

10 June 2024 - Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators

04 June 2024 - Segment Anything with AMD GPUs

31 May 2024 - SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

16 May 2024 - Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

16 May 2024 - AMD Collaboration with the University of Michigan offers High Performance Open-Source Solutions to the Bioinformatics Community

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

13 May 2024 - Reading AMD GPU ISA

07 May 2024 - AMD in Action: Unveiling the Power of Application Tracing and Profiling

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

01 May 2024 - Inferencing with Mixtral 8x22B on AMD GPUs

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

26 April 2024 - Application portability with HIP

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

16 April 2024 - PyTorch C++ Extension on AMD GPU

16 April 2024 - Programming AMD GPUs with Julia

16 April 2024 - Program Synthesis with CodeGen

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15 April 2024 - Developing Triton Kernels on AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

09 April 2024 - ResNet for image classification using AMD GPUs

08 April 2024 - Small language models with Phi-2

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

04 April 2024 - Retrieval Augmented Generation (RAG) using LlamaIndex

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

01 April 2024 - Scale AI applications with Ray

29 March 2024 - Automatic mixed precision in PyTorch using AMD GPUs

15 March 2024 - Large language model inference optimizations on AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26 January 2024 - Accelerating XGBoost with Dask using multiple AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Posted in 2022

14 November 2022 - Finite difference method - Laplacian part 1

14 November 2022 - AMD matrix cores