Tags — ROCm Blogs

Posts tagged AI/ML

18 July 2025 - Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

15 July 2025 - Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

09 July 2025 - Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day

07 July 2025 - vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

20 June 2025 - Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs

18 June 2025 - Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

11 June 2025 - Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

20 May 2025 - AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

15 May 2025 - Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

07 May 2025 - DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

06 May 2025 - CuPy and hipDF on AMD: The Basics and Beyond

28 April 2025 - Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart

28 April 2025 - Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

22 April 2025 - A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

10 April 2025 - Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

06 April 2025 - Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 March 2025 - Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

24 March 2025 - Speculative Decoding - Deep Dive

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs

21 March 2025 - Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

21 March 2025 - AITER: AI Tensor Engine For ROCm

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

14 February 2025 - Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

13 February 2025 - Navigating vLLM Inference with ROCm and Kubernetes

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

06 February 2025 - GEMM Kernel Optimization For AMD GPUs

31 January 2025 - Enhancing AI Training with AMD ROCm Software

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

24 January 2025 - Vision Mamba on AMD GPU with ROCm

16 January 2025 - Getting started with AMD ROCm containers: from base images to custom solutions

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

19 September 2024 - Enhancing vLLM Inference on AMD GPUs

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

11 July 2024 - DBRX Instruct on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

04 June 2024 - Segment Anything with AMD GPUs

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

01 May 2024 - Inferencing with Mixtral 8x22B on AMD GPUs

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

16 April 2024 - PyTorch C++ Extension on AMD GPU

16 April 2024 - Programming AMD GPUs with Julia

16 April 2024 - Program Synthesis with CodeGen

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15 April 2024 - Developing Triton Kernels on AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

09 April 2024 - ResNet for image classification using AMD GPUs

08 April 2024 - Small language models with Phi-2

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

04 April 2024 - Retrieval Augmented Generation (RAG) using LlamaIndex

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

01 April 2024 - Scale AI applications with Ray

29 March 2024 - Automatic mixed precision in PyTorch using AMD GPUs

15 March 2024 - Large language model inference optimizations on AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26 January 2024 - Accelerating XGBoost with Dask using multiple AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

Posts tagged C++

06 June 2025 - ROCm Revisited: Getting Started with HIP

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

16 April 2024 - PyTorch C++ Extension on AMD GPU

Posts tagged Compiler

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

06 May 2025 - Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

09 February 2025 - MI300A - Exploring the APU advantage

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

13 May 2024 - Reading AMD GPU ISA

26 April 2024 - Application portability with HIP

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

08 June 2023 - GPU-aware MPI with ROCm

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

11 May 2023 - Finite difference method - Laplacian part 3

14 November 2022 - AMD matrix cores

Posts tagged Computer Vision

18 July 2025 - Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

12 May 2025 - Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

24 January 2025 - Vision Mamba on AMD GPU with ROCm

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

04 June 2024 - Segment Anything with AMD GPUs

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

09 April 2024 - ResNet for image classification using AMD GPUs

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

Posts tagged Data Science

16 May 2024 - Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

16 April 2024 - Programming AMD GPUs with Julia

Posts tagged Developers

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

22 May 2025 - ROCm Runfile Installer Is Here!

21 May 2025 - From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

16 April 2024 - Programming AMD GPUs with Julia

Posts tagged Diffusion Model

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

01 April 2024 - Scale AI applications with Ray

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

Posts tagged Fine-Tuning

21 July 2025 - Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

15 April 2025 - Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

08 April 2024 - Small language models with Phi-2

01 April 2024 - Scale AI applications with Ray

15 March 2024 - Large language model inference optimizations on AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

Posts tagged GenAI

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

11 July 2025 - Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

09 June 2025 - LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

28 March 2025 - Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

24 March 2025 - Speculative Decoding - Deep Dive

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

12 March 2025 - AMD Advances Enterprise AI Through OPEA Integration

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

31 January 2025 - Enhancing AI Training with AMD ROCm Software

24 January 2025 - Vision Mamba on AMD GPU with ROCm

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

15 October 2024 - Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

28 June 2024 - Mamba on AMD GPUs with ROCm

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Program Synthesis with CodeGen

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

04 April 2024 - Image classification using Vision Transformer with AMD GPUs

04 April 2024 - Building semantic search with SentenceTransformers on AMD

01 April 2024 - Scale AI applications with Ray

15 March 2024 - Large language model inference optimizations on AMD GPUs

08 March 2024 - Music Generation With MusicGen on an AMD GPU

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Posts tagged HPC

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

06 June 2025 - The ROCm Revisited Series

06 June 2025 - ROCm Revisited: Getting Started with HIP

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

28 May 2025 - HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

14 April 2025 - Installing ROCm from source with Spack

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

09 February 2025 - MI300A - Exploring the APU advantage

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

14 January 2025 - Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

10 September 2024 - Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

13 May 2024 - Reading AMD GPU ISA

07 May 2024 - AMD in Action: Unveiling the Power of Application Tracing and Profiling

26 April 2024 - Application portability with HIP

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

16 April 2024 - Programming AMD GPUs with Julia

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

03 November 2023 - Sparse matrix vector multiplication - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

08 June 2023 - GPU-aware MPI with ROCm

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

11 May 2023 - Finite difference method - Laplacian part 3

12 April 2023 - Introduction to profiling tools for AMD hardware

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

26 January 2023 - AMD ROCm™ installation

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

14 November 2022 - AMD matrix cores

Posts tagged Hardware

05 March 2025 - Introducing Instella: New State-of-the-art Fully Open 3B Language Models

14 February 2025 - Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

Posts tagged Installation

06 June 2025 - ROCm Revisited: Getting Started with HIP

22 May 2025 - ROCm Runfile Installer Is Here!

14 April 2025 - Installing ROCm from source with Spack

11 April 2025 - ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

10 September 2024 - Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

26 April 2024 - Application portability with HIP

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

08 June 2023 - GPU-aware MPI with ROCm

26 January 2023 - AMD ROCm™ installation

Posts tagged JAX

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

Posts tagged Kubernetes

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

28 March 2025 - What’s New in the AMD GPU Operator v1.2.0 Release

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

29 January 2025 - Announcing the AMD GPU Operator and Metrics Exporter

Posts tagged LLM

17 July 2025 - Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

18 June 2025 - Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

09 June 2025 - LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

03 June 2025 - High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

28 April 2025 - Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

27 March 2025 - Accelerating LLM Inference: Up to 3x Speedup on MI300X with Speculative Decoding

24 March 2025 - Speculative Decoding - Deep Dive

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

14 March 2025 - Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

13 March 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

21 February 2025 - Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

21 February 2025 - How to Build a vLLM Container for Inference and Benchmarking

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

14 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

07 February 2025 - AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

08 January 2025 - Triton Inference Server with vLLM on AMD GPUs

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

23 October 2024 - Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

09 October 2024 - Supercharging JAX with Triton Kernels on AMD GPUs

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

19 September 2024 - Enhancing vLLM Inference on AMD GPUs

06 September 2024 - Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs

28 August 2024 - Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

21 August 2024 - Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

09 August 2024 - Inferencing with Grok-1 on AMD GPUs

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

28 June 2024 - Mamba on AMD GPUs with ROCm

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

31 May 2024 - SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

01 May 2024 - Inferencing with Mixtral 8x22B on AMD GPUs

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama Model on a single AMD GPU

15 April 2024 - Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

04 April 2024 - Retrieval Augmented Generation (RAG) using LlamaIndex

04 April 2024 - Building semantic search with SentenceTransformers on AMD

01 April 2024 - Scale AI applications with Ray

15 March 2024 - Large language model inference optimizations on AMD GPUs

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

01 February 2024 - Fine-tune Llama model with LoRA: Customizing a large language model for question-answering

01 February 2024 - Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

29 January 2024 - Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26 January 2024 - Accelerating XGBoost with Dask using multiple AMD GPUs

25 January 2024 - LLM distributed supervised fine-tuning with JAX

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 January 2024 - Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Posts tagged Linear Algebra

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

31 May 2024 - SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

03 November 2023 - Sparse matrix vector multiplication - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

14 November 2022 - AMD matrix cores

Posts tagged Memory

09 February 2025 - MI300A - Exploring the APU advantage

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

13 May 2024 - Reading AMD GPU ISA

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

18 July 2023 - Finite difference method - Laplacian part 4

08 June 2023 - GPU-aware MPI with ROCm

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

11 May 2023 - Finite difference method - Laplacian part 3

12 April 2023 - Introduction to profiling tools for AMD hardware

09 March 2023 - AMD Instinct™ MI200 GPU memory space overview

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

14 November 2022 - AMD matrix cores

Posts tagged Multimodal

07 March 2025 - Instella-VL-1B: First AMD Vision Language Model

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

16 April 2024 - Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

Posts tagged OpenMP

09 February 2025 - MI300A - Exploring the APU advantage

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

Posts tagged Optimization

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

06 June 2025 - The ROCm Revisited Series

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

16 May 2025 - Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

01 May 2025 - Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

26 April 2024 - Application portability with HIP

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

17 May 2023 - Register pressure in AMD CDNA™2 GPUs

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - AMD matrix cores

Posts tagged Partner Applications

10 June 2024 - Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators

16 May 2024 - Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

16 May 2024 - AMD Collaboration with the University of Michigan offers High Performance Open-Source Solutions to the Bioinformatics Community

Posts tagged Performance

03 July 2025 - Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

28 June 2025 - Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

09 June 2025 - LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation

06 June 2025 - The ROCm Revisited Series

04 June 2025 - Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025 - AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

09 April 2025 - Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

02 April 2025 - Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025 - AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

28 February 2025 - Measuring Max-Achievable FLOPs – Part 2

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

13 November 2024 - Introducing AMD’s Next-Gen Fortran Compiler

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

28 June 2024 - Mamba on AMD GPUs with ROCm

18 April 2024 - C++17 parallel algorithms and HIPSTDPAR

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

03 November 2023 - Sparse matrix vector multiplication - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

08 June 2023 - GPU-aware MPI with ROCm

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts tagged Profiling

26 June 2025 - Performance Profiling on AMD GPUs – Part 1: Foundations

11 April 2025 - ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

22 July 2024 - Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

18 June 2024 - TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

07 May 2024 - AMD in Action: Unveiling the Power of Application Tracing and Profiling

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

11 May 2023 - Finite difference method - Laplacian part 3

12 April 2023 - Introduction to profiling tools for AMD hardware

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts tagged PyTorch

07 May 2025 - DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

06 May 2025 - CuPy and hipDF on AMD: The Basics and Beyond

23 March 2025 - Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs

13 March 2025 - Optimized ROCm Docker for Distributed AI Training

19 February 2025 - Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

09 February 2025 - PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

31 January 2025 - Enhancing AI Training with AMD ROCm Software

29 January 2025 - Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

28 January 2025 - Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

24 January 2025 - Vision Mamba on AMD GPU with ROCm

10 December 2024 - Training Transformers and Hybrid models on AMD Instinct MI300X Accelerators

03 December 2024 - Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

13 November 2024 - SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

13 November 2024 - Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

01 November 2024 - Distributed Data Parallel Training on AMD GPU with ROCm

24 October 2024 - Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

24 October 2024 - CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

15 October 2024 - Speed Up Text Generation with Speculative Sampling on AMD GPUs

03 October 2024 - Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

23 September 2024 - Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

03 September 2024 - Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

29 July 2024 - Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

11 July 2024 - DBRX Instruct on AMD GPUs

11 July 2024 - Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

03 July 2024 - Accelerating models on ROCm using PyTorch TunableOp

02 July 2024 - A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

28 June 2024 - Deep Learning Recommendation Models on AMD GPUs

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

15 May 2024 - Accelerating Large Language Models with Flash Attention on AMD GPUs

26 April 2024 - Table Question-Answering with TaPas

26 April 2024 - Multimodal (Visual and Language) understanding with LLaVA-NeXT

24 April 2024 - Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 April 2024 - Transforming Words into Motion: A Guide to Video Generation with AMD GPU

17 April 2024 - Inferencing with AI2’s OLMo model on AMD GPU

16 April 2024 - Text Summarization with FLAN-T5

16 April 2024 - PyTorch C++ Extension on AMD GPU

16 April 2024 - Program Synthesis with CodeGen

16 April 2024 - Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

09 April 2024 - ResNet for image classification using AMD GPUs

08 April 2024 - Small language models with Phi-2

04 April 2024 - Using the ChatGLM-6B bilingual language model with AMD GPUs

04 April 2024 - Total body segmentation using MONAI Deploy on an AMD GPU

29 March 2024 - Automatic mixed precision in PyTorch using AMD GPUs

12 March 2024 - Building a decoder transformer model on AMD GPU(s)

11 March 2024 - Question-answering Chatbot with LangChain on an AMD GPU

08 March 2024 - Music Generation With MusicGen on an AMD GPU

23 February 2024 - Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

08 February 2024 - Simplifying deep learning: A guide to PyTorch Lightning

07 February 2024 - Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

05 February 2024 - Using LoRA for efficient fine-tuning: Fundamental principles

26 January 2024 - Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

24 January 2024 - Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

Posts tagged Recommendation Systems

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

Posts tagged Reinforcement Learning

12 June 2025 - Aligning Mixtral 8x7B with TRL on AMD GPUs

24 April 2025 - Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration

23 May 2024 - Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

11 April 2024 - GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

09 April 2024 - ResNet for image classification using AMD GPUs

Posts tagged Robotics

14 July 2025 - Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot

Posts tagged Scientific Computing

10 June 2025 - AMD ROCm: Powering the World’s Fastest Supercomputers

20 May 2025 - Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

14 April 2025 - Installing ROCm from source with Spack

09 February 2025 - Deep dive into the MI300 compute and memory partition modes

29 August 2024 - Seismic stencil codes - part 3

29 August 2024 - Seismic stencil codes - part 2

29 August 2024 - Seismic stencil codes - part 1

29 July 2024 - Graph analytics on AMD GPUs using Gunrock

16 April 2024 - Programming AMD GPUs with Julia

03 November 2023 - Sparse matrix vector multiplication - part 1

15 September 2023 - Jacobi Solver with HIP and OpenMP offloading

18 July 2023 - Finite difference method - Laplacian part 4

11 May 2023 - Finite difference method - Laplacian part 3

04 January 2023 - Finite difference method - Laplacian part 2

14 November 2022 - Finite difference method - Laplacian part 1

Posts tagged Serving

30 May 2025 - Scale LLM Inference with Multi-Node Infrastructure

14 March 2025 - Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

25 February 2025 - Deploying Serverless AI Inference on AMD GPU Clusters

19 September 2024 - Inferencing and serving with vLLM on AMD GPUs

19 September 2024 - Enhancing vLLM Inference on AMD GPUs

01 May 2024 - Step-by-Step Guide to Use OpenLLM on AMD GPUs

Posts tagged Speech

27 June 2024 - Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

16 April 2024 - Speech-to-Text on an AMD GPU with Whisper

Posts tagged System-Tuning

25 March 2025 - Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

02 March 2025 - Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

17 September 2024 - Getting to Know Your GPU: A Deep Dive into AMD SMI

16 April 2024 - Affinity part 2 - System topology and controlling affinity

16 April 2024 - Affinity part 1 - Affinity, placement, and order

Posts tagged Systems

06 June 2025 - ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem

22 May 2025 - ROCm Runfile Installer Is Here!

Posts tagged TensorFlow

30 April 2024 - Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

11 September 2023 - Creating a PyTorch/TensorFlow code environment on AMD GPUs

Posts tagged Time Series

19 August 2024 - Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

Posts tagged Tuning

29 May 2024 - Unveiling performance insights with PyTorch Profiler on an AMD GPU