AMD ROCm™ Blogs
Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs
Learn how to use bitsandbytes’ 8-bit representation techniques, the 8-bit optimizer and LLM.int8, to optimize your LLM training and inference using ROCm...
SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexi...
Introducing AMD's Next-Gen Fortran Compiler
In this post we present a brief preview of AMD's [Next-Gen Fortran Compiler](https://github.com/amd/InfinityHub-CI/blob/main/fortran/README.md), our n...
Distributed Data Parallel training on AMD GPU with ROCm
This blog demonstrates how to speed up the training of a ResNet model on the CIFAR-100 classification task using PyTorch DDP on AMD GPUs with ROCm...
CTranslate2: Efficient Inference with Transformer Models on AMD GPUs
Optimizing Transformer models with CTranslate2 for efficient inference on AMD GPUs...
Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power
Torchtune is a PyTorch library that enables efficient fine-tuning of LLMs. In this blog we use Torchtune to fine-tune the Llama-3.1-8B model for summa...
Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm
Meta's Llama 3.2 Vision models bring multimodal capabilities for vision-text tasks. This blog explores leveraging them on AMD GPUs with ROCm for effic...
Speed Up Text Generation with Speculative Sampling on AMD GPUs
This blog will introduce you to assisted text generation using Speculative Sampling. We briefly explain the principles underlying Speculative Sampling...
Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI's Kubernetes Engine (OKE)
This blog demonstrates how to set up and fine-tune a Stable Diffusion XL (SDXL) model in a multinode Oracle Cloud Infrastructure’s (OCI) Kubernetes En...
Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators
Stone Ridge Technology (SRT) pioneered the use of GPUs for high-performance computing (HPC) reservoir simulation nearly a decade ago with ECHELON...
AMD Collaboration with the University of Michigan!
Long-read DNA sequencing technology is revolutionizing genetic diagnostics and precision medicine by helping us discover structural variants and assem...
Explore AMD Collaboration with Siemens on Simcenter STAR-CCM+
Siemens recently announced that its Simcenter STAR-CCM+ multi-physics computational fluid dynamics (CFD) software now supports AMD Instinct™ GPUs...
Enhancing vLLM Inference on AMD GPUs
In this blog, we’ll demonstrate the latest performance enhancements in vLLM inference on AMD Instinct accelerators using ROCm. In a nutshell, vLLM opt...
Supercharging JAX with Triton Kernels on AMD GPUs
In this blog post we guide you through developing a fused dropout activation kernel for matrices in Triton, calling the kernel from JAX, and benchmark...
Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch
This blog demonstrates how to use AMD GPUs to implement and evaluate INT8 quantization, and the resulting inference speed-up of Llama-family and Mistral...
Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces AMD System Management Interface (amd-smi), explaining how you can use it to access your GPU’s performance and status data...
Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC
Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments in ...
TensorFlow Profiler in Practice: Optimizing TensorFlow Models on AMD GPUs
TensorFlow Profiler measures resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of ...