AMD ROCm™ Blogs

Recent Posts

Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs
SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs
Introducing AMD’s Next-Gen Fortran Compiler
Distributed Data Parallel Training on AMD GPU with ROCm
CTranslate2: Efficient Inference with Transformer Models on AMD GPUs
Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power
Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm
Speed Up Text Generation with Speculative Sampling on AMD GPUs
Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI’s Kubernetes Engine (OKE)

Ecosystems and partners

Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators
AMD Collaboration with the University of Michigan offers High Performance Open-Source Solutions to the Bioinformatics Community
Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+

Applications & models

Enhancing vLLM Inference on AMD GPUs
Supercharging JAX with Triton Kernels on AMD GPUs
Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch

Software tools & optimizations

Getting to Know Your GPU: A Deep Dive into AMD SMI
Introducing the AMD ROCm™ Offline Installer Creator: Simplifying Deployment for AI and HPC
TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs