Applications & models - Page 8
Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.
Multinode Fine-Tuning of Stable Diffusion XL on AMD GPUs with Hugging Face Accelerate and OCI's Kubernetes Engine (OKE)
This blog demonstrates how to set up and fine-tune a Stable Diffusion XL (SDXL) model in a multinode Oracle Cloud Infrastructure (OCI) Kubernetes Engine (OKE) environment on a cluster of AMD GPUs using ROCm.
Supercharging JAX with Triton Kernels on AMD GPUs
In this blog post we guide you through developing a fused dropout activation kernel for matrices in Triton, calling the kernel from JAX, and benchmarking its performance.
Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch
This blog demonstrates how to use AMD GPUs to implement and evaluate INT8 quantization, and the resulting inference speed-up for Llama-family and Mistral LLMs.
Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs
This blog demonstrates how to fine-tune Llama 3 with Axolotl using ROCm on AMD GPUs, and how to evaluate the performance of your LLM before and after fine-tuning.
Inferencing and serving with vLLM on AMD GPUs
Learn step by step how to use vLLM for high-performance inferencing and model serving on AMD GPUs.
Enhancing vLLM Inference on AMD GPUs
Showcases the latest performance enhancements in vLLM inference on AMD Instinct accelerators using ROCm 6.2, including FP8 KV-Cache, quantization, and GEMM tuning.
Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs
A guide to modifying our JAX-based nanoGPT model for mixed-precision training, optimizing speed and efficiency on AMD GPUs with ROCm.
Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs
Image classification with BEiT, MobileNet, and EfficientNet on AMD GPUs.
Seismic stencil codes - part 1
Seismic workloads in the HPC space have a long history of being powered by high-order finite difference methods on structured grids. This trend continues to this day.
Seismic stencil codes - part 2
Recall from the previous post that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.
Seismic stencil codes - part 3
In the last two blog posts, we developed a HIP kernel capable of computing the high-order finite differences commonly needed in seismic wave propagation.
Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission