AI - Applications & Models - Page 6

GEMM Kernel Optimization For AMD GPUs
A guide to tuning GEMMs for optimal performance of AI models on AMD GPUs.

Enhancing AI Training with AMD ROCm Software
AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack.

Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs
Learn how to optimize large language model inference using vLLM on AMD's MI300X GPUs for enhanced performance and efficiency.

Distributed fine-tuning of MPT-30B using Composer on AMD GPUs
This blog uses Composer, a distributed training framework, to fine-tune MPT-30B on AMD GPUs in both single-node and multinode setups.

Vision Mamba on AMD GPU with ROCm
This blog explores Vision Mamba (Vim), an innovative and efficient backbone for vision tasks, and evaluates its performance on AMD GPUs with ROCm.

Triton Inference Server with vLLM on AMD GPUs
This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend powered by AMD GPUs, showcasing robust performance with several LLMs.

Transformer based Encoder-Decoder models for image-captioning on AMD GPUs
The blog introduces image captioning and provides hands-on tutorials on three different Transformer-based encoder-decoder image captioning models: ViT-GPT2, BLIP, and Alpha-CLIP, deployed on AMD GPUs using ROCm.

Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs
Learn how to use bitsandbytes’ 8-bit representation techniques, the 8-bit optimizer and LLM.int8, to optimize LLM training and inference using ROCm on AMD GPUs.

Distributed Data Parallel Training on AMD GPU with ROCm
This blog demonstrates how to speed up the training of a ResNet model on the CIFAR-100 classification task using PyTorch DDP on AMD GPUs with ROCm.

Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power
Torchtune is a PyTorch library that enables efficient fine-tuning of LLMs. In this blog, we use Torchtune to fine-tune the Llama-3.1-8B model for summarization tasks using LoRA, and we showcase scalable training across multiple GPUs.

CTranslate2: Efficient Inference with Transformer Models on AMD GPUs
Optimizing Transformer models with CTranslate2 for efficient inference on AMD GPUs.

Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm
Meta's Llama 3.2 Vision models bring multimodal capabilities to vision-text tasks. This blog explores how to use them on AMD GPUs with ROCm for efficient AI workflows.