Applications & models - Page 4

Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.

Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

Fine-tuning Phi-3.5-mini-instruct LLM using multinode distributed training with Hugging Face Accelerate, Slurm, and Docker for scalable efficiency.

February 19, 2025 by Fabricio Flores

Navigating vLLM Inference with ROCm and Kubernetes

Quick introduction to Kubernetes (K8s) and a step-by-step guide on how to use K8s to deploy vLLM using ROCm.

February 13, 2025 by Alex He

PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

This blog guides you through the process of using PyTorch FSDP to fine-tune LLMs efficiently on AMD GPUs.

February 09, 2025 by Sean Song

GEMM Kernel Optimization For AMD GPUs

A guide to tuning GEMMs for optimal performance of AI models on AMD GPUs.

February 06, 2025 by Ning Zhang, George Wang, Anshul Gupta

Enhancing AI Training with AMD ROCm Software

AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack.

January 31, 2025 by Emad Barsoum

Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

Learn how to optimize large language model inference using vLLM on AMD's MI300X GPUs for enhanced performance and efficiency.

January 29, 2025 by Andy Luo

Distributed fine-tuning of MPT-30B using Composer on AMD GPUs

This blog uses Composer, a distributed framework, on AMD GPUs to fine-tune MPT-30B in both single-node and multinode settings.

January 28, 2025 by Vara Lakshmi Bayanagari

Vision Mamba on AMD GPU with ROCm

This blog explores Vision Mamba (Vim), an innovative and efficient backbone for vision tasks, and evaluates its performance on AMD GPUs with ROCm.

January 24, 2025 by Sean Song

Triton Inference Server with vLLM on AMD GPUs

This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend powered by AMD GPUs, showcasing robust performance with several LLMs.

January 08, 2025 by Fabricio Flores, Tiffany Mintz, Eliot Li, Yao Liu, Ted Themistokleous, Brian Pickrell, Vish Vadlamani

Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

The blog introduces image captioning and provides hands-on tutorials on three different Transformer-based encoder-decoder image captioning models: ViT-GPT2, BLIP, and Alpha-CLIP, deployed on AMD GPUs using ROCm.

December 03, 2024 by Vara Lakshmi Bayanagari

Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

Learn how to use bitsandbytes' 8-bit representation techniques, the 8-bit optimizer and LLM.int8, to optimize LLM training and inference using ROCm on AMD GPUs.

November 13, 2024 by Vara Lakshmi Bayanagari

Distributed Data Parallel Training on AMD GPU with ROCm

This blog demonstrates how to speed up the training of a ResNet model on the CIFAR-100 classification task using PyTorch DDP on AMD GPUs with ROCm.

November 01, 2024 by Sean Song
