Applications & models - Page 8

Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.

Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

Learn how to use MegaBlocks to pre-train a GPT2 Mixture of Experts (MoE) model and scale your deep learning models effectively on AMD GPUs using ROCm.

March 23, 2025 by Fabricio Flores, Rishi Madduri, Yao Liu, Phani Vaddadi, Vish Vadlamani

Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide

AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.

March 14, 2025 by Shekhar Pandey, Anshul Gupta

Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance

This blog analyzes how tensor parallelism impacts TCO and scale for LLM deployments in production.

March 14, 2025 by Eduardo Alvarez

Instella-VL-1B: First AMD Vision Language Model

We introduce Instella-VL-1B, the first AMD vision language model for image understanding trained on MI300X GPUs, outperforming fully open-source models and matching or exceeding many open-weight counterparts in general multimodal benchmarks and OCR-related tasks.

March 07, 2025 by Ximeng Sun, Aditya Kumar Singh, Gowtham Ramesh, Zicheng Liu, Pratik Prabhanjan Brahma, Ze Wang, Jiang Liu, Jialian Wu, Prakamya Mishra, Xiaodong Yu, Yusheng Su, Sudhanshu Ranjan, Emad Barsoum

Introducing Instella: New State-of-the-art Fully Open 3B Language Models

AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs). In this blog we explain how the Instella models were trained and how to access them.

March 05, 2025 by Jiang Liu, Jialian Wu, Xiaodong Yu, Prakamya Mishra, Sudhanshu Ranjan, Zicheng Liu, Chaitanya Manem, Yusheng Su, Pratik Prabhanjan Brahma, Gowtham Ramesh, Ximeng Sun, Ze Wang, Emad Barsoum

Deploying Serverless AI Inference on AMD GPU Clusters

This blog walks through setting up a serverless AI inference deployment in a Kubernetes cluster with AMD accelerators. It aims to provide a comprehensive guide for deploying and scaling AI inference workloads on serverless infrastructure.

February 25, 2025 by Rathnakara Malatesha

Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU

This blog introduces the key performance optimizations made to enable DeepSeek-R1 inference.

February 21, 2025 by Andy Luo

Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

Fine-tuning Phi-3.5-mini-instruct LLM using multinode distributed training with Hugging Face Accelerate, Slurm, and Docker for scalable efficiency.

February 19, 2025 by Fabricio Flores

Navigating vLLM Inference with ROCm and Kubernetes

A quick introduction to Kubernetes (K8s) and a step-by-step guide to deploying vLLM on K8s using ROCm.

February 13, 2025 by Alex He

PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm

This blog guides you through the process of using PyTorch FSDP to fine-tune LLMs efficiently on AMD GPUs.

February 09, 2025 by Sean Song

GEMM Kernel Optimization For AMD GPUs

A guide to tuning GEMMs for optimal performance of AI models on AMD GPUs.

February 06, 2025 by Ning Zhang, George Wang, Anshul Gupta

Enhancing AI Training with AMD ROCm Software

AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack.

January 31, 2025 by Emad Barsoum
