Applications & Models#
Explore the latest blogs about applications and models in the ROCm ecosystem, including machine learning frameworks, AI models, and application case studies.

Instella-VL-1B: First AMD Vision Language Model
We introduce Instella-VL-1B, the first AMD vision language model for image understanding, trained on MI300X GPUs. It outperforms fully open-source models and matches or exceeds many open-weight counterparts on general multimodal benchmarks and OCR-related tasks.
Introducing Instella: New State-of-the-art Fully Open 3B Language Models
AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs). In this blog we explain how the Instella models were trained and how to access them.

Deploying Serverless AI Inference on AMD GPU Clusters
This blog walks you through setting up serverless AI inference in a Kubernetes cluster with AMD accelerators, providing a comprehensive guide to deploying and scaling AI inference workloads on serverless infrastructure.

Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU
This blog introduces the key performance optimizations made to enable DeepSeek-R1 inference on the AMD Instinct MI300X GPU.

Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training
Fine-tune the Phi-3.5-mini-instruct LLM at scale with multinode distributed training, using Hugging Face Accelerate, Slurm, and Docker for scalable efficiency.
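
For orientation, here is a minimal Hugging Face Accelerate training loop. It is a sketch with a toy model and synthetic data, not the blog's Phi-3.5 recipe; multinode runs launch the same script via `accelerate launch` from a Slurm job.

```python
# Minimal Accelerate loop (a sketch: toy model and synthetic data, not the blog's Phi-3.5 setup).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up device/rank from the `accelerate launch` environment
model = torch.nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device and wraps the model for distributed training
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward() so gradients sync across ranks
    optimizer.step()
```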

Navigating vLLM Inference with ROCm and Kubernetes
A quick introduction to Kubernetes (K8s) and a step-by-step guide to deploying vLLM on a K8s cluster with ROCm.

PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm
This blog guides you through the process of using PyTorch FSDP to fine-tune LLMs efficiently on AMD GPUs.
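
As a taste of what the blog covers, here is a minimal FSDP skeleton. It wraps a toy model rather than an LLM and assumes a one-process-per-GPU launch via `torchrun`.

```python
# Minimal FSDP skeleton (a sketch; launch with `torchrun --nproc_per_node=<num_gpus> script.py`).
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # ROCm builds of PyTorch expose RCCL under the "nccl" backend name
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.Linear(4096, 1024)).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss on a stand-in batch
loss.backward()
optimizer.step()
dist.destroy_process_group()
```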

GEMM Kernel Optimization For AMD GPUs
A guide to how GEMM kernels can be tuned for optimal AI model performance on AMD GPUs.
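
A quick way to see why GEMM tuning matters is to time a matmul and convert the result to TFLOP/s. The sketch below does this in PyTorch; the sizes and iteration counts are arbitrary, and this is a measurement harness, not the blog's tuning workflow.

```python
# Rough GEMM throughput measurement (a sketch; sizes and iteration counts are arbitrary).
import torch

n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(5):  # warm-up so kernel selection/compilation is excluded from timing
    torch.mm(a, b)
start.record()
iters = 20
for _ in range(iters):
    torch.mm(a, b)
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / iters
tflops = 2 * n**3 / (ms * 1e-3) / 1e12  # an n x n GEMM performs ~2*n^3 floating-point ops
print(f"{ms:.3f} ms/iter, {tflops:.1f} TFLOP/s")
```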

Enhancing AI Training with AMD ROCm Software
AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack.

Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs
Learn how to optimize large language model inference using vLLM on AMD's MI300X GPUs for enhanced performance and efficiency.
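
As a baseline before any tuning, here is a minimal offline-inference snippet with vLLM. It is a sketch that assumes a ROCm build of vLLM is installed; the model name is just an example, so swap in your own.

```python
# Minimal vLLM offline inference (a sketch; the model name is an example).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # loads weights onto the GPU
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain ROCm in one sentence."], params)
print(outputs[0].outputs[0].text)
```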

Distributed fine-tuning of MPT-30B using Composer on AMD GPUs
This blog uses Composer, a distributed training framework, to fine-tune MPT-30B on AMD GPUs in both single-node and multinode setups.

Vision Mamba on AMD GPU with ROCm
This blog explores Vision Mamba (Vim), an innovative and efficient backbone for vision tasks, and evaluates its performance on AMD GPUs with ROCm.