AI Blogs - Page 5
Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework
This blog shows how CK-Tile’s XOR-based swizzle optimizes shared memory (LDS) access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.
Benchmarking Reasoning Models: From Tokens to Answers
Learn how to benchmark reasoning tasks. Use Qwen3 and vLLM to test true reasoning performance, not just how fast words are generated.
Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU
Fine-tune Llama 3.2 Vision models on an AMD MI300X GPU using Torchtune, achieving 2.3× better accuracy with the 11B model than with the 90B model on chart-based tasks.
Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs
Explore Instella-T2I: AMD’s open-source text-to-image model, built on MI300X GPUs with a novel 1D tokenizer and an LLM-based encoder for scalable image generation.
Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot
Speed up robotics AI with AMD ROCm and LeRobot: fine-tune VLAs on Instinct GPUs and deploy on Ryzen AI. Follow the tutorial to get started.
Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide
A practical guide for accelerating video generation with HunyuanVideo and Wan 2.1 using Unified Sequence Parallelism on AMD GPUs.
Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day
Nitro-T is a family of text-to-image diffusion models developed by AMD to demonstrate efficient large-scale training on Instinct™ MI300X GPUs, trained from scratch in under 24 hours.
vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance
vLLM V1 on AMD ROCm boosts LLM serving with faster time to first token (TTFT), higher throughput, and optimized multimodal support, all ready out of the box.
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm
vLLM v0.9.x is here with major ROCm™ optimizations—boosting LLM performance, reducing latency, and expanding model support on AMD Instinct™ GPUs.
Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs
Learn how to use Model Context Protocol (MCP) servers to provide real-time context to LLMs, illustrated through a chatbot example on AMD GPUs.
Continued Pretraining: A Practical Playbook for Language-Specific LLM Adaptation
A step-by-step guide to adapting LLMs to new languages via continued pretraining, with Poro 2 boosting Finnish performance using Llama 3.1 and AMD GPUs.