Recent Posts - Page 5

May 07, 2026

vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem

Use ATOM as an out-of-tree vLLM plugin to keep vLLM compatibility while enabling AMD-optimized attention, model execution, and multi-model support including Kimi-K2.5.

./software-tools-optimization/vllm-atom/README.html

May 07, 2026

AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes

Run Street Gaussians on AMD Instinct MI300: migrate to the latest gsplat, install on ROCm, and render dynamic street scenes.

./artificial-intelligence/street-gaussians/README.html

May 05, 2026

Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models

Explore a new MoE architecture designed for native computation-communication overlap, enabling efficient distributed execution.

./artificial-intelligence/farskip-collective-moe/README.html

April 27, 2026

TraceLens: Democratizing AI Performance Analysis

Explore how TraceLens automates profiler trace analysis to pinpoint bottlenecks and optimize AI workloads.

./software-tools-optimization/tracelens/README.html

April 24, 2026

Primus Projection: Estimate Memory and Performance Before You Train

Learn how to use the Primus projection tool to estimate memory and performance for large-scale LLM training on AMD Instinct™ accelerator platforms.

./software-tools-optimization/primus-projection/README.html

April 24, 2026

Styled Text Image Generation with Eruku on AMD

Hands-on, reproducible guide to training and running Eruku on the LUMI supercomputer, powered by AMD Instinct MI250X GPUs.

./ecosystems-and-partners/eruku-genai/README.html

April 20, 2026

Getting Started with FlyDSL Nightly Wheels on ROCm

A practical guide to installing and using FlyDSL nightly wheels on ROCm for fast, Python-native GPU kernel development.

./software-tools-optimization/flydsl-nightly-wheel/README.html

April 20, 2026

FLy: A New Paradigm for Speculative Decoding — Accepting Semantically Correct Drafts Beyond Exact Match

This blog explores a new training-free, loosely speculative decoding method that accepts semantically valid mismatches and speeds up the original SPD method.

./artificial-intelligence/fly/README.html

April 10, 2026

Introduction to profiling tools for AMD hardware

Profiling tools

./software-tools-optimization/profilers/README.html

April 07, 2026

Serving CTR Recommendation Models with Triton Inference Server using the ONNX Runtime Backend

Learn how to deploy AI models on AMD GPUs with Triton Inference Server, now supporting ONNX Runtime and Python backends, and see performance benchmarks.

./artificial-intelligence/triton-inference-server/README.html

April 06, 2026

FlashInfer on ROCm: High‑Throughput Prefill Attention via AITER

FlashInfer is an open-source library for accelerating LLM serving that is now supported by ROCm.

./artificial-intelligence/flashinfer-release2/README.html

April 06, 2026

Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide

Master hipBLASLt TensileLite tuning. Learn to build custom kernels that deliver 150%-250% faster GEMM performance on AMD Instinct™ MI300X GPUs.

./artificial-intelligence/hipblaslt-tensilelite-tuning/README.html
