Recent Posts - Page 18
Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling
Discover ROCprofiler SDK – ROCm’s next-generation, unified, scalable, and high-performance profiling toolkit for AI and HPC workloads on AMD GPUs.
Speculative Decoding - Deep Dive
This blog shows the performance improvements achieved by applying speculative decoding with Llama models on AMD MI300X GPUs, tested across models, input sizes, and datasets.
Efficient MoE training on AMD ROCm: How to use MegaBlocks on AMD GPUs
Learn how to use MegaBlocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.
AITER: AI Tensor Engine For ROCm
We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized, high-performance AI operator repository, designed to significantly accelerate AI workloads on AMD GPUs.
Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
Learn how to optimize DeepSeek-R1 inference on AMD MI300X with SGLang, AITER kernels, and hyperparameter tuning for up to 5× higher throughput and 60% lower latency compared to the Nvidia H200.
Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.
Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance
This blog analyzes how tensor parallelism configurations impact total cost of ownership (TCO) and scalability for LLM deployments in production.
Optimized ROCm Docker for Distributed AI Training
AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, a single-node performance boost, bug fixes, and updated benchmarking for stable, efficient distributed training.
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3
This blog is part 3 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.
AMD Advances Enterprise AI Through OPEA Integration
We announce AMD's support for the Open Platform for Enterprise AI (OPEA), integrating OPEA's enterprise GenAI framework with AMD's computing hardware and ROCm software.
Instella-VL-1B: First AMD Vision Language Model
We introduce Instella-VL-1B, the first AMD vision language model for image understanding trained on MI300X GPUs, outperforming fully open-source models and matching or exceeding many open-weight counterparts in general multimodal benchmarks and OCR-related tasks.