AI Blogs - Page 13

September 17, 2025

AMD-HybridLM: Towards Extremely Efficient Hybrid Language Models

Explore AMD-HybridLM’s architecture and see how hybridization redefines LLM efficiency and performance without requiring retraining from scratch

./artificial-intelligence/hybrid-models,-mla,/README.html

September 16, 2025

ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools

./ecosystems-and-partners/rocm-7.0-blog/README.html

September 11, 2025

Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs

This blog will show you how to speed up LLM inference with Multi-Token Prediction in DeepSeek V3 & SGLang on AMD Instinct GPUs

./software-tools-optimization/mtp/README.html

September 10, 2025

Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows

Ray, combined with ROCm, provides a powerful platform for scaling AI applications, particularly for training and inference workloads.

./artificial-intelligence/rocm-ray/README.html

September 09, 2025

Technical Dive into AMD's MLPerf Inference v5.1 Submission

In this blog, we share the technical details of how we achieved the results in our MLPerf Inference v5.1 submission.

./artificial-intelligence/mlperf-inference-v5.1/README.html

September 09, 2025

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission

In this blog, we provide step-by-step instructions for reproducing AMD's MLPerf Inference v5.1 submission

./artificial-intelligence/mlperf-inference5.1-repro/README.html

September 09, 2025

Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance

This blog describes the technical details of how we pruned and fine-tuned the Llama 3.1 405B model for our MLPerf Inference v5.1 submission.

./artificial-intelligence/mlperf-llama-pruning/README.html

September 09, 2025

Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration

Performance optimizations for llama.cpp on AMD Instinct GPUs

./ecosystems-and-partners/llama-cpp/README.html

September 05, 2025

GEMM Tuning within hipBLASLt - Part 1

We introduce a hipBLASLt tuning tool that lets developers tune GEMM performance for specific problem sizes and integrate the tuned results into the library.

./software-tools-optimization/hipblaslt-offline-tuning-part1/README.html

September 04, 2025

Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs

Learn how to deploy Step-3, a 321B-parameter VLM with MFA & AFD, on AMD Instinct™ GPUs to cut decoding costs and boost long-context reasoning

./artificial-intelligence/step3-model/README.html

August 28, 2025

Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

Learn how prefill–decode disaggregation improves LLM inference by reducing latency, enhancing throughput, and optimizing resource usage.

./software-tools-optimization/disaggregation/README.html

August 26, 2025

QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

QuickReduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting communication overhead by up to 3x

./artificial-intelligence/quick-reduce/README.html
