AI - Software Tools & Optimizations - Page 2

January 13, 2026

Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

Explore how the AMD GPU DRA Driver brings declarative, attribute-aware GPU scheduling to Kubernetes — learn how to request and manage GPUs natively

./software-tools-optimization/dra-gpu/README.html

January 08, 2026

Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Introducing the AMD Network Operator for automating high-performance AI NIC networking in Kubernetes for AI and HPC workloads

./software-tools-optimization/amd-network-operator/README.html

January 02, 2026

Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models

Learn how to optimize multimodal model inference with batch-level data parallelism for vision encoders in vLLM, achieving up to 45% throughput gains on AMD MI300X.

./software-tools-optimization/vllm-dp-vision/README.html

December 19, 2025

Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads

Learn how to deploy and manage AI workloads with AMD AI Workbench, a low-code interface for developers to manage AI inference deployments

./software-tools-optimization/enterprise-ai-workbench/README.html

December 16, 2025

MoE Training Best Practices on AMD GPUs

Learn how to optimize Mixture-of-Experts (MoE) model training on AMD Instinct GPUs with ROCm. Maximize your AI training performance now!

./software-tools-optimization/primus-moe-package/README.html

November 24, 2025

The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism

Learn how to combine TP, DP, PP, and EP for MoE models. Discover proven strategies to maximize performance on your vLLM deployments.

./software-tools-optimization/vllm-moe-guide/README.html

November 12, 2025

Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Learn how a small-radius expert parallel design with prefill–decode disaggregation enables scalable, fault-isolated LLM inference on AMD Instinct™ MI300X clusters.

./software-tools-optimization/wide-ep-deepseek/README.html

November 04, 2025

Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

./software-tools-optimization/primus-SaFE/README.html

October 29, 2025

High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.

./software-tools-optimization/mxfp4-mxfp6-quantization/README.html

October 20, 2025

ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Meet the ROCm Core SDK, and learn to install and build ROCm components easily using TheRock.

./software-tools-optimization/therock/README.html

October 14, 2025

Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.

./software-tools-optimization/gumiho/README.html

October 09, 2025

GEMM Tuning within hipBLASLt – Part 2

Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.

./software-tools-optimization/hipblaslt-offline-tuning-part2/README.html
