Recent Posts - Page 18
Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling
Discover ROCprofiler SDK – ROCm’s next-generation, unified, scalable, and high-performance profiling toolkit for AI and HPC workloads on AMD GPUs.
Speculative Decoding - Deep Dive
This blog shows the performance improvements achieved by applying speculative decoding with Llama models on AMD MI300X GPUs, tested across models, input sizes, and datasets.
Efficient MoE training on AMD ROCm: How to use MegaBlocks on AMD GPUs
Learn how to use MegaBlocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.
AITER: AI Tensor Engine For ROCm
We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized, high-performance AI operator repository, designed to significantly accelerate AI workloads on AMD GPUs.
Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
Learn how to optimize DeepSeek-R1 inference on AMD MI300X with SGLang, AITER kernels, and hyperparameter tuning for up to 5× higher throughput and 60% lower latency compared to the Nvidia H200.
Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.
Analyzing the Impact of Tensor Parallelism Configurations on LLM Inference Performance
This blog analyzes how tensor parallelism configurations impact total cost of ownership (TCO) and scalability for LLM deployments in production.
Optimized ROCm Docker for Distributed AI Training
AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, a single-node performance boost, bug fixes, and updated benchmarking for stable, efficient distributed training.
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3
This blog is part 3 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.
AMD Advances Enterprise AI Through OPEA Integration
We announce AMD's support for the Open Platform for Enterprise AI (OPEA), integrating OPEA's enterprise GenAI framework with AMD's computing hardware and ROCm software.
Instella-VL-1B: First AMD Vision Language Model
We introduce Instella-VL-1B, the first AMD vision language model for image understanding trained on MI300X GPUs, outperforming fully open-source models and matching or exceeding many open-weight counterparts in general multimodal benchmarks and OCR-related tasks.