AI Blogs
DGL in Depth: SE(3)-Transformer on ROCm 7
Learn how to run SE(3)-Transformer with DGL on AMD Instinct platforms.
Modernizing Taichi Lang to LLVM 20 for MI325X GPU Acceleration
Power your next AI application or graphics simulation with high-performance GPU/CPU computing in Python with Taichi Lang.
HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP
Build an HPC RAG agent on AMD Instinct GPUs using GLM-4.6, Cline and ChromaDB.
Týr-the-Pruner: Search-based Global Structural Pruning for LLMs
This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).
Democratizing AI Compute with AMD Using SkyPilot
Learn how SkyPilot integrates with AMD open AI stack to enable seamless multi-cloud deployment and simplifies NVIDIA-to-AMD GPU migration.
Continuing the Momentum: Refining ROCm For The Next Wave Of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.
Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
Explore performance optimizations for llama.cpp on AMD Instinct GPUs.
Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance
Explore ROCm 7.0’s AI training boost! See how MI355X accelerates JAX and PyTorch frameworks to unlock faster, more efficient LLM scaling.
VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite
Learn how to fine-tune OpenCLIP with Bridge Data V2 on ROCm for robotics applications.
Fine-Tune LLMs for Proteins with AMD Enterprise AI Suite
Fine-tune Llama 3.1 8B with ROCm for advanced protein sequence insights in bioinformatics.
Exploring Gameplay Video Generation with Hunyuan-GameCraft
Learn to generate dynamic, action-controllable gameplay videos from single images using Hunyuan-GameCraft on AMD Instinct MI300X GPUs with ROCm.
The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism
Learn how to combine TP, DP, PP, and EP for MoE models. Discover proven strategies to maximize performance on your vLLM deployments.
Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X
Learn how a small-radius expert parallel design with prefill–decode disaggregation enables scalable, fault-isolated LLM inference on AMD Instinct™ MI300X clusters.
Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.
High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs
Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics