Recent Posts
Accelerating llama.cpp on AMD Instinct MI300X
Learn more about the superior performance of llama.cpp on Instinct platforms.
Medical Imaging on MI300X: SwinUNETR Inference Optimization
A practical guide to optimizing SwinUNETR inference on AMD Instinct™ MI300X GPUs for fast 3D segmentation of tumors in medical imaging.
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software
Learn how to deploy AMD GPUs for high-performance training of autonomous-driving models with ROCm optimizations.
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs
Explore how MI355X performs against B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B and Llama-3.3-70B.
Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs
Learn how to build a state-of-the-art reasoning model that beats Qwen3-32B using only synthetic data and SFT on AMD Instinct™ GPUs: fast, simple, and scalable.
DGL in Depth: SE(3)-Transformer on ROCm 7
Learn how to run SE(3)-Transformer with DGL on AMD Instinct platforms.
Modernizing Taichi Lang to LLVM 20 for MI355X GPU Acceleration
Power your next AI application or graphics simulation with high-performance GPU/CPU computing in Python with Taichi Lang.
HPC Coding Agent - Part 1: Combining GLM-powered Cline and RAG Using MCP
Build an HPC RAG agent on AMD Instinct GPUs using GLM-4.6, Cline and ChromaDB.
Týr-the-Pruner: Search-based Global Structural Pruning for LLMs
This blog introduces Týr-the-Pruner, a search-based, end-to-end framework for global structural pruning of large language models (LLMs).
Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance
Explore ROCm 7.0’s AI training boost! See how MI355X accelerates the JAX and PyTorch frameworks to unlock faster, more efficient LLM scaling.
VLM Fine-Tuning for Robotics on AMD Enterprise AI Suite
Fine-tune OpenCLIP on Bridge Data V2 with ROCm for robotics applications.
Fine-Tune LLMs for Proteins with AMD Enterprise AI Suite
Fine-tune Llama 3.1 8B with ROCm for advanced protein sequence insights in bioinformatics.