Recent Posts - Page 2
ORBIT-2 based Weather and Climate Downscaling and Downscaled Global Forecasts on AMD Instinct
Shows how to run GenCast’s weather prediction with ORBIT-2’s high-resolution downscaling on AMD Instinct hardware.
Adapting AIM LLMs For Specific Use Cases Through Fine-Tuning in AMD AI Workbench
Learn how to adapt and fine-tune an AIM LLM in AMD AI Workbench GUI for specialization or specific use cases.
Performance Profiling on AMD GPUs - Part 4: Fortran OpenMP Offload Edition
Guides developers through profiling and optimizing Fortran OpenMP GPU offload applications using ROCm tools
Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale
Learn how to run Alibaba's ROLL RL framework out-of-the-box on AMD Instinct™ GPUs with ROCm
Running Variational Quantum Eigensolver with Qiskit Aer on AMD Instinct
A step-by-step guide to running GPU-accelerated VQE for quantum chemistry with Qiskit Aer on AMD Instinct using ROCm.
Enabling Speculative Speculative Decoding on MI300X
An introduction to the speculative speculative decoding method. We enable it on AMD Instinct MI300X GPUs and report the results.
Deep Dive Into 4-Wave Interleave FP8 GEMM
Learn how to build faster FP8 GEMM kernels on AMD CDNA™4 using 4-wave interleaving to hide memory latency and maximize Matrix Core utilization.
AI Inference on AMD Ryzen™ AI Max Processor
Hands-on: run Qwen3.5 9B–122B on Ryzen™ AI Max+ with 128GB UMA and Ollama, with generation benchmarks and a clear UMA setup path on Ubuntu/ROCm.
From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs
Step-by-step guide to building, deploying, and benchmarking ONNX models with Triton Inference Server and MIGraphX on AMD GPUs
From Naive to Near-Peak: Building High-Performance GEMM Kernels with Gluon
Learn how a Gluon GEMM tutorial teaches profiling-driven AMD GPU optimization from FP16 baseline to BF8 and MXFP4 kernels.
Diffusion-based Atmospheric Downscaling on AMD Instinct GPUs
Learn the theory behind downscaling models and how to run one such model, CorrDiff, on AMD GPUs.
QuickReduce FP4 Quantization and Benchmarking on MI355
Learn how QuickReduce uses FP4 quantization to accelerate all-reduce communication and evaluate its performance on AMD Instinct MI355 GPUs.