Recent Posts - Page 4
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.
Nitro-AR: A Compact AR Transformer for High-Quality Image Generation
Nitro-AR is a compact E-MMDiT-based masked AR image generator matching diffusion quality with lower latency on AMD GPUs.
ROCm Becomes a First-Class Platform in the vLLM Ecosystem
ROCm is now a first-class vLLM platform: official wheels + Docker, stronger CI, and faster LLM & multimodal inference on AMD Instinct GPUs.
Quickly Developing Powerful Flash Attention Using TileLang on AMD Instinct MI300X GPU
Learn how to use TileLang to develop your own kernels and fully utilize AMD GPUs.
Deep Dive into Primus: High-Performance Training for Large Language Models
Learn how to achieve peak dense LLM training performance on AMD Instinct™ GPUs using Primus’s unified CLI and optimized backend presets.
Applying Compute Partitioning for Workloads on MI300X GPUs
Learn how to boost MI300X performance using GPU compute partitioning for parallel workloads like GROMACS and REINVENT.
Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver
Explore how the AMD GPU DRA Driver brings declarative, attribute-aware GPU scheduling to Kubernetes, and learn how to request and manage GPUs natively.
Installing AMD HIP-Enabled GROMACS on HPC Systems: A LUMI Supercomputer Case Study
Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models
Learn how to use a data-efficient Process Reward Model to enhance the reasoning ability of large language and multimodal models.
Using Gradient Boosting Libraries on MI300X for Financial Risk Prediction
This blog shows how to run GPU-accelerated LightGBM and ThunderGBM training on AMD Instinct MI300X GPUs with ROCm for finance-focused workloads.
Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms
Introducing the AMD Network Operator for automating high-performance AI NIC networking in Kubernetes for AI and HPC workloads
Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms
Learn how to use the Hummingbird-XT and Hummingbird-XTX models to generate videos. Explore the video diffusion model acceleration solution, including a DiT distillation method and a light VAE model.