Recent Posts - Page 4#
PyTorch Offline Tuning with TunableOp
Learn how to accelerate PyTorch workloads with TunableOp offline tuning—record, tune separately, and deploy faster inference.
LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models
Learn how task-specific synthetic data can improve small language model performance and explore results from the LuminaSFT study.
Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation
Learn how to use our flexible and scalable pipeline parallelism framework with Primus backend and AMD hardware.
FlyDSL: Expert GPU Kernel Development with the Ease of MLIR Python Native DSL on AMD GPUs
FlyDSL is a Python-first, MLIR-native DSL for expert GPU kernel development and tuning on AMD GPUs.
Introducing hipThreads: A C++ - Style Concurrency Library for AMD GPUs
Discover how hipThreads lets you write hip::thread just like std::thread and unlock GPU acceleration with minimal code changes.
Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression
Showcase advanced algorithms available in AMD Quark for efficient MXFP4 quantization on AMD Instinct accelerators with high accuracy retention.
Adaptive Top-K Selection: Eliminating Performance Cliffs Across All K Values on AMD GPUs
Explore adaptive Top-K on MI300X! See how auto-selection and hardware optimizations like DPP and double buffering drive peak efficiency.
Unlocking Sparse Acceleration on AMD GPUs with hipSPARSELt
This blog post introduces semi-structured sparsity technology supported on AMD systems and explains how to use the corresponding library to leverage its benefit.
Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot
Learn how to use multi-node and multi-cluster autoscaling in the Ray framework on ROCm 7.0.0 with SkyPilot
Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and strong throughput and convergence results
Solution Blueprints: Accelerating AI Deployment with AMD Enterprise AI
This blog presents AIMs Solution Blueprints and demonstrates modular, Helm‑based deployment patterns.
Digital Twins on AMD: Building Robotic Simulations Using Edge AI PCs
Explore how Ryzen AI MAX enables robotic simulation on a single AI PC and take your first step into digital twins.