Recent Posts - Page 6
Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma
Learn how AI-powered knowledge retrieval enriches prompts with proprietary data to deliver accurate, context-aware answers.
Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.
High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs
Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation
Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M parameters.
STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm
STX-B0T explores the potential of Ryzen AI PCs to power robotics applications across the NPU and GPU. This blog demonstrates how our hardware and software interoperate to unlock real-time perception.
Performance Profiling on AMD GPUs - Part 3: Advanced Usage
Part 3 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools.
Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring
Production ROCm support for N-1 to N+1 PyTorch releases is in progress. The AI Software Head-Up Dashboard shows the status of PyTorch on ROCm.
ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System
An introduction to the ROCm Core SDK, and how to install and build ROCm components easily using TheRock.
Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs
Learn how AMD Instinct MI355 Series GPUs deliver competitive Kimi-K2 inference with faster TTFT, lower latency, and strong throughput.
Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More
Gumiho accelerates LLM inference by improving the accuracy of early draft tokens, blending serial and parallel decoding for speed, accuracy, and ROCm-optimized deployment.
GEMM Tuning within hipBLASLt - Part 2
Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.
Announcing MONAI 1.0.0 for AMD ROCm: Breakthrough AI Acceleration for Medical Imaging Models on AMD Instinct™ GPUs
Learn how to use the Medical Open Network for Artificial Intelligence (MONAI) 1.0 on ROCm, with examples and demonstrations.