Recent Posts - Page 15

November 04, 2025

Stability at Scale: AMD’s Full‑Stack Platform for Large‑Model Training

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

./software-tools-optimization/primus-SaFE/README.html

November 04, 2025

Retrieval Augmented Generation (RAG) with vLLM, LangChain and Chroma

Learn AI-powered knowledge retrieval that enriches prompts with proprietary data to deliver accurate and context-aware answers

./artificial-intelligence/rag-pipeline-vllm/README.html

October 29, 2025

High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.

./software-tools-optimization/mxfp4-mxfp6-quantization/README.html

October 24, 2025

Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation

Nitro-E is an extremely lightweight diffusion transformer model for high-quality image generation with only 304M parameters.

./artificial-intelligence/nitro-e/README.html

October 23, 2025

Performance Profiling on AMD GPUs - Part 3: Advanced Usage

Part 3 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools

./software-tools-optimization/profiling-guide/advanced/README.html

October 23, 2025

STX-B0T: Real-time AI Robot Assistant Powered by RyzenAI and ROCm

STX-B0T explores the potential of RyzenAI PCs to power robotics applications on NPU+GPU. This blog demonstrates how our hardware and software interoperate to unlock real-time perception.

./artificial-intelligence/stx-b0t/README.html

October 21, 2025

Empowering Developers to Build a Robust PyTorch Ecosystem on AMD ROCm™ with Better Insights and Monitoring

Production ROCm support for N-1 to N+1 PyTorch releases is in progress. The AI Software Head-Up Dashboard shows the status of PyTorch on ROCm.

./artificial-intelligence/pytorch-amd-gpus/README.html

October 20, 2025

ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System

Meet the ROCm Core SDK, and learn to install and build ROCm components easily using TheRock.

./software-tools-optimization/therock/README.html

October 16, 2025

Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs

Learn how AMD Instinct MI355 Series GPUs deliver competitive Kimi-K2 inference with faster TTFT, lower latency, and strong throughput.

./artificial-intelligence/kimi-k2/README.html

October 14, 2025

Gumiho: A New Paradigm for Speculative Decoding — Earlier Tokens in a Draft Sequence Matter More

Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.

./software-tools-optimization/gumiho/README.html

October 09, 2025

GEMM Tuning within hipBLASLt – Part 2

Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.

./software-tools-optimization/hipblaslt-offline-tuning-part2/README.html

October 07, 2025

Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection

Learn how to set up, run, and optimize SwinUNETR on AMD MI300X GPUs for fast 3D tumor segmentation in medical imaging using large ROIs.

./artificial-intelligence/running-swinunetr-amd/README.html
