Data Science - Applications & Models

Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang
Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs

Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG
Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.

Seismic stencil codes - part 2
Recall from the previous post that the kernel computing the stencil in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.
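
As a rough illustration of why the z-direction is costly, the sketch below assumes a flat array with x as the fastest-varying index (an assumption; the layout is not stated here), so neighbors in z sit nx*ny elements apart. It also computes effective bandwidth using the common definition of ideal bytes moved (each input point read once, each interior output point written once) divided by kernel time. The helper names are hypothetical, not taken from the post.

```python
# Hypothetical helpers; assumed layout: x fastest, then y, then z.

def linear_index(i, j, k, nx, ny):
    """Flat index of point (i, j, k) in an nx * ny * nz grid."""
    return i + nx * (j + ny * k)


def z_neighbor_stride_bytes(nx, ny, elem_bytes=4):
    """Distance in bytes between (i, j, k) and its z-neighbor (i, j, k + 1)."""
    stride = linear_index(0, 0, 1, nx, ny) - linear_index(0, 0, 0, nx, ny)
    return stride * elem_bytes


def effective_bandwidth_gbs(nx, ny, nz, radius, elapsed_s, elem_bytes=4):
    """Ideal traffic (one read per input, one write per interior output) / time, in GB/s."""
    reads = nx * ny * nz
    writes = nx * ny * (nz - 2 * radius)  # only interior z-planes are updated
    return (reads + writes) * elem_bytes / elapsed_s / 1e9


if __name__ == "__main__":
    # With 512 x 512 single-precision planes, z-neighbors are 1 MiB apart.
    print(z_neighbor_stride_bytes(512, 512))
```

Any extra traffic beyond this ideal count (for example, re-reading the same z-neighbors from global memory for different output points) lowers the measured effective bandwidth.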

Seismic stencil codes - part 3
In the last two blog posts, we developed a HIP kernel capable of computing high-order finite differences, which are commonly needed in seismic wave propagation.
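
For orientation, the sketch below is a plain CPU reference (not the HIP kernel from the posts) for one such high-order finite difference: the standard 8th-order central approximation of the second derivative in z, with stencil radius 4. The flat-array layout and the function name are assumptions for illustration.

```python
# Standard 8th-order central-difference weights for d^2/dz^2 (radius 4).
COEFFS = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560]
RADIUS = 4


def second_derivative_z(u, nx, ny, nz, h):
    """Apply the z-stencil to a flat grid u indexed as i + nx * (j + ny * k).

    Returns a new flat list; boundary z-planes (within RADIUS of an edge) are left at 0.
    """
    out = [0.0] * (nx * ny * nz)
    inv_h2 = 1.0 / (h * h)
    for k in range(RADIUS, nz - RADIUS):
        for j in range(ny):
            for i in range(nx):
                acc = 0.0
                # Each output point reads 2 * RADIUS + 1 values along z.
                for m, c in enumerate(COEFFS):
                    kk = k + m - RADIUS
                    acc += c * u[i + nx * (j + ny * kk)]
                out[i + nx * (j + ny * k)] = acc * inv_h2
    return out
```

Because the stencil is exact for polynomials up to degree 9, applying it to u = z^2 returns 2 at every interior point, which makes a convenient correctness check for a GPU implementation.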

Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

Accelerating models on ROCm using PyTorch TunableOp

Speech-to-Text on an AMD GPU with Whisper

Sparse matrix vector multiplication - part 1

Finite difference method - Laplacian part 4

Finite difference method - Laplacian part 3