Data Science Blogs - Page 2

July 24, 2025

Benchmarking Reasoning Models: From Tokens to Answers

Learn how to benchmark reasoning tasks. Use Qwen3 and vLLM to test true reasoning performance, not just how fast words are generated.

./artificial-intelligence/benchmark-reasoning-models/README.html

July 18, 2025

Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

Accelerate life science and medical workloads with ROCm-LS, AMD's GPU-optimized toolkit for faster multidimensional image processing and vision.

./software-tools-optimization/rocm-ls-intro/README.html

July 18, 2025

Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Fully utilize the power of AMD Instinct GPUs to process and interpret detailed multidimensional images with lightning speed.

./software-tools-optimization/hipcim-intro/README.html

July 03, 2025

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

./software-tools-optimization/amd-container-toolkit/README.html

May 20, 2025

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH.

./software-tools-optimization/introducing-rocm-ds-revolutionizing-data-processing-with-amd-instinct-gpus/README.html

May 16, 2025

Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs.

./artificial-intelligence/aiter-intergration-s/README.html

May 12, 2025

Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG

Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.

./artificial-intelligence/rocjpeg-decoding-performance-blog/README.html

May 01, 2025

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

Dive into kernel-level profiling of DeepseekV3 on SGLang—identify GPU bottlenecks and boost large language model performance using ROCm

./software-tools-optimization/kernel-analysis-deep/README.html

April 14, 2025

Installing ROCm from source with Spack

Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

./software-tools-optimization/spack-installation/README.html

March 02, 2025

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

This blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD Instinct MI300X.

./software-tools-optimization/mi300x-rccl-xgmi/README.html

August 29, 2024

Seismic stencil codes - part 2

Seismic Stencil Codes - Part 2: Recall from the previous post that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.

./high-performance-computing/seismic-stencils/part-2/README.html

August 29, 2024

Seismic stencil codes - part 3

Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing the high-order finite differences commonly needed in seismic wave propagation.

./high-performance-computing/seismic-stencils/part-3/README.html
