Posts by Mukhil Azhagan Mallaiyan Sathiaseelan

Enabling FlashInfer on ROCm for Accelerated LLM Serving

FlashInfer is a framework designed to accelerate inference for large language models (LLMs). With the rapid growth and adoption of models like DeepSeek R1, Llama 3, and Qwen 3, efficient inference is critical to meeting the demands of real-world deployment. However, GPU memory bottlenecks, throughput limitations, and latency remain significant hurdles to serving these models at scale.

Read more ...


DGL in the Real World: Running GNNs on Real Use Cases

In our previous blog post, we introduced the Deep Graph Library (DGL) and highlighted how its support on the AMD ROCm platform unlocks scalable, performant graph neural networks (GNNs) on AMD GPUs. That post focused on the why — the growing relevance of graph workloads and what it means to bring that capability to AMD’s accelerated computing ecosystem.

Read more ...


Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware

This blog introduces the Deep Graph Library (DGL) and explores how its support on AMD hardware enables scalable, performant graph neural networks.

Read more ...