Posts by Yutao Xu

Accelerating LLM Inference on AMD GPUs with Low-Latency GEMMs

Large language model inference is becoming increasingly interactive. Users expect chatbots, coding assistants, agents, and real-time copilots to respond quickly, stream tokens smoothly, and stay responsive under concurrent load. In that setting, decode-time latency is not just a backend metric: it directly shapes how users perceive the quality of the product.
