Posts by Chengjia Huang

AI Inference on AMD Ryzen™ AI Max Processor

Local large language model (LLM) inference has evolved rapidly, but one limitation persists: model size is constrained by available GPU memory. Consumer discrete GPUs typically offer 8–24 GB of dedicated VRAM, which caps the size of model that can run locally unless it is quantized aggressively, at a significant cost in quality. As frontier open-weight models grow past 70B and even 100B parameters, this gap is forcing more developers toward multi-GPU rigs or paid cloud endpoints just to evaluate a single checkpoint.
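To make the memory constraint concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and runtime overhead, so real requirements would be higher. The function names are hypothetical and not taken from any library.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width; KV cache,
    activations, and framework overhead are not counted.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billions: float, bits_per_weight: float, vram_gb: float) -> bool:
    """Return True if the weights alone would fit in the given VRAM budget."""
    return weight_memory_gb(params_billions, bits_per_weight) <= vram_gb


# Illustrative sweep: a 70B-parameter model at 16-, 8-, and 4-bit precision.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB of weights")
```

Under these assumptions, even a 4-bit 70B model needs roughly 35 GB for weights, which already exceeds a 24 GB card.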
