Posts by Chang Liu
Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs
- 11 September 2025
Speculative decoding has become a key technique for accelerating large language model inference. Its effectiveness, however, hinges on striking the right balance between speed and accuracy in the draft model. Recent advances in Multi-Token Prediction (MTP) integrate seamlessly with speculative decoding, allowing the draft model to be both more lightweight and more consistent with the base model, which ultimately makes inference faster and more effective.
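To make the mechanism concrete, here is a minimal Python sketch of the propose-then-verify loop at the heart of speculative decoding, with toy deterministic functions standing in for the real base and MTP draft models. The names `base_model`, `draft_model`, and `speculative_step`, the 80% agreement rate, and the draft length `k` are all illustrative assumptions, not the SGLang or DeepSeek V3 implementation.

```python
import random

VOCAB_SIZE = 8  # toy vocabulary

def base_model(tokens):
    """Toy deterministic stand-in for the expensive base model:
    maps a context to a greedy next token."""
    rng = random.Random("base:" + ",".join(map(str, tokens)))
    return rng.randrange(VOCAB_SIZE)

def draft_model(tokens):
    """Toy stand-in for a lightweight MTP-style draft head. A draft
    trained jointly with the base model tracks it closely; here we
    assume it agrees 80% of the time and guesses otherwise."""
    rng = random.Random("draft:" + ",".join(map(str, tokens)))
    if rng.random() < 0.8:
        return base_model(tokens)
    return rng.randrange(VOCAB_SIZE)

def speculative_step(tokens, k=4):
    """One speculative decoding step with greedy verification:
    the draft proposes k tokens; the base model accepts the longest
    prefix matching its own greedy choices, then emits one token of
    its own (a correction on mismatch, a bonus token otherwise)."""
    # Draft phase: propose k tokens autoregressively (cheap).
    ctx = list(tokens)
    proposal = []
    for _ in range(k):
        t = draft_model(ctx)
        proposal.append(t)
        ctx.append(t)

    # Verify phase: a real system checks all k positions in a single
    # batched base-model pass; simulated here token by token.
    accepted = list(tokens)
    for t in proposal:
        expected = base_model(accepted)
        if t != expected:
            accepted.append(expected)  # mismatch: take the base token, stop
            return accepted
        accepted.append(t)             # match: keep the draft token for free
    accepted.append(base_model(accepted))  # all k accepted: one bonus token
    return accepted

if __name__ == "__main__":
    tokens = [1, 2, 3]
    for step in range(5):
        new = speculative_step(tokens, k=4)
        print(f"step {step}: accepted {len(new) - len(tokens)} tokens")
        tokens = new
```

Each step yields between 1 and k+1 tokens per base-model pass; the more closely the draft tracks the base model, the more proposed tokens survive verification, which is exactly the consistency property MTP improves.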
Speculative Decoding - Deep Dive
- 24 March 2025
LLM serving has become an increasingly popular service in the technology industry, with thousands of requests sent to LLM servers and responses generated and returned to clients around the world. The performance of online serving, one of the key measures of user experience and service quality, has drawn attention from both industry and academia.