Posts by Michael Zhang

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs

In the rapidly evolving landscape of artificial intelligence, the ability to deploy large language models (LLMs) and vision-language models (VLMs) efficiently is crucial for real-time applications. SGLang is an open-source framework designed to meet these demands by delivering fast backend runtime, a flexible frontend language, and extensive model support for a variety of LLMs and VLMs.

Read more ...


CTranslate2: Efficient Inference with Transformer Models on AMD GPUs

Transformer models have revolutionized natural language processing (NLP) by delivering high-performance results in tasks like machine translation, text summarization, text generation, and speech recognition. However, deploying these models in production can be challenging due to their high computational and memory requirements. CTranslate2 addresses these challenges by providing a custom runtime that implements various optimization techniques to accelerate Transformer models during inference.

Read more ...