Posts tagged Serving

Step-by-Step Guide to Use OpenLLM on AMD GPUs

OpenLLM is an open-source platform that simplifies deploying and serving large language models (LLMs), supporting a wide range of models for applications in the cloud or on-premises. In this tutorial, we walk through starting an LLM server with OpenLLM and interacting with it from your local machine, with special emphasis on leveraging the capabilities of AMD GPUs.

Read more ...

Inferencing and serving with vLLM on AMD GPUs


Read more ...