Posts by Satya Jandhyala

Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

This blog explores autoscaling of large language model (LLM) inference workloads in Ray Serve with a vLLM backend on AMD Instinct™ GPUs. You will also learn how to scale beyond a single cluster with SkyPilot, which extends Ray Serve across multiple clouds. Combined with the AMD ROCm™ software platform, this creates a unified, cloud-agnostic stack that scales distributed LLM inference from single-GPU to multi-cluster deployments.

Read more ...


Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

In our previous blog post, we introduced Volcano Engine Reinforcement Learning for LLMs (verl) 0.3.0.post0 with ROCm 6.2 and vLLM 0.6.4. This post provides an overview of verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev and its benefits for large-scale reinforcement learning from human feedback (RLHF). You will also learn about the modifications made to optimize verl performance on AMD Instinct™ MI300X GPUs, and then walk through building the Docker image on your system and running training scripts for single-node and multi-node setups. Finally, we present verl performance results, focusing on throughput and convergence accuracy achieved on AMD Instinct MI300X GPUs. Follow this guide to get started with verl on AMD Instinct GPUs and accelerate your RLHF training with ROCm-optimized performance.

Read more ...