Posts by Satya Jandhyala

Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

This blog explores autoscaling of large language model (LLM) inference workloads in Ray Serve with a vLLM backend on AMD Instinct™ GPUs. You will also learn how to scale beyond a single cluster with SkyPilot, which extends Ray Serve across multiple clouds. Combined with the AMD ROCm™ software platform, this creates a unified, cloud-agnostic stack that scales distributed LLM inference from single-GPU to multi-cluster deployments.

Read more ...


Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

In our previous blog post, we introduced Volcano Engine Reinforcement Learning for LLMs (verl) 0.3.0.post0 with ROCm 6.2 and vLLM 0.6.4. This post provides an overview of verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev and its benefits for large-scale reinforcement learning from human feedback (RLHF). You will also learn about the modifications made to optimize verl performance on AMD Instinct™ MI300X GPUs, and then walk through building the Docker image on your system and running training scripts for single-node and multi-node setups. Finally, we present verl performance results, focusing on throughput and convergence accuracy achieved on AMD Instinct MI300X GPUs. Follow this guide to get started with verl on AMD Instinct GPUs and accelerate your RLHF training with ROCm-optimized performance.

Read more ...