Posts by Pratik Mishra

Elevate Your LLM Inference: Autoscaling with Ray, ROCm 7.0.0, and SkyPilot

This blog explores autoscaling LLM inference workloads in Ray Serve with a vLLM backend on AMD Instinct™ GPUs. It also shows how to scale beyond a single cluster with SkyPilot, which enables multicloud scaling for Ray Serve. Combined with the AMD ROCm™ software platform, this creates a unified, cloud-agnostic platform that scales distributed LLM inference from single-GPU to multi-cluster deployments.

Read more ...
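As a rough illustration of the multicloud scaling idea described above, a SkyPilot service spec can declare replica autoscaling and GPU requirements declaratively. This is a minimal sketch, not taken from the post itself: the accelerator name, replica counts, and the `serve run` entry point are illustrative assumptions.

```yaml
# Illustrative SkyPilot (SkyServe) spec for a Ray Serve + vLLM service.
# Field values (accelerator, replica counts, run command) are assumptions.
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 1        # scale down to one replica when idle
    max_replicas: 4        # cap the fleet size across clouds
    target_qps_per_replica: 2

resources:
  accelerators: MI300X:1   # one AMD Instinct GPU per replica (hypothetical SKU)

run: |
  # Hypothetical entry point: a Ray Serve app wrapping a vLLM engine
  serve run app:deployment
```

With a spec like this, SkyPilot provisions replicas on whichever cloud satisfies the resource request and adjusts the replica count against the QPS target, while Ray Serve handles request routing within each replica.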


Democratizing AI Compute with AMD Using SkyPilot

Democratizing AI compute means making advanced infrastructure and tools accessible to everyone, empowering startups, researchers, and developers to train, deploy, and scale AI models without being constrained by proprietary systems or vendor lock-in. The AMD open AI ecosystem, built on AMD ROCm™ Software, pre-built optimized Docker images, and the AMD Developer Cloud, provides the foundation for this vision.

Read more ...