Posts tagged Kubernetes

Deploying Serverless AI Inference on AMD GPU Clusters

Deploying large language models (LLMs) in enterprise environments presents many challenges. As enterprises expand their AI and HPC workloads, scaling the underlying compute and GPU infrastructure becomes harder: deployments grow more complex, resources must be optimized, and the compute fleet must be managed effectively. In this blog, we walk you through how to spin up a production-grade serverless AI inference service on Kubernetes clusters using the open-source Knative and KServe technologies.

Read more ...


AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

Welcome to Part 2 of our series on using Kubernetes with the AMD Instinct platform! If you’re just joining us, we recommend checking out Part 1, where we covered setting up your Kubernetes cluster and enabling AMD GPU support.

Read more ...


AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

As organizations scale their AI inference workloads, they face the challenge of efficiently deploying and managing large language models across GPU infrastructure. This three-part blog series provides a production-ready foundation for orchestrating AI inference workloads on the AMD Instinct platform with Kubernetes.

Read more ...


Announcing the AMD GPU Operator and Metrics Exporter

As AI workloads continue to grow in complexity and scale, we’ve consistently heard one thing from our customers: “Managing GPU infrastructure shouldn’t be the hard part.” For many, this is where Kubernetes comes into play. Kubernetes lets customers manage and deploy their AI workloads at scale by providing a robust platform for automating the deployment, scaling, and operation of application containers across clusters of hosts. It helps your applications run consistently and reliably, regardless of the underlying infrastructure. A pod is the smallest deployable Kubernetes object. It represents a single instance of a running workload in your cluster and can contain one or more containers. Pods host your application workloads, and Kubernetes manages them to ensure they run as expected. Enabling pods to use the GPUs in your cluster, however, is not trivial.
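
To make the idea of a pod using a GPU concrete, here is a minimal sketch in Python that builds a pod manifest requesting one AMD GPU and prints it as JSON. It assumes the cluster already advertises GPUs under the `amd.com/gpu` extended resource name (the name used by the AMD device plugin). The pod name, the `rocm/pytorch:latest` image, and the `rocm-smi` command are illustrative placeholders, not values taken from the post.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that asks the scheduler for AMD GPUs.

    Assumption: GPUs are exposed as the extended resource "amd.com/gpu",
    which requires a GPU device plugin to be running on the cluster.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Placeholder command; assumes the image ships rocm-smi.
                    "command": ["rocm-smi"],
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"amd.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, chosen only for illustration.
    manifest = gpu_pod_manifest("gpu-smoke-test", "rocm/pytorch:latest")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests as well as YAML, the printed output could be piped to `kubectl apply -f -` on a cluster where the GPU resource is available. On a cluster without it, the pod would stay unscheduled.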

Read more ...