Posts tagged Kubernetes
AMD Inference Microservice (AIM): Production Ready Inference on AMD Instinct™ GPUs
- 17 November 2025
As generative AI models continue to expand in scale, context length, and operational complexity, enterprises face a harder challenge: how to deploy and operate inference reliably, efficiently, and at production scale. Running LLMs or multimodal models on real workloads requires more than high-performance GPUs. It requires reproducible deployments, predictable performance, seamless orchestration, and an operational framework that teams can trust.
AMD Enterprise AI Suite: Open Infrastructure for Production AI
- 17 November 2025
In this blog, you’ll learn how to operationalize enterprise AI on AMD Instinct™ GPUs using an open, Kubernetes-native software stack. AMD Enterprise AI Suite provides a unified platform that integrates GPU infrastructure, workload orchestration, model inference, and lifecycle governance without dependence on proprietary systems. We begin by outlining the end-to-end architecture and then walk through how each component fits into production workflows: AMD Inference Microservice (AIM) for optimized and scalable model serving, AMD Solution Blueprints for assembling these capabilities into validated, end-to-end AI workflows, AMD Resource Manager for infrastructure administration and multi-team governance, and AMD AI Workbench for reproducible development and fine-tuning environments. Together, these building blocks show how to build, scale, and manage AI workloads across the enterprise using an open, modular, production-ready stack.
Democratizing AI Compute with AMD Using SkyPilot
- 13 November 2025
Democratizing AI compute means making advanced infrastructure and tools accessible to everyone—empowering startups, researchers, and developers to train, deploy, and scale AI models without being constrained by proprietary systems or vendor lock-in. The AMD open AI ecosystem, built on AMD ROCm™ Software, pre-built optimized Docker images, and AMD Developer Cloud, provides the foundation for this vision.
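To make that concrete, here is a minimal sketch of a SkyPilot task file targeting an AMD Instinct accelerator. The accelerator name, image, and scripts are illustrative assumptions, not configuration from the post; check `sky show-gpus` for the accelerator names your deployment actually exposes.

```yaml
# task.yaml -- hedged sketch of a SkyPilot task on an AMD GPU.
# Launch with: sky launch task.yaml
resources:
  accelerators: MI300X:1                # assumed accelerator name
  image_id: docker:rocm/pytorch:latest  # assumed ROCm container image

setup: |
  pip install -r requirements.txt      # hypothetical project dependencies

run: |
  python train.py                      # hypothetical training entry point
```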
GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator
- 01 October 2025
Modern AI workloads often don’t utilize the full capacity of advanced GPU hardware, especially when running smaller models or during development phases. The AMD GPU partitioning feature addresses this challenge by allowing you to divide physical GPUs into multiple virtual GPUs, dramatically improving resource utilization and cost efficiency.
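To see why this matters for scheduling, consider the hedged sketch below: once a GPU is partitioned, each partition is advertised to Kubernetes as its own allocatable resource, so a pod can request a single partition instead of a whole device. The `amd.com/gpu` resource name is what the AMD device plugin conventionally advertises; the exact resource naming for partitions depends on your configuration, and the image is an assumption.

```yaml
# Sketch: a pod that consumes one GPU partition. On an 8-way partitioned
# node, the node may advertise 8x more amd.com/gpu resources than it has
# physical devices.
apiVersion: v1
kind: Pod
metadata:
  name: small-model-server
spec:
  containers:
  - name: server
    image: rocm/vllm:latest          # assumed serving image
    resources:
      limits:
        amd.com/gpu: 1               # one partition, not one physical GPU
```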
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
- 03 July 2025
In the rapidly evolving fields of high-performance computing (HPC), artificial intelligence (AI), and machine learning (ML), containerization has become a cornerstone of modern application deployment. Containers provide a lightweight, portable, and scalable way to package applications and their dependencies. As these workloads increasingly depend on accelerators, integrating GPUs into containerized environments has become imperative. Historically, however, leveraging GPU acceleration within containers has been a complex and error-prone process, particularly when it comes to ensuring seamless access to GPU hardware.
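For a sense of the manual plumbing such tooling is designed to eliminate, here is a hedged sketch of how ROCm containers have commonly been granted GPU access by hand on Kubernetes: mounting the kernel device nodes directly. The device paths are the standard ROCm ones (`/dev/kfd`, `/dev/dri`); the image is an assumption.

```yaml
# Sketch of the historical, error-prone approach: hand-mounting ROCm
# device nodes into a privileged container instead of requesting a
# GPU resource declaratively.
apiVersion: v1
kind: Pod
metadata:
  name: rocm-manual-devices
spec:
  containers:
  - name: app
    image: rocm/pytorch:latest       # assumed image
    securityContext:
      privileged: true               # broad host access; one reason this is fragile
    volumeMounts:
    - name: kfd
      mountPath: /dev/kfd
    - name: dri
      mountPath: /dev/dri
  volumes:
  - name: kfd
    hostPath:
      path: /dev/kfd
  - name: dri
    hostPath:
      path: /dev/dri
```

Runtime tooling like the AMD Container Toolkit, and device plugins on Kubernetes, exist precisely so that GPU access becomes a declarative request rather than this kind of per-pod wiring.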
ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem
- 06 June 2025
This blog is part of our ROCm Revisited series[1]. The purpose of this series is to share the story of ROCm and our journey through the changes and successes we’ve achieved over the past few years. We’ll explore the key milestones in our development, the innovative technologies that have propelled us forward, and the challenges we’ve overcome to establish our leadership in the world of GPU computing.
ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
- 11 April 2025
In the rapidly evolving landscape of high-performance computing and artificial intelligence, innovation is the currency of progress. AMD’s ROCm 6.4 isn’t just another software update; it’s a leap forward that redefines what is possible for AI developers, researchers, and enterprise innovators.
What’s New in the AMD GPU Operator v1.2.0 Release
- 28 March 2025
The GPU Operator v1.2.0 release introduces significant new features, including GPU health monitoring, automated component and driver upgrades, and a new device test runner component for enhanced validation and troubleshooting. These improvements aim to increase reliability, streamline upgrades, and provide enhanced visibility into GPU health.
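For orientation, the operator is configured through its DeviceConfig custom resource. The sketch below is an assumption about the CRD’s shape rather than a verbatim v1.2.0 manifest; check the API version and field names (especially the test-runner field) against the GPU Operator documentation before use.

```yaml
# Hedged sketch of a DeviceConfig enabling the v1.2.0 features;
# verify field names against the AMD GPU Operator docs.
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: gpu-operator
  namespace: kube-amd-gpu
spec:
  driver:
    enable: true        # let the operator install and upgrade the driver
  metricsExporter:
    enable: true        # surface GPU health and telemetry
  testRunner:
    enable: true        # assumed field for the new device test runner
```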
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3
- 13 March 2025
Welcome back to the final part of our series! So far, we’ve successfully set up a Kubernetes cluster and installed the AMD GPU Operator to seamlessly integrate AMD hardware with Kubernetes in Part 1. We’ve deployed vLLM on AMD Instinct MI300X GPUs, exposed it using MetalLB, and scaled it efficiently in Part 2.
Deploying Serverless AI Inference on AMD GPU Clusters
- 25 February 2025
Deploying Large Language Models (LLMs) in enterprise environments presents a multitude of challenges that organizations must navigate to harness their full potential. As enterprises expand their AI and HPC workloads, scaling the underlying compute and GPU infrastructure brings deployment complexity, resource-optimization trade-offs, and the need to manage a growing fleet of compute resources. In this blog, we will walk you through how to spin up a production-grade serverless AI inference service on Kubernetes clusters by leveraging the open source Knative/KServe technologies.
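As a flavor of the end result, here is a minimal, hedged sketch of a KServe InferenceService with a custom predictor container that requests an AMD GPU and scales to zero when idle. The image and model identifier are illustrative assumptions, not the blog’s exact manifest.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-inference
spec:
  predictor:
    minReplicas: 0                   # scale to zero when idle (serverless)
    containers:
    - name: kserve-container
      image: rocm/vllm:latest        # assumed serving image
      args: ["--model", "your-org/your-model"]  # hypothetical model id
      resources:
        limits:
          amd.com/gpu: "1"
```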
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2
- 14 February 2025
Welcome to Part 2 of our series on utilizing Kubernetes with the AMD Instinct platform! If you’re just joining us, we recommend checking out Part 1 where we covered setting up your Kubernetes cluster and enabling AMD GPU support.
AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1
- 07 February 2025
As organizations scale their AI inference workloads, they face the challenge of efficiently deploying and managing large language models across GPU infrastructure. This three-part blog series provides a production-ready foundation for orchestrating AI inference workloads on the AMD Instinct platform with Kubernetes.
Announcing the AMD GPU Operator and Metrics Exporter
- 29 January 2025
As AI workloads continue to grow in complexity and scale, we’ve consistently heard one thing from our customers: “Managing GPU infrastructure shouldn’t be the hard part.” For many, this is where Kubernetes comes into play. Kubernetes allows customers to easily manage and deploy their AI workloads at scale by providing a robust platform for automating deployment, scaling, and operations of application containers across clusters of hosts. It ensures that your applications run consistently and reliably, regardless of the underlying infrastructure. A pod is the smallest and simplest Kubernetes object: it represents a single instance of a running process in your cluster and can contain one or more containers. Pods host your application workloads and are managed by Kubernetes to ensure they run as expected. Enabling those pods to leverage the GPUs in your cluster, however, is not trivial.
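To make that concrete: with a device plugin in place, such as the one the GPU Operator deploys, GPU access becomes a declarative resource request on the pod, and the scheduler handles device assignment. A minimal sketch, assuming the `amd.com/gpu` resource name the AMD device plugin advertises and an illustrative image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
  - name: app
    image: rocm/pytorch:latest       # assumed image
    command: ["python", "-c", "import torch; print(torch.cuda.is_available())"]
    resources:
      limits:
        amd.com/gpu: 1               # schedule onto a node with a free AMD GPU
```

No hostPath mounts or privileged containers required; the kubelet wires the device into the container.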