Posts by Alex He
A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU
- 22 April 2025
As a leader in high-performance computing, AMD empowers AI innovation by providing open-source tools and hardware acceleration for scalable model deployment. In this blog, we show how this foundation can be leveraged to deploy Meta’s LLMs efficiently on AMD Instinct™ GPUs. Meta’s Llama series has democratized access to large language models, empowering developers worldwide. Llama Stack, Meta’s all-in-one deployment framework, extends this vision by enabling seamless transitions from research to production through built-in tools for optimization, API integration, and scalability. This unified platform is ideal for teams requiring robust support to deploy Meta’s models at scale across diverse applications.
AMD Advances Enterprise AI Through OPEA Integration
- 12 March 2025
AMD is excited to support the Open Platform for Enterprise AI (OPEA) to simplify and accelerate enterprise AI adoption. With the OPEA GenAI framework enabled on the AMD ROCm™ software stack, businesses and developers can now create scalable, efficient GenAI applications on AMD data center GPUs. Enterprises today face significant challenges when deploying AI at scale, including the complexity of integrating GenAI models, managing GPU resources, ensuring security, and maintaining workflow flexibility. AMD and OPEA aim to address these challenges and streamline AI adoption. This blog explores the significance of this collaboration and AMD’s contribution to the OPEA project, and demonstrates how to deploy a code translation OPEA GenAI use case on the AMD Instinct™ MI300X GPU.
Navigating vLLM Inference with ROCm and Kubernetes
- 13 February 2025
Kubernetes (often abbreviated as K8s) is an open-source platform designed for automating the deployment, scaling, and management of containerized applications. Originally developed by Google and now maintained by the Cloud Native Computing Foundation, Kubernetes enables developers to build, run, and manage applications across any infrastructure.