Posts by Pratik Prabhanjan Brahma

GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

23 December 2025

Optimizing GPU kernels is a formidable task, traditionally requiring deep domain expertise and hours of manual tuning. At AMD, we are expanding our GEAK (Generating Efficient AI-centric GPU Kernels) family to automate this entire workflow, from initial code generation to deep performance optimization.

GEAK HIP: Expanding GEAK for HIP Code Optimization

23 December 2025

This blog discusses using the Generating Efficient AI-centric Kernels (GEAK) agent for automated HIP code optimization, demonstrating how GEAK’s agentic pipelines can improve customer and developer code and boost AI performance on AMD platforms.

Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs

06 December 2025

Building a state-of-the-art reasoning model is often viewed as a resource-heavy marathon requiring massive compute, months of training, and proprietary datasets. In this blog, we demonstrate how we built a large reasoning model on AMD Instinct™ MI325 GPUs that surpasses the accuracy of the top 32-billion-parameter open models on mathematics and science benchmarks, using only synthetic data and standard Supervised Fine-Tuning (SFT) on top of older-generation models. In line with AMD’s commitment to open source, we are releasing the model weights, detailed training configurations, datasets, and code, so that the AI community can collaborate, replicate, and innovate.

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

01 August 2025

At AMD, we are pioneering ways to accelerate AI development using AI itself, by generating accurate and efficient GPU kernels. Specifically, we are starting with the automatic generation of kernels in Triton, an open-source Python-like language for writing parallel code for GPUs. Today, AMD is excited to announce (a) Generating Efficient AI-centric Kernels (GEAK) for AMD GPUs, and (b) results on two Triton kernel evaluation benchmarks, where we show how AI agents can perform inference-time scaling with frontier LLMs to generate accurate and efficient kernels for AMD Instinct™ GPUs like MI250X and MI300X.

Instella-VL-1B: First AMD Vision Language Model

07 March 2025

As part of AMD’s newly released Instella family, we are thrilled to introduce Instella-VL-1B, the first AMD vision language model for image understanding, trained on AMD Instinct™ MI300X GPUs. Our journey with Instella-VL builds upon our previous 1-billion-parameter language model, AMD OLMo SFT. We further extend the language model’s visual understanding abilities by connecting it with a vision encoder (initialized from CLIP ViT-L/14-336). During training, we jointly fine-tune the vision encoder and the language model on vision-language data in three stages: Alignment, Pretraining, and Supervised Fine-Tuning (SFT).

Introducing Instella: New State-of-the-art Fully Open 3B Language Models

05 March 2025

AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs) trained from scratch on AMD Instinct™ MI300X GPUs. Instella models outperform existing fully open models of similar sizes and achieve competitive performance compared to state-of-the-art open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B, including their instruction-tuned counterparts.
