Anshul Gupta
Anshul Gupta is a Senior AI Product Marketing Manager with a passion for driving developer engagement and promoting cutting-edge AI technologies. Anshul’s expertise spans content creation, go-to-market strategy, and cross-functional collaboration, all aimed at empowering developers and accelerating the adoption of next-generation AI developer tools. He holds an MBA from the University of California, Davis, and an MS in Electrical and Electronic Engineering from California State University.
Posts by Anshul Gupta
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.
Continuing the Momentum: Refining ROCm for the Next Wave of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs
This blog shows you how to speed up LLM inference with Multi-Token Prediction (MTP) using DeepSeek V3 and SGLang on AMD Instinct GPUs.
Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.
Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R700 GPUs and AMD Ryzen™ AI processors.
Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework
This blog shows how CK-Tile’s XOR-based swizzle optimizes shared-memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm
vLLM v0.9.x is here with major ROCm™ optimizations, boosting LLM performance, reducing latency, and expanding model support on AMD Instinct™ GPUs.
AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving
Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs: write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility.
AITER: AI Tensor Engine For ROCm
We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized high-performance AI operator repository, designed to significantly accelerate AI workloads on AMD GPUs.
Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.
Optimized ROCm Docker for Distributed AI Training
AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, single-node performance boosts, bug fixes, and updated benchmarking for stable, efficient distributed training.
GEMM Kernel Optimization For AMD GPUs
A guide to tuning GEMMs for optimal AI model performance on AMD GPUs.