Anshul Gupta
Anshul Gupta is a Senior AI Product Marketing Manager with a passion for driving developer engagement and promoting cutting-edge AI technologies. Anshul’s expertise spans content creation, go-to-market strategy, and cross-functional collaboration, all aimed at empowering developers and accelerating the adoption of next-generation AI developer tools. He holds an MBA from the University of California, Davis, and an MS in Electrical and Electronic Engineering from California State University.
Posts by Anshul Gupta
ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads
We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.
Continuing the Momentum: Refining ROCm for the Next Wave of AI and HPC
ROCm 7.1 builds on 7.0’s AI and HPC advances with faster performance, stronger reliability, and streamlined tools for developers and system builders.
Efficient LLM Serving with MTP: DeepSeek V3 and SGLang on AMD Instinct GPUs
This blog shows you how to speed up LLM inference with Multi-Token Prediction (MTP) using DeepSeek V3 and SGLang on AMD Instinct GPUs.
Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.
Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R700 GPUs and AMD Ryzen™ AI processors.
Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework
This blog shows how CK-Tile’s XOR-based swizzle optimizes shared-memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm
vLLM v0.9.x is here with major ROCm™ optimizations, boosting LLM performance, reducing latency, and expanding model support on AMD Instinct™ GPUs.
AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving
Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs: write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility.
AITER: AI Tensor Engine For ROCm
We introduce AMD's AI Tensor Engine for ROCm (AITER), our centralized high-performance AI operator repository, designed to significantly accelerate AI workloads on AMD GPUs.
Deploying Google’s Gemma 3 Model with vLLM on AMD Instinct™ MI300X GPUs: A Step-by-Step Guide
AMD is excited to announce the integration of Google’s Gemma 3 models with AMD Instinct™ MI300X GPUs.
Optimized ROCm Docker for Distributed AI Training
AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, single-node performance boosts, bug fixes, and updated benchmarking for stable, efficient distributed training.
GEMM Kernel Optimization For AMD GPUs
A guide to tuning GEMMs for optimal AI model performance on AMD GPUs.