Ashish Sirasao

ASHISH SIRASAO is a Corporate Vice President at AMD. His research interests include hardware–software co-design, programming paradigms for domain-specific accelerators, and deep-learning algorithms. He received his M.Tech degree in electrical engineering from IIT Mumbai.

Posts by Ashish Sirasao

July 13, 2026

Serving NVFP4 Models on AMD Instinct™ MI355 Accelerators

Learn how to serve NVFP4 models on AMD Instinct™ MI355 using an emulation pipeline in vLLM — no format conversion needed.

https://rocm.blogs.amd.com/software-tools-optimization/nvfp4-mi355/README.html

July 13, 2026

QuickReduce INT3 Quantization and Benchmarking on MI355

Learn how QuickReduce uses INT3 quantization to accelerate all-reduce communication and evaluate its performance and accuracy on AMD Instinct MI355 GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/quick-reduce-3/README.html

July 06, 2026

Accelerating Diffusers and xDiT Image Generation with MXFP4 using AMD Quark on AMD Instinct™ MI350 GPUs

Accelerate Diffusers and xDiT FLUX.1-dev image generation on AMD Instinct MI350 GPUs using AMD Quark MXFP4 quantization.

https://rocm.blogs.amd.com/artificial-intelligence/quark-xdit/README.html

July 03, 2026

Accelerating Large-Scale LLM Inference on AMD Instinct MI350X/MI355X with Eagle3 and AMD Quark

Learn how the AMD Quark team enables Eagle3 speculative decoding for Kimi-K2.5 and MiniMax-M2.5 on AMD Instinct MI355X GPUs with ROCm, vLLM, and InferenceX.

https://rocm.blogs.amd.com/artificial-intelligence/eagle3-speculative-decoding/README.html

June 26, 2026

MXFP6 and MXFP4 Mixed Precision for Accelerating Dense LLMs on AMD Instinct MI355X

W_MXFP4_A_MXFP6 quantization on AMD Instinct MI355X improves LLM throughput and latency while recovering accuracy versus MXFP4.

https://rocm.blogs.amd.com/artificial-intelligence/w4a6-quant-mm/README.html

June 11, 2026

Low Kruskal-Rank Adaptation

Learn how Kruskal rank can enhance LoRA by replacing the conventional matrix-rank formulation for more efficient training.

https://rocm.blogs.amd.com/artificial-intelligence/lokra/README.html

June 11, 2026

Productionizing TurboQuant on AMD GPUs for KV-Cache-Bound LLM Inference

Productionized TurboQuant 4-bit KV-cache quantization on AMD GPUs via vLLM, with custom kernels and accuracy analysis on agentic workloads.

https://rocm.blogs.amd.com/artificial-intelligence/turboquant-vllm-agentic/README.html

May 20, 2026

QuickReduce FP4 Quantization and Benchmarking on MI355

Learn how QuickReduce uses FP4 quantization to accelerate all-reduce communication and evaluate its performance on AMD Instinct MI355 GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/quick-reduce-2/README.html

March 25, 2026

Programming Tensor Descriptors in Composable Kernel (CK)

Learn how to use TensorDescriptor in Composable Kernel (CK) to manage multi-dimensional data layouts and write efficient GPU kernels on AMD GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/amd_gpu_programming_guide/README.html

March 24, 2026

Engineering Qwen-VL for Production: Vision Module Architecture and Optimization Practices

Explore how to optimize Qwen-VL for production on AMD Instinct MI308X GPUs with ROCm, from vision module architecture to kernel fusion and deployment.

https://rocm.blogs.amd.com/artificial-intelligence/qwen-vl/README.html

March 19, 2026

hipBLASLt Online GEMM Tuning

Learn how to improve model performance with hipBLASLt online tuning merged into an LLM framework.

https://rocm.blogs.amd.com/artificial-intelligence/hipblaslt_online_tuning/README.html

February 17, 2026

Advanced MXFP4 Quantization: Combining Fine-Tuned Rotations with SmoothQuant for Near-Lossless Compression

Showcase advanced algorithms available in AMD Quark for efficient MXFP4 quantization on AMD Instinct accelerators with high accuracy retention.

https://rocm.blogs.amd.com/software-tools-optimization/mxfp4-online-rotation/README.html

November 05, 2025

Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

Learn how to improve model performance with hipBLASLt offline tuning, using our easy-to-use Day 0 tool for developers to optimize GEMM efficiency.

https://rocm.blogs.amd.com/artificial-intelligence/hipblaslt_offline_tuning/README.html

October 29, 2025

High-Accuracy MXFP4, MXFP6, and Mixed-Precision Models on AMD GPUs

Learn to leverage AMD Quark for efficient MXFP4/MXFP6 quantization on AMD Instinct accelerators with high accuracy retention.

https://rocm.blogs.amd.com/software-tools-optimization/mxfp4-mxfp6-quantization/README.html

August 26, 2025

QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang

QuickReduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting communication overhead by up to 3×.

https://rocm.blogs.amd.com/artificial-intelligence/quick-reduce/README.html