Posts by Chao Li

Low Kruskal-Rank Adaptation

This blog post revisits Low-Rank Adaptation (LoRA), which constrains weight updates by their matrix rank, and shows how replacing that criterion with the Kruskal rank makes fine-tuning more effective. LoRA is one of the most widely used parameter-efficient fine-tuning (PEFT) methods for adapting pre-trained large language models (LLMs) to downstream tasks. Although LoRA significantly reduces the number of trainable parameters and lowers fine-tuning costs, its performance is often limited by the inherent low-rank assumption. We revisit the notion of rank for LoRA update matrices and show that the standard matrix rank fails to capture duplicated directions and redundancy in the update subspace. Motivated by this analysis, we argue that the Kruskal rank offers a more informative criterion for characterizing update diversity. We therefore propose Low Kruskal-Rank Adaptation (LoKRA), a new PEFT algorithm with provable theoretical guarantees that mitigates the limitations of LoRA. We further introduce LoKRA+, an enhanced variant that provides a tighter theoretical lower bound on the Kruskal rank and yields stronger empirical performance. Experiments on multiple LLMs show that our approach consistently outperforms LoRA and other baselines, establishing state-of-the-art performance across a range of benchmarks. The paper has been accepted at ICML 2026 (paper link), and the code is publicly available on GitHub.
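To make the difference between the two notions of rank concrete, here is a minimal sketch in Python with NumPy. It is not the LoKRA algorithm and is not taken from the paper's code; it only applies the standard definition of the Kruskal rank (the largest k such that every set of k columns is linearly independent). The helper name `kruskal_rank` and the two toy matrices are assumptions made for illustration, and the brute-force search is exponential in the number of columns, so it is only suitable for tiny examples.

```python
from itertools import combinations

import numpy as np


def kruskal_rank(matrix, tol=1e-10):
    """Largest k such that EVERY set of k columns is linearly independent.

    Hypothetical helper for illustration; brute force over column subsets.
    """
    n_cols = matrix.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # A subset of `size` columns is independent iff its rank equals `size`.
            if np.linalg.matrix_rank(matrix[:, list(cols)], tol=tol) < size:
                return k
        k = size
    return k


# Toy matrix with a duplicated direction: columns 0 and 1 are identical.
a = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

# Toy matrix with the same matrix rank, but every pair of columns is independent.
b = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

for name, m in (("a", a), ("b", b)):
    print(name, "matrix rank:", np.linalg.matrix_rank(m),
          "Kruskal rank:", kruskal_rank(m))
```

Both toy matrices have matrix rank 2, but the duplicated column pulls the Kruskal rank of `a` down to 1 while `b` keeps a Kruskal rank of 2. This is the kind of redundancy that the matrix rank alone does not reveal.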

Read more ...


hipBLASLt Online GEMM Tuning

This blog post introduces the integration of hipBLASLt Online GEMM Tuning into LLM frameworks, using RTP-LLM as an example implementation. Developed by the AMD Quark Team, hipBLASLt Online Tuning offers a user-friendly way to improve GEMM performance: tuning happens at runtime, with no additional offline tuning steps required.

Read more ...


Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script

This blog post shows how to optimize the performance of a real model with the QuickTune script, using offline GEMM tuning for the Qwen model on an AMD MI308 GPU as an example. Developed by the AMD Quark Team, QuickTune is a tool for hipBLASLt offline GEMM tuning that delivers significant GEMM performance improvements with minimal time overhead. It lets users complete offline tuning with one click instead of tuning the model manually with hipblaslt-bench.

Read more ...