Posts by Kang Liu
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
- 02 January 2026
In this post, we discuss SparK, a training-free, plug-and-play method for KV cache compression in large language models (LLMs). By targeting the often-overlooked redundancy in feature channels and employing a "prune-and-recover" strategy, SparK reduces KV cache storage by over 30% compared to traditional methods while maintaining model accuracy. It offers a robust solution for long-context inference and a new perspective on unstructured sparsity.
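To make the "prune-and-recover" idea concrete, below is a minimal illustrative sketch, not the paper's actual algorithm: channel importance is scored from the current query, the lowest-scoring key/value channels are zeroed out of the working cache, and the pruned channels are kept in a side store so they can be restored if a later query needs them. All names (`score_channels`, `prune_and_stash`, `recover`) and the scoring rule are hypothetical assumptions for illustration.

```python
# Illustrative sketch only: a toy "prune-and-recover" scheme for KV cache channels.
# This is NOT the SparK implementation; names and the scoring rule are assumptions.
import numpy as np

def score_channels(query, keys):
    """Query-aware importance of each feature channel: how much each channel
    contributes (in magnitude) to the attention logits q @ K^T."""
    # query: (d,), keys: (n_tokens, d) -> per-channel score of shape (d,)
    return np.abs(query[None, :] * keys).sum(axis=0)

def prune_and_stash(keys, values, channel_scores, keep_ratio=0.7):
    """Zero out the lowest-scoring channels and stash them for later recovery."""
    d = keys.shape[1]
    n_keep = max(1, int(d * keep_ratio))
    keep_idx = np.argsort(channel_scores)[-n_keep:]       # channels to keep
    drop_idx = np.setdiff1d(np.arange(d), keep_idx)       # channels to prune
    stash = {"idx": drop_idx,
             "k": keys[:, drop_idx].copy(),
             "v": values[:, drop_idx].copy()}              # compact side store
    pruned_k, pruned_v = keys.copy(), values.copy()
    pruned_k[:, drop_idx] = 0.0
    pruned_v[:, drop_idx] = 0.0
    return pruned_k, pruned_v, stash

def recover(keys, values, stash):
    """Restore previously pruned channels when a new query makes them important."""
    keys[:, stash["idx"]] = stash["k"]
    values[:, stash["idx"]] = stash["v"]
    return keys, values

# Toy usage: prune ~30% of channels for the current query, then recover them later.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((128, 64))
V = rng.standard_normal((128, 64))
scores = score_channels(q, K)
K_p, V_p, stash = prune_and_stash(K, V, scores, keep_ratio=0.7)
K_r, V_r = recover(K_p, V_p, stash)
assert np.allclose(K_r, K) and np.allclose(V_r, V)
```

In this toy version the stash is kept uncompressed, so there is no actual memory saving; a real system would quantize, compress, or offload the pruned channels, which is where the storage reduction would come from.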