WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization

Abstract

TBD

Type
Publication
In Transactions on Architecture and Code Optimization