In-context Ranking (ICR) leverages the contextual understanding of large language models (LLMs) for information retrieval by placing the task description, candidate documents, and query together in the model's input. This paper introduces BlockRank, a method that makes attention in LLMs more efficient by enforcing inter-document block sparsity and optimizing query-document relevance, yielding significant gains in performance and scalability on long-context retrieval tasks. Experiments show that BlockRank matches or surpasses state-of-the-art methods while being considerably more efficient at inference.
Tags: in-context-ranking, information-retrieval, generative-models, attention-mechanism, scalable-solutions
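To make the "inter-document block sparsity" idea concrete, here is a minimal sketch of how such an attention mask could be constructed for an ICR prompt. This is an illustration of the general pattern, not the paper's implementation: the prompt layout ([instruction | doc_1 | ... | doc_N | query]), the function name `block_sparse_mask`, and the exact masking rules (documents attend to the instruction and to themselves; the query attends to everything) are assumptions made for this example.

```python
# Illustrative sketch (assumed pattern, not BlockRank's actual code):
# build an attention mask with inter-document block sparsity for an
# in-context ranking prompt laid out as [instruction | doc_1 | ... | doc_N | query].
import numpy as np

def block_sparse_mask(instr_len: int, doc_lens: list[int], query_len: int) -> np.ndarray:
    total = instr_len + sum(doc_lens) + query_len
    mask = np.zeros((total, total), dtype=bool)  # True = attention allowed

    # Instruction tokens attend among themselves.
    mask[:instr_len, :instr_len] = True

    # Each document block attends to the shared instruction and to itself,
    # but not to other documents (this is the inter-document sparsity).
    start = instr_len
    for d in doc_lens:
        end = start + d
        mask[start:end, :instr_len] = True
        mask[start:end, start:end] = True
        start = end

    # Query tokens attend to the full context (instruction, all docs, query).
    mask[start:, :] = True

    # Intersect with a causal mask so the pattern is valid for decoder-only LLMs.
    causal = np.tril(np.ones((total, total), dtype=bool))
    return mask & causal

# Example: 4-token instruction, three documents of lengths 5, 3, and 6, 2-token query.
m = block_sparse_mask(4, [5, 3, 6], 2)
print(m.shape)  # (20, 20)
```

Because each document block only attends to itself and the fixed-length instruction, the attention cost grows roughly linearly in the number of candidate documents rather than quadratically in total context length, which is what makes this kind of sparsity attractive for long-context retrieval.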