
llm4

[Paper] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Abstract: Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly lengthy, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLML..
2025. 1. 9.
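The excerpt stops before the method, but LLMLingua's released toolkit makes the idea concrete: a small language model scores how informative each part of the prompt is, and the prompt is pruned down to a token budget before being sent to the large model. A minimal usage sketch follows, based on my reading of the microsoft/LLMLingua README; the default scorer model, the parameter names (instruction, question, target_token), and the returned dictionary key are assumptions to verify against the package documentation.

```python
# pip install llmlingua  (downloads a small scoring LM on first use)
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # default scorer is a small LLaMA-class model; assumption

demos = [
    "Q: What is 2+2? Let's think step by step... A: 4",
    "Q: What is 3*5? Let's think step by step... A: 15",
]

result = compressor.compress_prompt(
    demos,                             # context to compress (e.g., CoT demonstrations)
    instruction="Answer the math question.",
    question="Q: What is 7*8?",
    target_token=100,                  # compression budget in tokens
)
print(result["compressed_prompt"])     # key name per the README; assumption
```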
[Paper] RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
Abstract: Retrieving documents and prepending them in-context at inference time improves performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more expensive. We propose..
2025. 1. 9.
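The truncated sentence ("We propose..") introduces RECOMP's compressors: retrieved documents are condensed into a short summary, extractively or abstractively, before being prepended, and augmentation is selective, so an empty summary is allowed when no retrieved content helps. Below is a rough sketch of the extractive variant; RECOMP trains its own selector, whereas this stand-in uses an off-the-shelf sentence embedder (all-MiniLM-L6-v2) and plain cosine similarity, with the relevance threshold as an illustrative assumption.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in; RECOMP trains its own selector

def extractive_compress(question: str, docs: list[str], budget: int = 2,
                        min_sim: float = 0.3) -> str:
    """Pick the retrieved sentences most relevant to the question; return ""
    (selective augmentation) when nothing clears the relevance bar."""
    sents = [s.strip() for d in docs for s in d.split(". ") if s.strip()]
    q = encoder.encode([question])[0]
    S = encoder.encode(sents)
    sims = S @ q / (np.linalg.norm(S, axis=1) * np.linalg.norm(q))
    ranked = [i for i in np.argsort(-sims) if sims[i] >= min_sim][:budget]
    return " ".join(sents[i] for i in sorted(ranked))  # keep original sentence order

summary = extractive_compress(
    "Who wrote On the Origin of Species?",
    ["Charles Darwin wrote On the Origin of Species. "
     "The book was published in 1859. Darwin sailed on the Beagle."],
)
prompt = f"{summary}\n\nQ: Who wrote On the Origin of Species?\nA:"
```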
[Paper] Compressing Context to Enhance Inference Efficiency of Large Language Models
Yucheng Li, Bo Dong, Frank Guerin, Chenghua Lin. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
Abstract: Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended con..
2025. 1. 9.
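The cut-off abstract describes the paper's core move: score lexical units by self-information under a small causal LM and delete the least informative ones before inference. A compact sentence-level sketch of that idea, assuming gpt2 as the scoring model; note the paper scores units within their running context, whereas this sketch scores each sentence independently for brevity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # small scoring LM; assumption
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def self_information(span: str) -> float:
    """Total -log p(span) under the small LM: the informativeness measure."""
    ids = tok(span, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss                   # mean NLL over predicted tokens
    return loss.item() * max(ids.shape[1] - 1, 1)

def filter_context(context: str, keep_ratio: float = 0.6) -> str:
    """Drop the lowest self-information sentences, keeping document order."""
    sents = [s.strip() for s in context.split(". ") if s.strip()]
    ranked = sorted(range(len(sents)), key=lambda i: -self_information(sents[i]))
    keep = set(ranked[: max(1, int(len(sents) * keep_ratio))])
    return ". ".join(s for i, s in enumerate(sents) if i in keep)
```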
[Paper] Learning to Filter Context for Retrieval-Augmented Generation
Abstract: On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to gen..
2025. 1. 9.
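This paper (FilCo) trains a context-filtering model on spans labeled as useful by one of three utility measures (string inclusion, lexical overlap, conditional cross-mutual information); at inference the filter runs over retrieved passages before the generator sees them. Here is a small sketch of building that training signal with the lexical-overlap measure; the sentence-level spans and the 0.5 threshold are illustrative assumptions.

```python
def unigram_f1(span: str, answer: str) -> float:
    """Lexical-overlap utility of a span w.r.t. the gold answer (unigram F1)."""
    s, a = set(span.lower().split()), set(answer.lower().split())
    if not s or not a:
        return 0.0
    overlap = len(s & a)
    p, r = overlap / len(s), overlap / len(a)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def label_useful_spans(passages: list[str], answer: str,
                       threshold: float = 0.5) -> list[str]:
    """Spans passing the utility bar become positive targets for the filter model."""
    spans = [s.strip() for p in passages for s in p.split(". ") if s.strip()]
    return [s for s in spans if unigram_f1(s, answer) >= threshold]

# The (question, useful spans) pairs supervise a seq2seq filter; at test time
# the generator only sees the filtered context.
print(label_useful_spans(
    ["Marie Curie won the Nobel Prize in Physics in 1903. She was born in Warsaw."],
    "Marie Curie won the 1903 Nobel Prize in Physics",
))
```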