3 links
tagged with all of: llm + efficiency
Links
Deep Think with Confidence (DeepConf) is a parallel thinking method that improves the reasoning performance and efficiency of large language models (LLMs) by using the model's internal confidence signals to filter out low-quality reasoning traces. It plugs into existing serving frameworks without additional training or tuning, reaching up to 99.9% accuracy on AIME 2025 while substantially reducing the number of generated tokens. A real-time demo runs the Qwen3-8B model with parallel thinking on the HMMT'25 dataset.
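A minimal sketch of the filter-then-vote idea, not the paper's exact recipe: score each sampled trace by its weakest sliding window of token log-probabilities (a simplified proxy for the paper's top-k confidence measure), keep only the most confident fraction, and majority-vote over the survivors. The `window` and `keep_ratio` values are illustrative placeholders.

```python
from collections import Counter

def group_confidences(token_logprobs, window=64):
    """Mean log-probability over a sliding window; higher = more confident.
    (Simplified proxy; DeepConf aggregates top-k candidate logprobs.)"""
    n = len(token_logprobs)
    if n <= window:
        return [sum(token_logprobs) / n]
    return [sum(token_logprobs[i:i + window]) / window
            for i in range(n - window + 1)]

def deepconf_vote(traces, keep_ratio=0.5):
    """traces: list of (answer, token_logprobs) from parallel sampling.
    Score each trace by its weakest window, keep the top fraction,
    then majority-vote over the surviving answers."""
    scored = [(min(group_confidences(lps)), ans) for ans, lps in traces]
    scored.sort(reverse=True)  # most confident traces first
    kept = scored[:max(1, int(len(scored) * keep_ratio))]
    return Counter(ans for _, ans in kept).most_common(1)[0][0]
```

Because weak traces are discarded before voting (or, in the paper's online variant, terminated early), the token budget drops while the vote concentrates on high-confidence answers.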
LLMc is a compression engine that uses a large language model's next-token predictions to drive rank-based encoding, achieving higher compression ratios than traditional methods such as ZIP and LZMA. The project is open-source and invites contributions from the research community.
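A rough sketch of rank-based encoding under one stated assumption: `next_token_logits` is a hypothetical stand-in for a deterministic forward pass of whatever model LLMc actually uses (it must handle the empty prefix). Since compressor and decompressor replay identical predictions, each token can be replaced by its rank in the model's sorted distribution; confident predictions yield long runs of small ranks that any ordinary entropy coder squeezes well.

```python
def encode_ranks(tokens, next_token_logits):
    """Replace each token with its rank under the model's prediction."""
    ranks = []
    for i, tok in enumerate(tokens):
        logits = next_token_logits(tokens[:i])
        # Vocabulary ids sorted from most to least likely; ties broken
        # by id so encoder and decoder order identically.
        order = sorted(range(len(logits)), key=lambda t: (-logits[t], t))
        ranks.append(order.index(tok))
    return ranks  # skewed toward 0; feed to an entropy coder (e.g. zlib)

def decode_ranks(ranks, next_token_logits):
    """Inverse of encode_ranks: replay the model and index by rank."""
    tokens = []
    for r in ranks:
        logits = next_token_logits(tokens)
        order = sorted(range(len(logits)), key=lambda t: (-logits[t], t))
        tokens.append(order[r])
    return tokens
```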
The article presents ChunkLLM, a lightweight, pluggable framework for speeding up inference in large transformer models. It introduces two key components, the QK Adapter and the Chunk Adapter, which handle feature compression and chunk-attention selection while maintaining high performance on both long- and short-text benchmarks. Experiments show substantial speedups on long texts compared to a vanilla transformer.
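ChunkLLM's adapters are learned components, so the following is only an illustrative stand-in: chunk-restricted attention with a naive mean-pooled-key selector playing the role of the trained QK Adapter and Chunk Adapter. The `chunk_size` and `top_chunks` values are arbitrary.

```python
import numpy as np

def chunk_attention(q, K, V, chunk_size=64, top_chunks=4):
    """Sparse attention sketch: summarize each chunk of keys by its mean,
    pick the chunks most similar to the query, and attend only there."""
    n, d = K.shape
    n_chunks = (n + chunk_size - 1) // chunk_size
    bounds = [(c * chunk_size, min((c + 1) * chunk_size, n))
              for c in range(n_chunks)]
    # Heuristic chunk selection (ChunkLLM learns this instead).
    summaries = np.stack([K[s:e].mean(axis=0) for s, e in bounds])
    chosen = np.argsort(summaries @ q)[-top_chunks:]
    # Standard softmax attention restricted to the selected chunks.
    idx = np.concatenate([np.arange(*bounds[c]) for c in sorted(chosen)])
    scores = (K[idx] @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]
```

The speedup comes from attending over only `top_chunks * chunk_size` positions instead of all `n`, which is why the gains grow with context length.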