DeepSeek-V3.2-Exp has been released as an experimental model that introduces DeepSeek Sparse Attention (DSA), a new sparse attention mechanism designed to make long-context processing more efficient. The release reports output quality on par with its predecessor, V3.1-Terminus, across various benchmarks, while reducing inference cost on long sequences. Detailed instructions for local setup and usage are also provided for the community.
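The release itself does not include sample code here, but DSA belongs to the broader family of top-k sparse attention, where each query attends only to its highest-scoring keys. The toy PyTorch sketch below illustrates that general idea only; the function name `topk_sparse_attention` and the parameter `k_top` are illustrative, and this is not DeepSeek's actual implementation.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    """Toy single-head sparse attention: each query attends only to its
    k_top highest-scoring keys instead of the full sequence.
    Shapes: q, k, v are (seq_len, d). Illustrative only, not DSA."""
    d = q.shape[-1]
    scores = q @ k.T / d**0.5                     # (seq, seq) full score matrix
    # Keep only the top-k entries per query row; mask out everything else.
    k_top = min(k_top, scores.shape[-1])
    topk = scores.topk(k_top, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk.indices, topk.values)
    attn = F.softmax(masked, dim=-1)              # zero weight off the top-k
    return attn @ v

q, k, v = (torch.randn(128, 64) for _ in range(3))
out = topk_sparse_attention(q, k, v, k_top=16)    # (128, 64)
```

Note that this toy version still materializes the full score matrix before masking; real sparse-attention kernels avoid that step entirely, which is where the long-context efficiency gain comes from.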
The article presents ChunkLLM, a lightweight, pluggable framework for accelerating inference in large transformer models. It introduces two components: a QK Adapter, responsible for feature compression and chunk-attention acquisition, and a Chunk Adapter, which detects chunk boundaries; together they preserve accuracy on both long- and short-text benchmarks. Experimental results show significant speedups on long-text processing compared to a standard transformer.
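To make the chunk-selection idea concrete, here is a minimal sketch assuming fixed-size chunks and a mean-key summary as the chunk score; in ChunkLLM the adapters learn these scores rather than using a heuristic, so the function name `chunk_selected_attention` and all parameters are illustrative.

```python
import torch
import torch.nn.functional as F

def chunk_selected_attention(q, k, v, chunk_size=16, n_chunks_keep=4):
    """Toy chunk-level attention for one query vector: score the query
    against a per-chunk summary (mean key), keep the best-scoring chunks,
    and run full attention only over tokens in those chunks."""
    seq, d = k.shape
    n_chunks = seq // chunk_size
    k_chunks = k[: n_chunks * chunk_size].view(n_chunks, chunk_size, d)
    v_chunks = v[: n_chunks * chunk_size].view(n_chunks, chunk_size, d)
    reps = k_chunks.mean(dim=1)                    # (n_chunks, d) chunk summaries
    chunk_scores = (q @ reps.T) / d**0.5           # relevance of each chunk
    keep = chunk_scores.topk(min(n_chunks_keep, n_chunks)).indices
    k_sel = k_chunks[keep].reshape(-1, d)          # tokens from kept chunks only
    v_sel = v_chunks[keep].reshape(-1, d)
    attn = F.softmax((q @ k_sel.T) / d**0.5, dim=-1)
    return attn @ v_sel

q = torch.randn(64)                                # current-step query vector
k, v = torch.randn(512, 64), torch.randn(512, 64)
out = chunk_selected_attention(q, k, v)            # (64,)
```

The speedup comes from attending over `n_chunks_keep * chunk_size` tokens instead of the full sequence, so the cost per step stays roughly constant as the context grows.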
Tags: llm, efficiency, transformer