Links
DeepSeek published a paper describing a training method called Manifold-Constrained Hyper-Connections. The approach aims to improve scalability and reduce energy use in AI model development, partly in response to limited access to Nvidia chips in China.
DeepSeek has released DeepSeek-V3.2-Exp, an experimental model that introduces a new sparse attention mechanism designed to process long-context text sequences more efficiently. According to the release, it maintains output quality while improving performance on various benchmarks compared with its predecessor, V3.1-Terminus. Detailed instructions for setting up and running the model locally are also provided for the community.