DeepSeek-V3.2-Exp has been released as an experimental model that introduces a new sparse attention mechanism aimed at improving efficiency on long-context sequences. The release reports output quality on par with its predecessor, V3.1-Terminus, across public benchmarks, while reducing long-context compute costs. Detailed instructions for local setup and usage are also provided for the community.
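The release notes summarized here do not spell out the mechanism, so the following is a minimal, illustrative sketch of generic top-k sparse attention in PyTorch, not DeepSeek's actual implementation; the function name, the reuse of the q·k product for candidate selection, and the `k_top` parameter are all assumptions for exposition. The core idea: each query attends only to its highest-scoring keys rather than the full sequence.

```python
# Illustrative sketch of top-k sparse attention (assumed names and scoring
# scheme; NOT DeepSeek's actual sparse attention implementation).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    """For each query, run softmax attention over only its k_top
    highest-scoring keys.

    q: (n_q, d); k, v: (n_kv, d). Selection here reuses dense q.k scores
    for clarity; a real system would use a cheaper selection pass so the
    savings are not negated by computing all scores anyway.
    """
    n_kv, d = k.shape
    k_top = min(k_top, n_kv)
    scores = q @ k.T / d**0.5                  # (n_q, n_kv) selection scores
    idx = scores.topk(k_top, dim=-1).indices   # (n_q, k_top) kept key positions
    k_sel = k[idx]                             # (n_q, k_top, d) gathered keys
    v_sel = v[idx]                             # (n_q, k_top, d) gathered values
    attn = F.softmax(
        torch.einsum("qd,qkd->qk", q, k_sel) / d**0.5, dim=-1
    )
    return torch.einsum("qk,qkd->qd", attn, v_sel)

q = torch.randn(8, 32)
k = torch.randn(1024, 32)
v = torch.randn(1024, 32)
out = topk_sparse_attention(q, k, v, k_top=64)
print(out.shape)  # torch.Size([8, 32])
```

Because softmax and value aggregation run over `k_top` entries instead of all `n_kv`, per-query attention cost after selection drops from O(n_kv) to O(k_top), which is where the long-context savings come from.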
The article presents the Decoupled Diffusion Transformer (DDT) architecture, which splits the diffusion model into a condition encoder and a velocity decoder and finds that allocating more capacity to the encoder improves results. It achieves state-of-the-art FID scores on ImageNet benchmarks and accelerates inference by reusing encoder outputs across adjacent denoising steps. The implementation provides detailed configurations for training and inference, along with online demos.
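To make the encoder-reuse idea concrete, here is a minimal sketch of sharing encoder features across sampling steps, assuming stand-in `Encoder`/`Decoder` modules, a `reuse_every` knob, and a simple Euler update; these names and the update rule are placeholders for exposition, not the DDT repository's API.

```python
# Sketch of encoder reuse across denoising steps (assumed module names and
# sampler; NOT the DDT codebase's actual interface).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the (larger) encoder that extracts condition features."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x_t):
        return self.net(x_t)

class Decoder(nn.Module):
    """Stand-in for the decoder that predicts velocity from x_t and features."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim * 2, dim)
    def forward(self, x_t, z):
        return self.net(torch.cat([x_t, z], dim=-1))

@torch.no_grad()
def sample(encoder, decoder, x_t, steps=50, reuse_every=2):
    """Run denoising steps, recomputing the encoder only every
    `reuse_every` steps and reusing its features in between, so most
    steps pay only the decoder's cost."""
    dt = 1.0 / steps
    z = None
    for i in range(steps):
        if z is None or i % reuse_every == 0:
            z = encoder(x_t)        # fresh encoder pass
        v = decoder(x_t, z)         # decoder-only pass on reused features
        x_t = x_t - dt * v          # simple Euler update (illustrative)
    return x_t

x = torch.randn(4, 64)
out = sample(Encoder(), Decoder(), x, steps=50, reuse_every=2)
print(out.shape)  # torch.Size([4, 64])
```

The speedup follows directly from the decoupling: if the encoder holds most of the parameters and its features change slowly between adjacent steps, skipping its recomputation on most steps cuts per-step cost at little quality loss.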