1 link tagged with all of: machine-learning + speculative-decoding + autoregressive + diffusion
Links
DFlash introduces a lightweight block diffusion model that serves as the drafter in speculative decoding, proposing a whole block of draft tokens in parallel instead of one token at a time. An autoregressive model then verifies the drafted block, so the approach pairs the parallel speed of diffusion models with the verification strength of autoregressive models. It is reported to deliver significant performance improvements over existing methods such as EAGLE-3 without sacrificing output quality.
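The draft-then-verify loop behind speculative decoding can be illustrated with a toy sketch. This is not DFlash's implementation: `target_next`, `draft_block`, and `speculative_decode` are hypothetical names, the "target" is a simple deterministic function rather than a language model, and the drafter is a stand-in that returns a block in one call with deliberately injected errors, not a real block diffusion model. It assumes greedy decoding, where a drafted token is accepted only if it matches the token the target would have produced.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(prefix: List[int]) -> int:
    """Toy autoregressive target: the next token depends only on the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_block(prefix: List[int], block_size: int) -> List[int]:
    """Stand-in drafter: returns block_size tokens in a single call.

    It mostly agrees with the target but is deliberately wrong at every
    position where len(context) % 4 == 3, to exercise the rejection path.
    """
    ctx = list(prefix)
    draft = []
    for _ in range(block_size):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % VOCAB  # injected drafting error
        draft.append(tok)
        ctx.append(tok)
    return draft


def speculative_decode(
    prompt: List[int], max_new_tokens: int, block_size: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        draft = draft_block(out, block_size)
        # In a real system, one batched target forward pass would score every
        # drafted position; here each position is checked in a plain loop.
        passes += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Whole block accepted: the target contributes one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new_tokens], passes


def greedy_decode(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    tokens, passes = speculative_decode([1, 2, 3], 20)
    print(tokens == greedy_decode([1, 2, 3], 20), passes)
```

Because every kept token is one the target itself would have chosen, the output matches plain greedy decoding exactly; the saving comes from needing fewer verification passes than generated tokens whenever the drafter is right for several positions in a row.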