Large Language Diffusion Models Paper
This paper introduces LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions via a forward process that progressively masks tokens and a reverse process that recovers them, parameterized by a vanilla Transformer trained to predict the masked tokens.
https://arxiv.org/abs/2502.09992
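To make the forward/reverse idea concrete, here is a minimal sketch of the forward masking step described above: each token is independently replaced by a mask token with probability t (the noise level). The `MASK` token string and function names are illustrative, not taken from the paper's code.

```python
import random

MASK = "[MASK]"  # hypothetical mask token for illustration

def forward_mask(tokens, t, rng=random):
    """Forward process sketch: independently mask each token
    with probability t, where t in [0, 1] is the noise level."""
    return [MASK if rng.random() < t else tok for tok in tokens]

# The reverse process would run the Transformer over the partially
# masked sequence and predict the original token at each masked
# position, progressively unmasking as t goes from 1 down to 0.
```

At t = 0 the sequence is untouched; at t = 1 every token is masked, which is the fully-noised state the reverse process starts from.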
Thanks for sharing, Will. Your last paragraph hits. It's exciting because it means we can wield powerful generative AI in non-linear output domains. Paper for the curious: