Large Language Diffusion Models Paper
This paper introduces LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions via a forward data-masking process and a reverse process, parameterized by a vanilla Transformer that predicts masked tokens.
https://arxiv.org/abs/2502.09992
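The forward/reverse process described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: tokens are masked independently with probability t (the diffusion time), and a stand-in `predictor` plays the role of the Transformer that fills in all masked positions at once.

```python
import random

MASK = "[MASK]"

def forward_mask(tokens, t, rng):
    # Forward process: mask each token independently with probability t,
    # where t in [0, 1] is the diffusion time (t=1 masks everything).
    return [MASK if rng.random() < t else tok for tok in tokens]

def reverse_step(masked, predictor):
    # Reverse process: predict every masked position in parallel.
    # `predictor` is a hypothetical stand-in for the Transformer; here it
    # maps a position index to a predicted token.
    return [predictor(i) if tok == MASK else tok
            for i, tok in enumerate(masked)]

rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
noisy = forward_mask(tokens, t=0.5, rng=rng)
# Use an oracle predictor for illustration; a trained model would be
# queried here instead.
denoised = reverse_step(noisy, predictor=lambda i: tokens[i])
print(noisy)
print(denoised)
```

With an oracle predictor the reverse step recovers the original sequence exactly; training replaces the oracle with a Transformer optimized to predict the masked tokens.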