Thanks for sharing, Will. Your last paragraph hits. It's exciting because it means we can wield powerful generative AI in non-linear output domains. Paper for the curious:

This paper introduces LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data-masking process and a reverse process, parameterized by a vanilla Transformer that predicts masked tokens. https://arxiv.org/abs/2502.09992
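For anyone who wants the gist of the mechanism, here's a minimal PyTorch sketch of masked diffusion under my own assumptions, not the paper's code: `MASK_ID`, `VOCAB`, the greedy fill-in, and the uniform re-masking schedule are all placeholder choices for illustration.

```python
import torch

MASK_ID = 0   # hypothetical [MASK] token id (placeholder, not from the paper)
VOCAB = 1000  # hypothetical vocabulary size

def forward_mask(x0: torch.Tensor, t: float) -> torch.Tensor:
    """Forward process: independently replace each token with [MASK] w.p. t."""
    mask = torch.rand_like(x0, dtype=torch.float) < t
    return torch.where(mask, torch.full_like(x0, MASK_ID), x0)

@torch.no_grad()
def reverse_unmask(model, length: int, steps: int = 8) -> torch.Tensor:
    """Reverse process: start fully masked; at each step, predict every masked
    token, then re-mask a shrinking fraction so text is revealed gradually."""
    x = torch.full((1, length), MASK_ID, dtype=torch.long)
    for i in range(steps, 0, -1):
        logits = model(x)                         # (1, length, VOCAB) token logits
        pred = logits.argmax(dim=-1)              # greedy fill-in, for illustration
        x = torch.where(x == MASK_ID, pred, x)    # fill currently masked positions
        if i > 1:                                 # uniform re-masking here; a real
            x = forward_mask(x, (i - 1) / steps)  # sampler would pick what to re-mask
    return x

# Smoke test with a dummy "model" that returns random logits.
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], VOCAB)
print(reverse_unmask(dummy, length=12))
```

The point of the sketch: unlike left-to-right decoding, every position gets refined in parallel across steps, which is what makes the non-linear output domains above plausible.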