This paper introduces LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions via a forward data-masking process and a reverse process, parameterized by a vanilla Transformer that predicts the masked tokens. https://arxiv.org/abs/2502.09992
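The forward data-masking process can be sketched roughly as follows: each token is independently replaced by a mask token with probability t, and the Transformer is trained to recover the originals at the masked positions. This is a minimal illustrative sketch, not the paper's implementation; the mask token id and function names are hypothetical.

```python
import random

MASK_ID = -1  # hypothetical [MASK] token id; the real value depends on the tokenizer


def forward_mask(tokens, t, rng=random):
    """Forward process sketch: mask each token independently with
    probability t, where t in [0, 1] is the diffusion 'time'
    (t=0 leaves the sequence intact, t=1 masks everything)."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


# Training pairs: given the corrupted sequence, the model (a vanilla
# Transformer in LLaDA) predicts the original tokens at masked positions.
seq = [5, 17, 42, 99]
corrupted = forward_mask(seq, t=0.5)
```

The reverse process then iteratively fills in masked positions with model predictions, moving from a fully masked sequence back toward clean data.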