This project explores a simple idea: train a model to convert crystallography data from CIF to JSON, then judge how faithfully that JSON could reconstruct the original CIF. The policy model, a 3B language model with LoRA adapters, performs the forward conversion (CIF → JSON). A separate, frozen judge model scores each conversion by computing, token by token, the probability of recovering the exact original CIF from the JSON. This is a teacher-forced reverse-probability score: no CIF text is ever generated. The score serves as the reward signal for training the policy. The setup has three parts: the policy (the converter), the judge (which scores round-trips), and a frozen reference model for regularization. Training runs on Modal across three GPUs, with vLLM serving the judge under a careful memory budget. The goal is a reliable, reversible representation, with a planned extension to descriptions that generate CIF files.
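To make the reward concrete, here is a minimal sketch of how a reverse-probability score could be turned into a reward. It assumes the judge has already produced, via a single teacher-forced forward pass, the log-probability of each reference CIF token given the JSON prompt and the preceding CIF tokens; the function names and the length-normalized exponential mapping are illustrative choices, not necessarily the project's exact formulation.

```python
import math

def reverse_logprob(token_logprobs):
    """Total log-probability of the reference CIF under the judge.

    token_logprobs[i] is log P(cif_token_i | json_prompt, cif_tokens_<i),
    taken from one teacher-forced pass over the judge -- the CIF is scored,
    never generated.
    """
    return sum(token_logprobs)

def reward_from_logprob(total_logprob, n_tokens):
    """Map the score to a reward in (0, 1].

    Normalizing by length and exponentiating gives the geometric mean of
    the per-token probabilities, so long and short CIFs are comparable.
    """
    return math.exp(total_logprob / max(n_tokens, 1))

# Hypothetical judge output: four tokens, each assigned probability 0.9.
logprobs = [math.log(0.9)] * 4
reward = reward_from_logprob(reverse_logprob(logprobs), len(logprobs))
```

A reward of this shape stays near 1 only when the judge finds every token of the original CIF highly likely given the JSON, which is exactly the round-trip fidelity the training loop optimizes.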