This post describes our first full training run, which inverted an earlier task: instead of turning CIF output into JSON, we aimed for Qwen 2.5 to take a description of a crystal structure and return a valid CIF. The logged metrics looked promising, with completion lengths climbing toward the planned 756 tokens, but we should have watched the raw policy outputs more closely. Between steps 70 and 100, the policy learned that repeating tokens could earn a good reward: each completion still opened with CIF-like tokens, but the output then degraded into repetition. Example outputs contained many copies of the same data-field lines rather than a valid CIF structure. This kind of degeneration is a common failure mode in LLM RL post-training. The next run will add a stronger divergence penalty and better monitoring so we can track raw policy outputs more reliably. More updates will follow.
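
The repetition collapse is a textbook case of reward hacking, and one cheap guard is to dock reward for completions whose lines are mostly duplicates before any CIF-validity scoring runs. Below is a minimal sketch of such a check, assuming completions arrive as plain text; the function name, threshold, and reward scale are illustrative assumptions, not the code from our run.

```python
from collections import Counter

def repetition_penalty_reward(completion: str, max_duplicate_fraction: float = 0.2) -> float:
    """Return a penalty in [-1, 0] based on how many lines are duplicates.

    Hypothetical helper: 0.0 means no penalty, -1.0 means the completion is
    essentially one line repeated over and over.
    """
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    if not lines:
        return -1.0  # empty output gets the full penalty
    counts = Counter(lines)
    duplicates = sum(count - 1 for count in counts.values())
    duplicate_fraction = duplicates / len(lines)
    # No penalty below the threshold, scaling linearly to -1.0 at full repetition.
    excess = max(0.0, duplicate_fraction - max_duplicate_fraction)
    return -min(1.0, excess / (1.0 - max_duplicate_fraction))

if __name__ == "__main__":
    degenerate = "\n".join(["_cell_length_a 5.43"] * 50)
    print(repetition_penalty_reward(degenerate))  # close to -1.0
```

In practice a check like this would sit alongside the stronger divergence penalty rather than replace it, since a line-level heuristic only catches the most blatant repetition.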