For the past few weeks I've been grinding away on some hand-rolled (Claude-rolled) diffusion models for crystal structure generation. I've always wanted to sit down and spend some time with these models to learn more about how they work. The idea of starting from a 'noisy' representation of something and resolving it into a concrete image, idea, or object has always felt somewhat analogous to how humans resolve speech or actions from thoughts. So from a modeling perspective, I've always been a diffusion believer.
I've been hung up for a while now on finding consistent ways to translate from plain English to a valid structure, so this felt like the perfect application for some diffusion experiments. (Plugging this great video on the operating mechanisms of these models: https://youtu.be/Fk2I6pa6UeA?si=zLvtNk-OrEnHPg6n)
Microsoft has tried this already with their MatterGen model:
MatterGen employs a diffusion-based approach for crystal structure generation, utilizing classifier-free guidance to steer the generation process. The core of our modifications centers on the Property
Phase diagram of Gd10Mg9Sn5; e_above_hull: 0.030306 eV/atom; predicted_stable: False
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -180.5889 eV; energy change = -45.8368 eV; symmetry: P1 → Pm
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -63.9388 eV; energy change = -7.7039 eV; symmetry: Pm → Pm
Crystal structure for Fe6CoSi generated by GPSK-300 (3-channel reciprocal-space DiT). 8 sites, min distance 1.756 Å, selected from 14 candidates.
Phase diagram of FeCoNiPt; e_above_hull: 0.000000 eV/atom; predicted_stable: True
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -27.9508 eV; energy change = -1.1675 eV; symmetry: Cm → I-4m2
Crystal structure for FeCoNiPt generated by GPSK-300 (3-channel reciprocal-space DiT). 4 sites, min distance 2.363 Å, selected from 3 candidates.
Generate novel crystal structures with multimodal property conditioning using a reciprocal-space DiT. Returns CIF data.
Multimodal DiT for inorganic crystal generation. Conditions on composition, crystal system, space group, band gap, formation energy, e-above-hull, and magnetic ordering.
Supercell 2x2x2 of Tm3Ru (Space group: P-1, 16 symmetry operations)
Phase diagram of Tm3Ru; e_above_hull: 0.097735 eV/atom; predicted_stable: False
Phase diagram of YLuAl2; e_above_hull: 0.019939 eV/atom; predicted_stable: True
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -47.6003 eV; energy change = -0.1171 eV; symmetry: P-1 → P-1
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -80.1591 eV; energy change = -0.7698 eV; symmetry: P42/mmc → P42/mmc
Phase diagram of Ba2SnHg3; e_above_hull: 0.100615 eV/atom; predicted_stable: False
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -136.2836 eV; energy change = -10.2749 eV; symmetry: P1 → P1
I feel like the approach was flawed for two main reasons:
Injecting physical rules as constraints rather than learnable features hamstrings the model (this is more of a bitter-lesson-style personal opinion).
Modeling systems of fewer than 20 atoms means a single-atom 'mistake' is catastrophic for the system at large.
Given that the frontier image models these days exhibit remarkable detail resolution with even the simplest of input prompts, I went into these experiments expecting that the features of any given structure were learnable with zero physical guidance, and without penalties or constraints around 'invalid' structures. Another glaring gap that I've seen in these crystal gen models (not just MatterGen) is an almost soft cap of 20 atoms per system, which oftentimes means the output of a given inference pass is a single unit cell. I have never seen anything nearly as complex as the NdFeB unit cell come from one of these models:
So in addition to the lack of physical guidance, I wanted to model supercells for these experiments. In my mind, modeling supercells helps with a couple of things:
The model learns to generate structures with hundreds of atoms, opening the door for more single crystal complexity.
More atoms make a crystal inherently 'error-tolerant'. In an 8 atom unit cell, a single misplaced or misclassified atom could destabilize the entire system. Scale that same point failure to a system of 80-100 atoms, and the 'failure' may not destabilize the system at all. It could even introduce interesting new properties, acting as a dopant of sorts.
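To make the supercell idea concrete, here is a minimal NumPy sketch of tiling a unit cell into a supercell and comparing how much of the structure a single bad site represents. This is not the code from these experiments: the function name `make_supercell`, the 5 Å cubic lattice, and the 8-site toy cell are all made up for illustration, and the toy cell is not the Fe6CoSi structure shown above.

```python
import numpy as np

def make_supercell(lattice, frac_coords, species, reps=(2, 2, 2)):
    """Tile a unit cell reps[0] x reps[1] x reps[2] times.

    lattice:     (3, 3) array, rows are lattice vectors in Å.
    frac_coords: (N, 3) fractional coordinates inside the unit cell.
    species:     list of N element symbols.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac_coords = np.asarray(frac_coords, dtype=float)
    reps = np.asarray(reps, dtype=int)

    # Every integer translation (i, j, k) of the unit cell inside the supercell.
    shifts = np.array(
        [(i, j, k)
         for i in range(reps[0])
         for j in range(reps[1])
         for k in range(reps[2])],
        dtype=float,
    )

    # Shift each atom by each translation, then rescale so the coordinates
    # are fractional with respect to the (larger) supercell lattice.
    new_frac = (frac_coords[None, :, :] + shifts[:, None, :]).reshape(-1, 3) / reps
    new_lattice = lattice * reps[:, None]       # scale lattice vector i by reps[i]
    new_species = list(species) * len(shifts)   # same atom order for every copy
    return new_lattice, new_frac, new_species

# Hypothetical 8-site toy cell (illustrative only, not a real structure).
lattice = 5.0 * np.eye(3)
frac = np.array([(x, y, z) for x in (0.0, 0.5) for y in (0.0, 0.5) for z in (0.0, 0.5)])
species = ["Fe"] * 6 + ["Co", "Si"]

sc_lattice, sc_frac, sc_species = make_supercell(lattice, frac, species, reps=(2, 2, 3))

# Fraction of the structure that one misplaced or misclassified site represents.
print(f"unit cell: {len(species)} atoms, one bad site = {1 / len(species):.1%}")
print(f"supercell: {len(sc_species)} atoms, one bad site = {1 / len(sc_species):.1%}")
```

With a 2x2x3 tiling the 8-site cell becomes 96 sites, so a single bad site drops from one eighth of the structure to roughly one percent of it. That is only the counting argument; whether the resulting structure actually stays stable would still need a relaxation and a hull check like the ones shown earlier.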
For the actual modeling experiments, I kept a running experiment log from one iteration to the next and dumped it all into OpenAI's new LaTeX editor: https://prism.openai.com
It's a cool product (but they might take a cut of your discoveries if you use it, so be vigilant) - I heard that from some random potentially fake news outlet on Twitter.
In terms of next steps, I feel like we've really nailed down learning the structural component of these systems. Classifying the atoms at the learned sites is the hard part. I'll probably be going back to the drawing board here, and trying to learn more about how these conditioning mechanisms work in other frontier models. I've also been hearing a lot about 'flow' versus diffusion, so that'll be another research avenue.
Wanted to round this out with a visual showing the denoising process for one of the cooler structures that came out of this exercise:
More to come as always.