For the past few weeks I've been grinding away on some hand-rolled (Claude-rolled) diffusion models for crystal structure generation. I've always wanted to sit down and spend some time with these models to learn more about how they work. This idea of going from a 'noisy' representation of something and resolving it to a concrete image, idea, or object has always felt somewhat analogous to how humans resolve speech or actions from thoughts. So from a modeling perspective, I've always been a diffusion believer.
I've been hung up for a while now on finding consistent ways to translate plain English into a valid structure, so this felt like the perfect application for some diffusion experiments (plugging this great video on the operating mechanisms of these models: https://youtu.be/Fk2I6pa6UeA?si=zLvtNk-OrEnHPg6n).
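That noisy-to-concrete picture maps directly onto the forward/reverse processes in DDPM-style diffusion. As a minimal sketch (not my actual training code), here's the standard forward (noising) process applied to a hypothetical set of fractional atomic coordinates:

```python
import numpy as np

# Hypothetical 8-atom structure: fractional (x, y, z) coordinates in [0, 1).
rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 3))

# Standard DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # common linear noise schedule
abar = np.cumprod(1.0 - betas)       # cumulative signal-retention factor

def noise_to(x0, t, rng):
    """Jump straight to timestep t of the forward (noising) process."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

x_mid = noise_to(x0, 500, rng)    # partially noised structure
x_end = noise_to(x0, T - 1, rng)  # nearly pure Gaussian noise
```

Training a denoiser to invert this process, step by step, is what lets the model resolve a noise blob into a concrete structure.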
Microsoft has tried this already with their MatterGen model:
MatterGen employs a diffusion-based approach for crystal structure generation, utilizing classifier-free guidance to steer the generation process.
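The classifier-free guidance mechanism mentioned above boils down to one blending rule at sampling time. A toy sketch with the trained network replaced by a dummy stand-in (nothing here is MatterGen's actual API):

```python
import numpy as np

def denoiser(x, cond):
    """Stand-in for a trained noise-prediction network. A real model would be
    a neural net; this dummy just pulls harder when a condition is given."""
    strength = 0.5 if cond is None else 0.8
    return strength * x  # "predicted noise"

def guided_eps(x, cond, w):
    """Classifier-free guidance: blend conditional and unconditional
    predictions. w=0 ignores the condition, w=1 is pure conditional
    sampling, and w>1 over-steers toward the condition."""
    eps_uncond = denoiser(x, None)
    eps_cond = denoiser(x, cond)
    return eps_uncond + w * (eps_cond - eps_uncond)

x = np.ones((4, 3))
eps = guided_eps(x, cond="target_property", w=2.0)
```

The same network is trained with the condition randomly dropped, so it can serve both roles at inference.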
I feel like the approach was flawed for two main reasons:
Injecting physical rules as constraints rather than learnable features hamstrings the model (this is more of a bitter lesson style personal opinion).
Modeling <20 atom systems means single atom 'mistakes' are catastrophic for the system at large.
Given that the frontier image models these days exhibit remarkable detail resolution with even the simplest of input prompts, I went into these experiments expecting that the features of any given structure were learnable with zero physical guidance, or penalties/constraints around 'invalid' structures. Another glaring gap that I've seen from these crystal gen models (outside of just MatterGen) is an almost soft cap of 20-atom systems, often resulting in the output for a given inference pass being a singular unit cell. I have never seen anything nearly as complex as the NdFeB unit cell come from one of these models:
So in addition to the lack of physical guidance, I wanted to model supercells for these experiments. In my mind, modeling supercells helps with a couple things:
The model learns to generate structures with hundreds of atoms, opening the door for more single crystal complexity.
More atoms make a crystal inherently 'error-tolerant'. In an 8-atom unit cell, a single misplaced or misclassified atom could destabilize the entire system. Scale that point failure to a system of 80-100 atoms, and that 'failure' may not only leave the system stable, but also potentially introduce interesting new properties as a dopant of sorts.
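The supercell idea itself is mechanically simple: tile the unit cell's fractional coordinates across a grid and rescale. A minimal sketch with a hypothetical 2-atom unit cell:

```python
import numpy as np

# Hypothetical 2-atom unit cell (fractional coordinates). Tiling it 3x3x3
# yields a 54-atom supercell -- the "more atoms = more error tolerance" regime.
unit_frac = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.5]])

def make_supercell(frac, n):
    """Replicate fractional coordinates across an n x n x n grid of cells,
    then rescale so every site lives in [0, 1) of the supercell."""
    shifts = np.array([[i, j, k] for i in range(n)
                                 for j in range(n)
                                 for k in range(n)])
    tiled = (frac[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    return tiled / n

supercell = make_supercell(unit_frac, 3)  # 54 sites
```

In a supercell representation, a single wrong atom is one site out of dozens rather than one out of eight, which is exactly the dopant-like tolerance described above.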
For the actual modeling experiments, I kept a running experiment log from one iteration to the next and dumped it all into OpenAI's new LaTeX editor: https://prism.openai.com
It's a cool product (but they might take a cut of your discoveries if you use it, so be vigilant) - I heard that from some random potentially fake news outlet on Twitter.
In terms of next steps, I feel like we've really nailed down learning the structural component of these systems. Classifying the atoms at the learned sites is the hard part. I'll probably be going back to the drawing board here, and trying to learn more about how these conditioning mechanisms work in other frontier models. I've also been hearing a lot about 'flow' versus diffusion, so that'll be another research avenue.
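On the flow-versus-diffusion point: the practical difference shows up in the sampling loop. Flow matching integrates a learned velocity field as an ODE from noise to data. Here's a toy sketch where the learned velocity is replaced by its closed form for a known target (purely illustrative, not a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.full((8, 3), 0.25)  # pretend "clean structure"

def flow_velocity(x, t):
    """Ideal velocity for the straight-line path from the current state to
    `target` -- the quantity a flow-matching model is trained to predict."""
    return (target - x) / (1.0 - t)

# Flow-matching sampling: integrate the ODE from noise (t=0) toward data (t=1).
x = rng.normal(size=(8, 3))
steps = 100
dt = 1.0 / steps
for i in range(steps - 1):  # stop short of t=1, where the velocity blows up
    t = i * dt
    x = x + dt * flow_velocity(x, t)
```

Contrast this deterministic ODE with diffusion sampling, which typically re-injects noise at each reverse step; that's the distinction I want to dig into next.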
Wanted to round this out with a visual showing the denoising process for one of the cooler structures that came out of this exercise:
More to come as always.