Check out the paper here. It's a short read and I recommend it. Although it isn't very technical (it relies on machine learning concepts that have been explored elsewhere), the creativity and simplicity of the approach are inspiring.
This paper presents Matra-Genoa, an autoregressive transformer model built on invertible tokenized representations of symmetrized crystals, including free coordinates. This approach enables sampling from a hybrid action space. The model is trained across the periodic table and space groups and can be conditioned on specific properties. The authors demonstrate its ability to generate stable, novel, and unique crystal structures by conditioning on the distance to the convex hull. Resulting structures are 8 times more likely to be stable than baselines using PyXtal with charge compensation, while maintaining high computational efficiency.
They mention some of the models we've explored here and call out the same shortcomings we've noticed. This is great, and it makes it look like the approach can build on the existing work and actually make some progress. I think it does that!
So what did they do?
My take: they created a compact, genome-like representation of a crystal and trained a model to generate "valid" genomes. They use a Wyckoff representation we've seen before:
lattice params (a, b, c, α, β, γ)
space group
Wyckoff positions
They encode these parameters with very few tokens.
Schematic overview of the invertible sequenced representation. (a) The structure is first decomposed into composition, stability, structure and lattice. (b) The structure is then further decomposed into a set of Wyckoff positions, uniquely identified by a set of Wyckoff identifiers. Optional free parameters are also included to make the representation coordinate-aware. (c) All previous information is gathered into a tokenized and invertible sequence. The color of the tokens represents the type or the Wyckoff position for ease of visualization.
Taking inspiration from crystallography, we introduce an approach by considering Wyckoff positions, including free parameters. Without loss of information, any crystal structure can be described through (i) the spacegroup, (ii) a set of Wyckoff positions and corresponding chemical elements, (iii) free parameters, if required, of the Wyckoff positions, and finally (iv) the dimensions of the unit cell (a, b, c, α, β, γ). A Wyckoff position reduces a set of equivalent points (orbit) into a single point, by mapping the equivalent sites under the symmetry transformations of the given space group.
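To make the idea of an invertible sequence concrete, here is a minimal Python sketch. It is not the paper's tokenizer: the token names (`SG_`, `EL_`, `WY_`, `X_`, `L_`, `LATTICE`, `EOS`), the `Crystal` and `WyckoffSite` classes, and the choice to store floats verbatim are all assumptions made for illustration. The rutile TiO2 values are approximate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WyckoffSite:
    element: str                          # chemical symbol, e.g. "Ti"
    letter: str                           # Wyckoff letter, e.g. "a"
    free_params: Tuple[float, ...] = ()   # free fractional coordinates, if any


@dataclass(frozen=True)
class Crystal:
    space_group: int                      # international number, 1..230
    sites: Tuple[WyckoffSite, ...]
    lattice: Tuple[float, ...]            # (a, b, c, alpha, beta, gamma)


def encode(crystal: Crystal) -> List[str]:
    """Flatten a crystal into a token list (hypothetical vocabulary)."""
    tokens = [f"SG_{crystal.space_group}"]
    for site in crystal.sites:
        tokens += [f"EL_{site.element}", f"WY_{site.letter}"]
        tokens += [f"X_{p!r}" for p in site.free_params]
    tokens.append("LATTICE")
    tokens += [f"L_{v!r}" for v in crystal.lattice]
    tokens.append("EOS")
    return tokens


def decode(tokens: List[str]) -> Crystal:
    """Rebuild the crystal from the token list produced by encode()."""
    if not tokens[0].startswith("SG_") or tokens[-1] != "EOS":
        raise ValueError("malformed sequence")
    space_group = int(tokens[0][3:])
    split = tokens.index("LATTICE")
    sites = []  # each entry: [element, letter, free parameters]
    for tok in tokens[1:split]:
        kind, value = tok.split("_", 1)
        if kind == "EL":
            sites.append([value, None, []])
        elif kind == "WY":
            sites[-1][1] = value
        elif kind == "X":
            sites[-1][2].append(float(value))
        else:
            raise ValueError(f"unexpected token {tok!r}")
    lattice = tuple(float(t.split("_", 1)[1]) for t in tokens[split + 1:-1])
    return Crystal(
        space_group,
        tuple(WyckoffSite(e, w, tuple(x)) for e, w, x in sites),
        lattice,
    )


# Rutile TiO2 (approximate values): space group 136, Ti on 2a, O on 4f (x, x, 0).
rutile = Crystal(
    space_group=136,
    sites=(WyckoffSite("Ti", "a"), WyckoffSite("O", "f", (0.305,))),
    lattice=(4.59, 4.59, 2.96, 90.0, 90.0, 90.0),
)
tokens = encode(rutile)
print(tokens)
print(decode(tokens) == rutile)
```

The whole structure fits in 14 tokens here, and only the oxygen site needs a free coordinate. A real model would have to decide how to treat the continuous values (the abstract's "hybrid action space"), which this sketch sidesteps by keeping them as raw floats.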
Filtering results on the 3 million generated structures (N_batch = 150,000). Columns include sampling temperature T, successfully optimized structures N_opt, novel inserted structures N_novel, and stable structures meeting thresholds of 0.001, 0.050, and 0.100 eV/atom.
You can see that as the temperature increases, the number of novel generations increases as well. It's pretty cool that the number of low e_hull materials doesn't change that much though!
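For intuition on why temperature and novelty move together, here is a small sketch of temperature-scaled sampling probabilities. The logits are made up, and this is the generic softmax-with-temperature trick rather than anything taken from the paper's code.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in cold], round(entropy(cold), 3))
print([round(p, 3) for p in hot], round(entropy(hot), 3))
```

At low temperature most of the probability mass sits on the top token, so samples stay close to what the model saw most often. At higher temperature the distribution flattens and rarer tokens get picked more, which is consistent with more novel structures showing up.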
Overall, Matra-Genoa-MPAS has a S.U.N ratio of around 16% at T = 0.7. For comparison, MatterGen [17] reports a S.U.N ratio of ∼45%, CDVAE [16] of ∼18%, and PG-SchNet, G-SchNet [21] and FTCP [14] are below 5%, although these results are based on a much less stringent reference set (Alex-MP-ICSD) than ours.
Pretty good. We should be using the new metric that accounts for P1 symmetry and ignores those generations. Crazy how high MatterGen is, but I think that largely comes down to how often it generates P1 structures, which is a well-known limitation.
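As a reminder of what goes into a S.U.N. (stable, unique, novel) ratio, here is a rough sketch. It assumes each generated structure has already been reduced to a hypothetical string fingerprint plus an e_hull value in eV/atom; real pipelines use a structure matcher instead of string equality, and the paper's exact filtering may differ.

```python
from typing import Iterable, List, Set, Tuple


def sun_ratio(
    generated: List[Tuple[str, float]],
    reference_fingerprints: Iterable[str],
    e_hull_threshold: float = 0.1,
) -> float:
    """Fraction of generated structures that are stable, unique and novel.

    generated: (fingerprint, e_hull) pairs, one per generated structure.
    reference_fingerprints: fingerprints of already known structures.
    """
    if not generated:
        return 0.0
    reference: Set[str] = set(reference_fingerprints)
    seen: Set[str] = set()
    hits = 0
    for fingerprint, e_hull in generated:
        if fingerprint in seen:
            continue  # duplicate of an earlier generation: not unique
        seen.add(fingerprint)
        if fingerprint in reference:
            continue  # already known: not novel
        if e_hull <= e_hull_threshold:
            hits += 1  # stable, unique and novel
    return hits / len(generated)


# Toy data: "A" is generated twice, "B" is unstable, "D" is already known.
samples = [("A", 0.0), ("A", 0.0), ("B", 0.2), ("C", 0.05), ("D", 0.01)]
ratio = sun_ratio(samples, reference_fingerprints={"D"})
print(ratio)
```

Note how much the result depends on the reference set: a larger set of known structures makes the novelty check stricter, which is the caveat the authors raise when comparing against other models.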
A variety of structures are observed, some of which display novel patterns not found in the training set.
This was super exciting. The paper goes into depth on a few discovered materials that have completely new patterns not seen before. For example, there was a cubic phase of Zn6Ni7Ge2. There are only 35 other compounds with anonymous composition A2B6C7 in the convex hull of Alexandria, but none of these has cubic symmetry. This also opens up new possibilities for substitution, since we've found a new stable prototype for that ratio of atoms.
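In case the "anonymous composition" notation is unfamiliar, this short sketch shows how Zn6Ni7Ge2 maps to A2B6C7: drop the element names, sort the amounts, and relabel them with letters. The function name is my own, and it only handles simple formulas with integer counts and no parentheses.

```python
import re
from string import ascii_uppercase


def anonymous_formula(formula: str) -> str:
    """Replace element symbols with A, B, C... ordered by increasing amount."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    amounts = sorted(counts.values())
    return "".join(
        f"{ascii_uppercase[i]}{n if n != 1 else ''}" for i, n in enumerate(amounts)
    )


print(anonymous_formula("Zn6Ni7Ge2"))
print(anonymous_formula("NaCl"))
```

Any compound with the same 2:6:7 ratio lands on the same anonymous formula, which is what makes the new cubic prototype a candidate template for element substitution.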
In summary, by sampling from a coarse-grained representation of Wyckoff coordinates with a deep attention-based neural network, we demonstrated that it is possible to fully generate stable, crystal structures in an end-to-end manner (i.e. including coordinates). We hope this work inspires further advancements in inverse materials design and discovery.
We're on it! Need to see if I can get access to the code and some weights now. Going to explore this model further so stay tuned.
Sharing an idea for finding "adjacent crystals". Why? In the AI research agent I'm working on, we're trying to discover materials with a set of target properties. We do this by letting an agent genera