Notes as I read about Matra-Genoa, a new crystal generation model

Check out the paper here. It's a short read and I recommend it. Although not very technical (it reuses machine learning concepts that have been explored elsewhere), the creativity and simplicity of the approach are inspiring.

This paper presents Matra-Genoa, an autoregressive transformer model built on invertible tokenized representations of symmetrized crystals, including free coordinates. This approach enables sampling from a hybrid action space. The model is trained across the periodic table and space groups and can be conditioned on specific properties. The authors demonstrate its ability to generate stable, novel, and unique crystal structures by conditioning on the distance to the convex hull. Resulting structures are 8 times more likely to be stable than baselines using PyXtal with charge compensation, while maintaining high computational efficiency.

They mention some of the models we've explored here and call out the same shortcomings we've noticed. This is great, and it makes it look like the approach will build on the existing work and actually make some progress. I think it does that!

So what did they do?

My take: they created a compact genome-like representation of a crystal and trained a model to generate "valid" genomes, using a Wyckoff representation we've seen before:

lattice params (a, b, c, α, β, γ)
space group
Wyckoff positions

They encode these parameters with very few tokens.
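To get a feel for how few tokens this takes, here is a minimal sketch of a toy tokenizer. Everything in it is my own assumption for illustration: the `WyckoffSite` and `Crystal` classes, the token names (`SG_`, `EL_`, `WY_`, `X_`, `LEN_`, `ANG_`), the bin count, and the value ranges are all hypothetical, and the paper's actual vocabulary may look quite different.

```python
from dataclasses import dataclass


@dataclass
class WyckoffSite:
    element: str              # chemical symbol, e.g. "Na"
    letter: str               # Wyckoff letter, e.g. "a"
    free_params: tuple = ()   # fractional free coordinates, if the site has any


@dataclass
class Crystal:
    space_group: int          # international number, 1..230
    lattice: tuple            # (a, b, c, alpha, beta, gamma)
    sites: list


def bin_value(value, low, high, n_bins):
    """Coarse-grain a continuous value into an integer bin index."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)


def tokenize(crystal, n_bins=100):
    """Flatten a crystal into a short token list (hypothetical vocabulary)."""
    tokens = [f"SG_{crystal.space_group}"]
    for site in crystal.sites:
        tokens.append(f"EL_{site.element}")
        tokens.append(f"WY_{site.letter}")
        for param in site.free_params:
            tokens.append(f"X_{bin_value(param % 1.0, 0.0, 1.0, n_bins)}")
    a, b, c, alpha, beta, gamma = crystal.lattice
    for length in (a, b, c):
        # Assumed range of 0 to 50 angstrom for cell lengths.
        tokens.append(f"LEN_{bin_value(length, 0.0, 50.0, n_bins)}")
    for angle in (alpha, beta, gamma):
        tokens.append(f"ANG_{bin_value(angle, 0.0, 180.0, n_bins)}")
    return tokens


# Rock salt NaCl: space group 225, Na on 4a, Cl on 4b, no free parameters.
rock_salt = Crystal(
    space_group=225,
    lattice=(5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    sites=[WyckoffSite("Na", "a"), WyckoffSite("Cl", "b")],
)
tokens = tokenize(rock_salt)
print(len(tokens), tokens)
```

In this toy encoding the whole conventional cell of rock salt (8 atoms) fits in 11 tokens, because the symmetry does most of the work. The binning is lossy up to the bin width, so this is a coarse-grained sketch rather than an exact round trip.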

Figure 1 from "A generative material transformer using Wyckoff representation"

Taking inspiration from crystallography, we introduce an approach by considering Wyckoff positions, including free parameters. Without loss of information, any crystal structure can be described through (i) the spacegroup, (ii) a set of Wyckoff positions and corresponding chemical elements, (iii) free parameters, if required, of the Wyckoff positions, and finally (iv) the dimensions of the unit cell (a, b, c, α, β, γ). A Wyckoff position reduces a set of equivalent points (orbit) into a single point, by mapping the equivalent sites under the symmetry transformations of the given space group.
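To make the orbit idea concrete, here is a small sketch that expands one representative point into its full set of equivalent sites. It uses space group P-1 (No. 2), which has only the identity and inversion as point operations, so translations can be left out. The helper names `apply_op` and `orbit` are mine, not from the paper, and a real space group would also need the translation part of each operation.

```python
from fractions import Fraction

# Rotation parts of the symmetry operations of P-1 (No. 2):
# the identity and the inversion through the origin.
P1BAR_OPS = [
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    ((-1, 0, 0), (0, -1, 0), (0, 0, -1)),
]


def apply_op(rotation, point):
    """Apply a 3x3 integer matrix to a fractional point and wrap into [0, 1)."""
    return tuple(
        sum(r * x for r, x in zip(row, point)) % 1 for row in rotation
    )


def orbit(point, ops):
    """Return the distinct images of a point under all symmetry operations."""
    images = []
    for rotation in ops:
        image = apply_op(rotation, point)
        if image not in images:
            images.append(image)
    return images


# General position 2i: one stored point (x, y, z) stands for two atoms.
general = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
general_orbit = orbit(general, P1BAR_OPS)

# Special position 1a at the origin: inversion maps it onto itself.
special = (Fraction(0), Fraction(0), Fraction(0))
special_orbit = orbit(special, P1BAR_OPS)

print(len(general_orbit), len(special_orbit))
```

The generative model only has to emit the representative point and its Wyckoff label. The rest of the orbit follows from the space group, which is where the compression comes from.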

Table 1 from "A generative material transformer using Wyckoff representation"

You can see how, as temperature increases, the number of novel generations increases as well. It's pretty cool that the number of low e_hull materials doesn't change that much, though!
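As a reminder of what the temperature knob does in an autoregressive model, here is a generic sketch of temperature-scaled sampling over next-token logits. This is the standard textbook technique, not code from the paper, and the function names and example logits are made up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


example_logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(example_logits, 0.5)
hot = softmax_with_temperature(example_logits, 1.5)
print(cold, hot)
```

A low temperature concentrates probability on the most likely token, while a high temperature flattens the distribution. That fits the trend in the table: hotter sampling wanders further from the training data and produces more novel structures.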

Overall, Matra-Genoa-MPAS has a S.U.N ratio of around 16% at T = 0.7. For comparison, MatterGen [17] reports a S.U.N ratio of ∼45%, CDVAE [16] of ∼18%, and PG-SchNet, G-SchNet [21] and FTCP [14] are below 5%, although these results are based on a much less stringent reference set (Alex-MP-ICSD) than ours.

Pretty good. We should be using the new metric that accounts for P1 symmetry and ignores those generations. Crazy how MatterGen is so high, but I think that's largely to do with the P1 frequency, which is a well-known limitation.
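For reference, here is a rough sketch of how a S.U.N. (stable, unique, novel) ratio could be computed. The assumptions are mine: each generated structure is reduced to a hashable `key` (in practice this would come from a proper structure matcher), stability means `e_hull` at or below a threshold I picked arbitrarily, and the numbers are toy data, not results from the paper.

```python
def sun_ratio(generated, reference_keys, e_hull_threshold=0.0):
    """Fraction of generations that are stable, unique, and novel.

    generated: list of (key, e_hull) pairs, one per generated structure.
    reference_keys: set of keys for structures already in the reference set.
    """
    if not generated:
        return 0.0
    seen = set()
    count = 0
    for key, e_hull in generated:
        stable = e_hull <= e_hull_threshold
        unique = key not in seen          # only the first copy counts
        novel = key not in reference_keys
        seen.add(key)
        if stable and unique and novel:
            count += 1
    return count / len(generated)


# Toy data: keys are placeholders, e_hull in eV/atom.
toy_generated = [
    ("A", 0.0),    # stable, unique, novel
    ("A", 0.0),    # duplicate of the previous one
    ("B", 0.05),   # above the hull
    ("C", -0.01),  # stable, unique, novel
    ("D", 0.0),    # already in the reference set
]
toy_reference = {"D"}
ratio = sun_ratio(toy_generated, toy_reference)
print(ratio)
```

The novelty check depends on which reference set you compare against, which is why the ratios quoted above are hard to compare directly across papers.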

A variety of structures are observed, some of which display novel patterns not found in the training set.

This was super exciting. The paper goes into depth on a few discovered materials that have completely new patterns not seen before. For example, there was a cubic phase of Zn6Ni7Ge2. There are only 35 other compounds with anonymous composition A2B6C7 in the convex hull of Alexandria, but none of these has cubic symmetry. This also opens up new possibilities for substitution, since there is now a new stable prototype for that ratio of atoms.
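In case the "anonymous composition" step isn't obvious, here is a small sketch of how Zn6Ni7Ge2 maps to A2B6C7: drop the element names, sort the counts in ascending order, and relabel with A, B, C. The function is my own minimal version and only handles flat formulas with integer counts (no parentheses). I believe pymatgen's `Composition.anonymized_formula` does the same job more robustly, but I haven't checked it against this case.

```python
import re
from string import ascii_uppercase


def anonymous_formula(formula):
    """Replace element symbols with A, B, C... ordered by ascending count."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(amount) if amount else 1)
    ordered = sorted(counts.values())
    return "".join(
        f"{letter}{count if count > 1 else ''}"
        for letter, count in zip(ascii_uppercase, ordered)
    )


print(anonymous_formula("Zn6Ni7Ge2"))
```

Any other compound with the same 2:6:7 ratio maps to the same label. That is what makes the prototype useful for substitution: swap in chemically similar elements and keep the structure.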

In summary, by sampling from a coarse-grained representation of Wyckoff coordinates with a deep attention-based neural network, we demonstrated that it is possible to fully generate stable, crystal structures in an end-to-end manner (i.e. including coordinates). We hope this work inspires further advancements in inverse materials design and discovery.

We're on it! Need to see if I can get access to the code and some weights now. Going to explore this model further so stay tuned.
