Ouro
1mo

Notes as I read about Matra-Genoa, a new crystal generation model

Check out the paper here. It's a short read and I recommend it. Although not very technical (just machine learning concepts that have been explored elsewhere), the creativity and simplicity of the approach are inspiring.

A generative material transformer using Wyckoff representation

This paper presents Matra-Genoa, an autoregressive transformer model built on invertible tokenized representations of symmetrized crystals, including free coordinates. This approach enables sampling from a hybrid action space. The model is trained across the periodic table and space groups and can be conditioned on specific properties. The authors demonstrate its ability to generate stable, novel, and unique crystal structures by conditioning on the distance to the convex hull. Resulting structures are 8 times more likely to be stable than baselines using PyXtal with charge compensation, while maintaining high computational efficiency.

They mention some of the models we've explored here and call out the same shortcomings we've noticed. This is great: it suggests the approach can build on the existing work and actually make some progress. I think it does that!

So what did they do?

My take: they created a compact genome-like representation of a crystal and trained a model to generate "valid" genomes. It uses a Wyckoff representation we've seen before:

  • lattice params (a, b, c, α, β, γ)

  • space group

  • Wyckoff positions

They encode these parameters with very few tokens.
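
To make the "genome" idea concrete, here is a minimal sketch of an invertible token sequence. The token names (`SG_`, `EL_`, `WY_`, `LAT_`), their ordering, and the 0.1 bin width for lattice parameters are my own assumptions for illustration, not the paper's vocabulary. Free Wyckoff parameters are left out to keep it short, and the round trip is only exact up to the bin width.

```python
from dataclasses import dataclass

LATTICE_STEP = 0.1  # hypothetical bin width (angstrom for lengths, degrees for angles)


@dataclass(frozen=True)
class Crystal:
    space_group: int   # international number, 1..230
    lattice: tuple     # (a, b, c, alpha, beta, gamma)
    sites: tuple       # ((element, wyckoff_letter), ...)


def encode(crystal):
    """Flatten a crystal into a short token sequence."""
    tokens = [f"SG_{crystal.space_group}"]
    for element, letter in crystal.sites:
        tokens += [f"EL_{element}", f"WY_{letter}"]
    # Lattice parameters are binned so they can live in a finite vocabulary.
    tokens += [f"LAT_{round(v / LATTICE_STEP)}" for v in crystal.lattice]
    return tokens


def decode(tokens):
    """Invert encode(): rebuild the crystal from its token sequence."""
    value = lambda tok: tok.split("_", 1)[1]
    space_group = int(value(tokens[0]))
    body, lattice_tokens = tokens[1:-6], tokens[-6:]
    sites = tuple(
        (value(body[i]), value(body[i + 1])) for i in range(0, len(body), 2)
    )
    lattice = tuple(
        round(int(value(t)) * LATTICE_STEP, 1) for t in lattice_tokens
    )
    return Crystal(space_group, lattice, sites)


# Rock-salt-like example: space group 225, Na on 4a, Cl on 4b (lattice rounded).
nacl = Crystal(225, (5.6, 5.6, 5.6, 90.0, 90.0, 90.0), (("Na", "a"), ("Cl", "b")))
tokens = encode(nacl)
```

In this toy encoding the whole structure is 11 tokens (1 space group, 2 per site, 6 lattice), which is the kind of compactness that makes an autoregressive model practical here.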

Figure 1 from "A generative material transformer using Wyckoff representation"

Schematic overview of the invertible sequenced representation. (a) The structure is first decomposed into composition, stability, structure and lattice. (b) The structure is then further decomposed into a set of Wyckoff positions, uniquely identified by a set of Wyckoff identifiers. Optional free parameters are also included to make the representation coordinate-aware. (c) All previous information is gathered into a tokenized and invertible sequence. The color of the tokens represents the type or the Wyckoff position for ease of visualization.

Taking inspiration from crystallography, we introduce an approach by considering Wyckoff positions, including free parameters. Without loss of information, any crystal structure can be described through (i) the spacegroup, (ii) a set of Wyckoff positions and corresponding chemical elements, (iii) free parameters, if required, of the Wyckoff positions, and finally (iv) the dimensions of the unit cell (a, b, c, α, β, γ). A Wyckoff position reduces a set of equivalent points (orbit) into a single point, by mapping the equivalent sites under the symmetry transformations of the given space group.
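
The "orbit reduced to a single point" idea is easy to see in code. This is a small sketch of my own (not from the paper) that expands one representative point into its equivalent sites using the four symmetry operations of space group P2/m (No. 10, unique axis b). The example coordinates are arbitrary.

```python
from fractions import Fraction as F

# Symmetry operations of P2/m (unique axis b), acting on fractional coordinates.
OPS = [
    lambda x, y, z: (x, y, z),      # identity
    lambda x, y, z: (-x, y, -z),    # two-fold rotation about b
    lambda x, y, z: (-x, -y, -z),   # inversion
    lambda x, y, z: (x, -y, z),     # mirror plane perpendicular to b
]


def orbit(point):
    """Expand one representative point into its symmetry-equivalent sites."""
    # Wrap every image back into the unit cell and drop duplicates.
    sites = {tuple(c % 1 for c in op(*point)) for op in OPS}
    return sorted(sites)


general = orbit((F(1, 10), F(1, 5), F(3, 10)))   # generic (x, y, z)
on_mirror = orbit((F(1, 10), F(0), F(3, 10)))    # point on the mirror plane, y = 0
origin = orbit((F(0), F(0), F(0)))               # fully fixed point
```

A generic point yields four sites, a point on the mirror plane yields two, and the origin maps onto itself. So the model only has to emit the space group, the Wyckoff position, and whatever coordinates remain free (x and z for the mirror-plane point, nothing for the origin); symmetry fills in the rest.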

Table 1 from "A generative material transformer using Wyckoff representation"

Filtering results on the 3 million generated structures (N_batch = 150,000). Columns include sampling temperature T, successfully optimized structures N_opt, novel inserted structures N_novel, and stable structures meeting thresholds of 0.001, 0.050, and 0.100 eV/atom.

You can see how, as temperature increases, the number of novel generations increases as well. It's pretty cool to see that the number of low e_hull materials doesn't change that much though!
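
More novelty at higher T is what you'd expect if T is the usual softmax temperature used when sampling from an autoregressive model. That is my assumption about the setup, and the logits below are made up. A quick sketch of how temperature reshapes the next-token distribution:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)  # sharper: favours the top token
hot = softmax_with_temperature(logits, 1.5)   # flatter: rarer tokens get sampled more
```

Low T concentrates probability on the most likely token, so samples stay close to what the model saw in training. High T spreads probability out, which gives more novel sequences at the risk of more invalid or unstable ones.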

Overall, Matra-Genoa-MPAS has a S.U.N ratio of around 16% at T = 0.7. For comparison, MatterGen [17] reports a S.U.N ratio of ∼45%, CDVAE [16] of ∼18%, and PG-SchNet, G-SchNet [21] and FTCP [14] are below 5%, although these results are based on a much less stringent reference set (Alex-MP-ICSD) than ours.

Pretty good. We should be using the new metric that accounts for P1 symmetry and ignores those generations. Crazy how MatterGen is so high, but I think that's largely due to its P1 frequency, which is a well-known limitation.
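
For reference, here is a rough sketch of how a S.U.N. (stable, unique, novel) ratio can be computed, with an optional switch that stops P1 structures from counting as hits. Everything here is an assumption on my part: the `fingerprint` field stands in for whatever structure matcher is actually used, the 0.1 eV/atom default threshold is just one of the thresholds from the table, and keeping P1 structures in the denominator is only one possible reading of a P1-aware metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generated:
    fingerprint: str   # hypothetical identifier used to match structures
    e_hull: float      # energy above the convex hull, eV/atom
    space_group: int   # international number; 1 means P1


def sun_ratio(generated, reference, threshold=0.1, exclude_p1=False):
    """Fraction of generated structures that are stable, unique, and novel."""
    seen = set()
    hits = 0
    for g in generated:
        if exclude_p1 and g.space_group == 1:
            continue  # assumption: P1 structures never count as hits
        stable = g.e_hull <= threshold
        unique = g.fingerprint not in seen       # first occurrence only
        novel = g.fingerprint not in reference   # absent from the reference set
        seen.add(g.fingerprint)
        if stable and unique and novel:
            hits += 1
    # Assumption: the denominator is always the full set of generations.
    return hits / len(generated) if generated else 0.0


# Made-up data, purely for illustration.
samples = [
    Generated("A", 0.02, 225),
    Generated("A", 0.02, 225),  # duplicate of the first
    Generated("B", 0.30, 62),   # unstable
    Generated("C", 0.00, 1),    # stable but P1
    Generated("D", 0.05, 194),  # already in the reference set
]
reference = {"D"}
plain = sun_ratio(samples, reference)
p1_aware = sun_ratio(samples, reference, exclude_p1=True)
```

On this toy data the plain ratio is 2/5 and the P1-aware one is 1/5, which shows how much a high P1 frequency can inflate the headline number.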

A variety of structures are observed, some of which display novel patterns not found in the training set.

This was super exciting. The paper goes into depth on a few discovered materials that have completely new patterns not seen before. For example, there was a cubic phase of Zn6Ni7Ge2. There are only 35 other compounds with anonymous composition A2B6C7 in the convex hull of Alexandria, but none of these has cubic symmetry. This opens up new possibilities for substitution too, since we've found a new stable prototype for that ratio of atoms.
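
If "anonymous composition" is unfamiliar: you drop the element names and keep only the sorted amounts. A minimal sketch, assuming simple formulas with no parentheses and no repeated elements (the `anonymize` helper is my own, not from the paper):

```python
import re
from string import ascii_uppercase


def anonymize(formula):
    """Replace element symbols with A, B, C, ... in order of increasing amount."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    counts = sorted(int(n) if n else 1 for _, n in pairs)
    return "".join(
        f"{letter}{n if n != 1 else ''}"
        for letter, n in zip(ascii_uppercase, counts)
    )


prototype = anonymize("Zn6Ni7Ge2")  # "A2B6C7"
```

Any compound that maps to the same string is a candidate for element substitution into the new cubic prototype.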

In summary, by sampling from a coarse-grained representation of Wyckoff coordinates with a deep attention-based neural network, we demonstrated that it is possible to fully generate stable, crystal structures in an end-to-end manner (i.e. including coordinates). We hope this work inspires further advancements in inverse materials design and discovery.

We're on it! Need to see if I can get access to the code and some weights now. Going to explore this model further so stay tuned.
