Open research towards the discovery of room-temperature superconductors.
While reading the MatterSim paper, I came across the authors' idea of using the MLFF's latent space as a direct property-prediction feature set. Earlier, I had been thinking about using a VAE (or some graph variant) to embed a material into a continuous high-dimensional space which we could use as a feature vector to predict properties directly.
While on the surface these sound like very similar things, the VAE's latent space is built around the ability to reconstruct the original material. That may be useful for comparing one material to another, but when it comes to prediction tasks this approach is likely to fall short.
Using an MLFF's latent space is different because that latent space has already been shaped to predict downstream quantities: energies, forces, and stresses (the usual outputs of DFT).
There are infinite possible latent representations a material could have; and some of these latent representations are going to be better than others.
So let's get into it.
The approach to this experiment is simple:
Use the latent features generated by the Orb model as a feature vector to train another model on a Tc prediction task (regression).
Using crystal structures and Tc values from the 3DSC dataset, we have all the input and target data we need. There are caveats to the dataset, as explained in the 3DSC paper, but it's the best open dataset I've come across so far:
The Orb model takes in a crystal structure and outputs energies, forces, and stresses at the atom and cell level, matching comparable DFT outputs. The way the authors designed the model, each of these prediction heads is independent of the base model, so the base model gives us a representation that can be used for other tasks. See the paper for more details on model architecture:
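As a rough sketch of what the feature-extraction step looks like (the exact calls depend on the orb-models version you have installed; `extract_base_embedding` below is a hypothetical helper standing in for running a structure through Orb's base GNN and pooling the per-atom embeddings):

```python
import numpy as np
from pymatgen.core import Structure
from pymatgen.io.ase import AseAtomsAdaptor


def extract_base_embedding(atoms, orb_base_model) -> np.ndarray:
    """Hypothetical helper: run an ASE Atoms object through Orb's base GNN
    and mean-pool the per-atom embeddings into one fixed-length vector.
    The real call depends on the orb-models API."""
    node_features = orb_base_model(atoms)   # assumed shape: (n_atoms, 256)
    return node_features.mean(axis=0)       # crystal-level 256-dim feature vector


# Load a crystal structure (e.g. a CIF from 3DSC) and embed it.
structure = Structure.from_file("some_material.cif")  # placeholder path
atoms = AseAtomsAdaptor.get_atoms(structure)
# latent = extract_base_embedding(atoms, orb_base_model)  # -> shape (256,)
```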
Before we get into the modeling, I want to explore the latent space we're working with.
With UMAP, I've been testing out its supervised mode, where you also pass in the target variable and it attempts to make closeness in the projected dimensions track closeness in the target variable. As you can see in the plot on the left, this was fairly successful, but it could also be some form of overfitting.
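For reference, supervised UMAP just means passing the target into `fit_transform`; something like this (assuming `X` is the (n_samples, 256) latent matrix and `tc` is the array of critical temperatures):

```python
import umap

# Supervised UMAP: passing y nudges the projection so that points with
# similar Tc end up close together in the 2D embedding.
# target_metric="l2" treats Tc as a continuous target rather than a class label.
reducer = umap.UMAP(n_components=2, target_metric="l2", random_state=42)
embedding = reducer.fit_transform(X, y=tc)   # embedding: (n_samples, 2)
```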
The next step to explore would be inverse transformations. We know where the high-Tc superconducting region sits in the 2D plot, so we can sample points there and inverse transform them back into the original 256-dimensional vectors, which we could then use to predict Tc with our downstream model, or somehow reverse through the Orb model to get a crystal structure (is this possible, or do we need to train a decoder off of Orb to map back into crystal structure space?).
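A sketch of what that sampling step could look like, reusing the `reducer` from the snippet above and assuming `tc_model` is the downstream Tc regressor (the coordinates are placeholders you'd read off your own plot):

```python
import numpy as np

# Sample a few 2D points from the region where high-Tc materials cluster.
high_tc_region = np.array([[8.0, 3.5], [8.2, 3.7], [7.9, 3.3]])

# Map the 2D points back into the 256-dim Orb latent space...
latent_candidates = reducer.inverse_transform(high_tc_region)

# ...and score them with the downstream Tc model.
predicted_tc = tc_model.predict(latent_candidates)
```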
I started this experiment by trying the MLP setup built into the Orb model, essentially the same approach as the existing prediction heads, but I found that it did not train well for this task. With such a simple input feature set (256 continuous variables), we can really use any model. So why not throw XGBoost at it!
XGBoost was actually giving me some issues (I suspect an Apple silicon problem), so I switched to CatBoost instead.
It trains in seconds now instead of hours. And the performance is awesome. I'll have a more robust evaluation coming soon.
The basic CatBoost config is as follows:
{
"iterations": 1000,
"learning_rate": 0.1,
"eval_metric": "RMSE",
"early_stopping_rounds": 50,
}
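Wiring that up is only a few lines (a sketch, assuming `X_train`/`y_train` and `X_val`/`y_val` hold the latent vectors and Tc targets):

```python
from catboost import CatBoostRegressor

model = CatBoostRegressor(
    iterations=1000,
    learning_rate=0.1,
    eval_metric="RMSE",
    early_stopping_rounds=50,
)

# The eval set drives early stopping; there are no categorical features,
# just the 256 continuous latent dimensions.
model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=100)
```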
Train and validation sizes are as follows:
Train size: 4618 samples
Val size: 1155 samples
It's a pretty small dataset. SuperCon is already pretty small, and the 3DSC methodology cuts this down even further in an attempt to match chemical formula to crystal structure.
Training is straightforward, and so is prediction. You pass the crystal you want to predict Tc for through the first part of Orb (to get the latent vector), then pass that vector into the CatBoost model to get your final prediction.
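That two-step inference path looks something like this (a sketch reusing the hypothetical `extract_base_embedding` helper from earlier, with `tc_model` being the trained CatBoost regressor):

```python
from pymatgen.core import Structure
from pymatgen.io.ase import AseAtomsAdaptor


def predict_tc(cif_path, orb_base_model, tc_model) -> float:
    """Crystal structure -> Orb latent vector -> CatBoost Tc prediction."""
    structure = Structure.from_file(cif_path)
    atoms = AseAtomsAdaptor.get_atoms(structure)
    latent = extract_base_embedding(atoms, orb_base_model)   # hypothetical helper from earlier
    return float(tc_model.predict(latent.reshape(1, -1))[0])
```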
There's some predictive power, no doubt about it! Judging by the metrics on the eval set, the model is definitely picking up signal in the feature set that leads to Tc prediction.
A few things to note:
There are a number of materials with a true Tc of 0 K that receive a higher predicted Tc. This could be bad modeling, but it could also be a dataset issue. The authors of 3DSC set "non-superconducting" materials to 0 K arbitrarily, so lab errors or missing reported measurements could mean some of these materials actually do have a non-zero Tc, as our model predicts.
Obviously this is a very imbalanced and non-normal target distribution. There are only a few materials with Tc over 80 K, but it's good to see the model still does well on these samples. Remember that while this validation set was used for early stopping, the model did not explicitly train on any of these materials.
One of the most important features of a successful Tc prediction model for our use case is going to be out-of-distribution target prediction. There are no room-temp, ambient pressure superconductors that we know of, so there are no samples we can use to train our model on in that area of the target distribution.
In this experiment, I artificially held out the highest-Tc samples in the dataset, trained the model, then evaluated on those samples to see if we could get a prediction higher than anything seen in the training data (the split is sketched after the numbers below).
Split dataset at 80 K (the max temperature seen by the model is 80 K)
Train size: 5619 samples
Val size: 154 samples
Split dataset at 100 K (the max temperature seen by the model is 100 K)
Train size: 5750 samples
Val size: 23 samples
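The split itself is just a threshold on the target (a sketch, assuming `X` is the latent feature matrix and `tc` the array of critical temperatures):

```python
import numpy as np


def ood_split(X, y, threshold_k):
    """Train on everything at or below the threshold; hold out the rest
    as an out-of-distribution evaluation set."""
    train_mask = y <= threshold_k
    return X[train_mask], y[train_mask], X[~train_mask], y[~train_mask]


# e.g. the 80 K experiment: the model never sees a Tc above 80 K in training.
X_train, y_train, X_ood, y_ood = ood_split(X, tc, threshold_k=80.0)
```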
As we've seen with these two examples, the model struggles to predict a Tc greater than anything it's seen in the training data. This is problematic! But it's potentially something we can still address through how we train the model, the parameters we set, and the choice of model.
Getting tree-based models to extrapolate beyond their training range is notoriously difficult since they essentially work by partitioning the input space and making local predictions.
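You can see this with a toy example: fit a tree ensemble on y = x over a limited range and the predictions flatten out at the edge of the training data (a quick illustration, using CatBoost for consistency):

```python
import numpy as np
from catboost import CatBoostRegressor

# Train on a simple linear relationship, but only over x in [0, 80].
x_train = np.linspace(0, 80, 500).reshape(-1, 1)
y_train = x_train.ravel()

model = CatBoostRegressor(iterations=200, learning_rate=0.1, verbose=False)
model.fit(x_train, y_train)

# Ask for predictions well outside the training range: a tree-based model
# can only return values stored in its leaves, so the predictions plateau
# around 80 instead of continuing upward.
print(model.predict(np.array([[90.0], [120.0], [150.0]])))
```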
After more experimentation, the best I've been able to get so far is 5-10 K outside of the training distribution with an MLP. Even then, it was mostly an anomaly, and the rest of the tested OOD materials did not predict well.