Open research towards the discovery of room-temperature superconductors.
In this next experiment, I'm going to try to build more of an intuition and physical understanding of some of our most important latent features, as they relate to predicting the superconducting state.
By looking at the feature importances of our superconducting-state classifier, we can work out which features we should be looking at. From there, we can start to study how these features change across different temperatures and materials.
Let's start looking into our CatBoost model.
PredictionValuesChange gives the individual importance value for each input feature (it's the default feature-importance calculation method for non-ranking metrics). For each feature, it shows how much, on average, the prediction changes if that feature's value changes; the bigger the importance, the bigger the average change to the prediction.
CatBoost model's 0-indexed features [0-255] and their relative importances in predicting the yes/no superconducting state.
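For reference, here's a minimal sketch of pulling these importances out of the trained model (assuming a fitted `CatBoostClassifier` named `model`):

```python
import numpy as np

# Assumes `model` is our fitted CatBoostClassifier over the 256
# latent features, indexed 0-255.
importances = model.get_feature_importance(type="PredictionValuesChange")

# Rank the 0-indexed latent features by importance.
for idx in np.argsort(importances)[::-1][:10]:
    print(f"feature {idx:3d}: {importances[idx]:.3f}")
```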
Now we have a place to start looking. This kind of feature importance doesn't tell us whether a feature pushes the prediction up or down, though. We'll do SHAP values next because we want some understanding of which features increase or maintain the superconducting prediction compared to the ones that may reduce it.
Rather than use a typical feature importance bar chart, we use a density scatter plot of SHAP values for each feature to identify how much impact each feature has on the model output for individual samples in the dataset. Features are sorted by the sum of the SHAP value magnitudes across all samples.
Generally, it's good that we see roughly the same features here in the violin plot as we do in the feature importances generated directly by the CatBoost model.
Distribution of SHAP value impacts across the different features.
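A minimal sketch of producing this kind of summary plot with the shap library (assuming the same fitted `model` and a feature matrix `X` of latent vectors):

```python
import shap

# Assumes `model` is the fitted CatBoostClassifier and `X` holds the
# 256 latent features for the samples we want to explain.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # (n_samples, n_features)

# Density scatter ("beeswarm") summary: features sorted by the sum of
# |SHAP| across samples, colored by the underlying feature value.
shap.summary_plot(shap_values, X, max_display=20)
```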
SHAP values are additive contributions to the model's prediction (in log-odds for a classifier, which then maps to the predicted probability). So for each row (feature):
Positive SHAP values (points on the right): The model is using that feature to increase the predicted probability a material is superconducting.
Negative SHAP values (points on the left): The model is using that feature to decrease the probability of superconductivity.
Looking at the color in each row tells you when that feature is high vs. low (red usually = higher underlying feature value, blue = lower), which reveals whether large or small latent values push the model to say “yes, superconducting.”
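Since the values are additive, we can sanity-check that the base value plus a sample's SHAP contributions reconstructs the model's raw output (continuing from the snippet above; the additivity holds in log-odds units):

```python
import numpy as np

# TreeExplainer's SHAP values are additive in the model's raw
# (log-odds) output, which the sigmoid maps to probability.
raw = model.predict(X, prediction_type="RawFormulaVal")
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
assert np.allclose(raw, reconstructed, atol=1e-4)
```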
My guess is that 220 encodes something temperature related. It's roughly uniform across all ranges of SHAP and feature value, with a fatter tail to the negative SHAP side, matching the 2:1 ratio of the 0 class to the 1 class we have, which is determined by temperature. When we start to look at individual MD runs and track feature movements, we'll see if we can confirm this. I would like this to be wrong, because then it would mean it's a powerful feature we could potentially use to engineer a material with our desired behavior.
Strong signal to maximize feature values of 220, 163, 91 to increase SC probability.
Strong signal to minimize feature values of 63 and 155 to increase SC probability.
Weaker signal to maximize features 212, 215, 54, 96, 118 to increase SC probability.
In an attempt to learn more about the features and how their impact evolves over time, we can visualize how the SHAP values change (driven by changes in the feature vector) as we run through the temperature-ramping molecular dynamics.
Watching how SHAP values change as Mo6Pb0.1Se8Yb0.9-MP-mp-1103921-synth_doped is heated from 0 K to 26 K. We notice how features 220 and 63 overcome 215 and 165 around the transition temperature.
It's more useful to download the above GIF and go frame by frame to study the effects. Notice which features switch between pushing up and pulling down on superconducting probability, and when.
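A rough sketch of how these per-frame SHAP trajectories could be pulled out, reusing the explainer from above (the `frames` latent vectors and `temps` arrays are assumed to come from sampling the MD run; the watched feature list is just the ones called out here):

```python
import matplotlib.pyplot as plt

# Assumes `frames` is an (n_frames, 256) array of latent vectors sampled
# along the heating run and `temps` holds the matching temperatures (K).
frame_shap = explainer.shap_values(frames)  # (n_frames, 256)

# Track the features we keep seeing at the top of the rankings.
watch = [220, 63, 215, 165, 91, 155]
for f in watch:
    plt.plot(temps, frame_shap[:, f], label=f"feature {f}")
plt.axhline(0.0, color="gray", lw=0.5)
plt.xlabel("Temperature (K)")
plt.ylabel("SHAP value (log-odds contribution)")
plt.legend()
plt.show()
```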
Now check out a high-Tc material and the battle that goes on in the feature space. For most of the temperature range, there's enough pressure pushing upwards and keeping P(superconducting) high, until we pass Tc and see the features that end up breaking the system and ending superconductivity.
An animated look at Ba2Ca2Cu3Hg0.85Re0.15O8-MP-mp-22601-synth_doped with a Tc of 132 K showing how SHAP and feature values change on its way up to Tc.
In this cuprate, features 63, 220, 67, 215, 155, 165, 54, and 93 push up on the prediction probability until after Tc, at which point 93 and 220 flip to pushing down while 91, 156, and 26 grow in strength and aid the push down.
There's no use understanding how our latent vectors impact predictions if we can't eventually tie them back to some physical attribute we can try to optimize. There are plenty of theories floating around that it would be cool to confirm, like the distance between copper-oxide planes affecting the critical temperature.
Upon first look at how the latent features move over temperature, we're in for a challenge. Other than some general trends over the length of the heating, each feature moves pretty smoothly. Unsurprising: if our naked eye could pick up on the phase transition via these features, superconductivity would likely be much better understood. Instead, we have a push and pull of competing forces that activate or deactivate the superconducting state.
Perhaps now this is where Magpie features will come in handy. We want to be able to learn correlations between physical features and our latent features. If we compute Magpie features for the length of the heating, we may be able to run correlations and find which latent feature corresponds to which physical feature. Actually, instead of running for the length of heating, we should just compare ground states across all of our samples, superconducting and not, so that we get a wider range of latent and Magpie features to correlate on.
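A minimal sketch of that correlation pass, using matminer's Magpie preset (the `formulas` mapping and `latent_df` frame of ground-state latent features are assumptions, not our actual pipeline objects):

```python
import pandas as pd
from matminer.featurizers.composition import ElementProperty
from pymatgen.core import Composition

# Assumes `formulas` maps sample id -> composition string, and
# `latent_df` holds ground-state latent features (one row per sample id).
ep = ElementProperty.from_preset("magpie")
magpie_df = pd.DataFrame(
    {sid: ep.featurize(Composition(f)) for sid, f in formulas.items()},
    index=ep.feature_labels(),
).T

# Pearson correlations between every latent feature and every Magpie
# feature, then the strongest physical partner per latent feature.
joint = pd.concat([latent_df, magpie_df], axis=1)
corr = joint.corr().loc[latent_df.columns, magpie_df.columns]
print(corr.abs().max(axis=1).sort_values(ascending=False).head(10))
```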
So far, I haven't had much luck beyond a few features with decent correlations, but nothing for our most important features. This isn't that surprising. These latent features won't really have any direct physical meaning, and if they did, they would likely be a combination of multiple physical features (e.g., density is a function of mass and volume).
Only one of our latent features strongly correlated with a physical property; other than that, it was not a very successful experiment.
The next thing to try will be to directly compare two similar materials, where we know exactly what's different between them, and see how their latent vectors differ. This could also take us back to our original exploration of the latent space and some dim-reduction work we did.
There were very clear clusters of materials, and Tc did indeed correlate with the location of that cluster.
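For reference, a minimal sketch of that kind of reduction, using UMAP here as the reducer (the original work may have used a different method; `X_ground` and `tc` are assumed to be the ground-state latent vectors and matching critical temperatures):

```python
import matplotlib.pyplot as plt
import umap  # umap-learn

# Project the 256-dim latent vectors down to 2D and color by Tc to
# see whether clusters line up with transition temperature.
emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X_ground)

sc = plt.scatter(emb[:, 0], emb[:, 1], c=tc, s=8, cmap="viridis")
plt.colorbar(sc, label="Tc (K)")
plt.xlabel("UMAP 1")
plt.ylabel("UMAP 2")
plt.show()
```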
Looking at how some of these features map to Tc, we find that generally every feature requires a unique sweet spot. Neither maximizing nor minimizing any of these features leads to higher critical temperature.
Visualizing a selection of the most important latent features and how they relate to transition temperature.
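A rough sketch of how a panel like this could be put together (the feature selection here and the `latent_df`/`tc` inputs are assumptions):

```python
import matplotlib.pyplot as plt

# Assumes `latent_df` has one row per superconducting material's
# latent vector and `tc` holds the matching critical temperatures.
top_features = [220, 63, 163, 91, 155, 215]  # hypothetical selection
fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharey=True)
for ax, f in zip(axes.ravel(), top_features):
    ax.scatter(latent_df.iloc[:, f], tc, s=6, alpha=0.5)
    ax.set_xlabel(f"feature {f}")
axes[0, 0].set_ylabel("Tc (K)")
fig.tight_layout()
plt.show()
```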