Open research towards the discovery of room-temperature superconductors.
Over the last week or so, I've been working on making some upgrades to the superconducting state classifier model.
See the first attempt here:
Learning from the first attempt, one of the most important updates was to the dataset we train on.
In the first round, we didn't use enough time steps during the temperature ramping, we were missing periodic boundary conditions, our supercell was likely too small, and we used a suboptimal MLFF model without D3 corrections.
In this updated model (and dataset), we've changed the following:
Increased the temperature range: the ramp still starts at 0 K but now extends to a higher maximum temperature.
Dynamic number of steps to run the simulation for
Instead of a fixed 2000 total steps regardless of temperature range, we now run 30 steps per 1 K increase, so the total number of steps scales with the temperature range (see the sketch after this list).
This will help fill in more data points for higher-Tc materials
Using a 3x3x3 supercell instead of 2x2x2
Enabled periodic boundary conditions for more accurate crystal simulations
Using the Orb model with D3 corrections
Simply a more accurate model with van der Waals force corrections (D3)
Increased frame sampling to every 30 steps (i.e., every 1 K), up from every 100 steps, which previously corresponded to a variable temperature increase per frame
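As a rough sketch of what this ramp setup looks like in practice, here's a minimal ASE example. The structure, calculator (EMT as a stand-in), timestep, friction, and temperature bounds are all placeholders — the actual pipeline uses the Orb MLFF with D3 corrections on the materials in our dataset.

```python
# Minimal sketch of the ramp: 3x3x3 supercell, PBC, 30 MD steps per 1 K,
# one sampled frame per K. EMT and bulk Cu are stand-ins for the real
# Orb (D3-corrected) MLFF and the actual crystal structures.
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.md.langevin import Langevin
from ase.io import write
from ase import units

atoms = bulk("Cu", cubic=True).repeat((3, 3, 3))  # 3x3x3 supercell (was 2x2x2)
atoms.pbc = True                                  # periodic boundary conditions
atoms.calc = EMT()                                # stand-in calculator

T_START, T_END = 1, 300   # K; the real ramp bounds differ
STEPS_PER_K = 30          # fixed number of steps per 1 K of temperature increase

dyn = Langevin(atoms, timestep=1.0 * units.fs,
               temperature_K=T_START, friction=0.02)

frames = []
for T in range(T_START, T_END + 1):
    dyn.set_temperature(temperature_K=T)  # ramp in 1 K increments
    dyn.run(STEPS_PER_K)                  # total steps now scale with the range
    frames.append(atoms.copy())           # sample one frame per K (every 30 steps)

write("ramp_frames.extxyz", frames)
```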
Because of the increased number of steps and supercell size, the dataset generation took ~5 days of T4 GPU time at 100% utilization. Pretty brutal, but it should be worth it.
These changes also gave us ~160,000 (up from ~60,000) datapoints to train on, with more datapoints around higher transition temperatures.
In the previous dataset, we had roughly a 2:1 ratio of the non-superconducting (0) to superconducting (1) class. Once again, the balance is around 2:1.
Training data shape: (126976, 256)
Class distribution: [83063 43913]
Validation data shape: (32640, 256)
Validation class distribution: [20759 11881]
Training convergence:
Train Loss: 0.2282, Train Acc: 89.49%, Train AUC: 0.9639
Val Loss: 0.3480, Val Acc: 87.08%, Val AUC: 0.9367
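For reference, here's a minimal sketch of how a classifier over these (N, 256) latent features could be trained. The architecture, optimizer, epoch count, and file names are assumptions for illustration, not necessarily what our actual model uses, but it shows the shape of the problem, the ~2:1 class weighting, and the accuracy/AUC metrics reported above.

```python
# Hypothetical training sketch for a binary superconducting-state classifier
# over the (N, 256) MLFF latent features. Architecture, optimizer, epochs,
# and file names are assumptions, not the actual model.
import numpy as np
import torch
from torch import nn
from sklearn.metrics import roc_auc_score

X_train = np.load("train_features.npy")  # expected shape (126976, 256); hypothetical path
y_train = np.load("train_labels.npy")    # 0 = not superconducting, 1 = superconducting

model = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),                    # single logit
)

# Weight the positive class to offset the roughly 2:1 class imbalance.
pos_weight = torch.tensor([(y_train == 0).sum() / max((y_train == 1).sum(), 1)],
                          dtype=torch.float32)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.tensor(X_train, dtype=torch.float32)
y = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

for epoch in range(20):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

    with torch.no_grad():
        probs = torch.sigmoid(logits).numpy().ravel()
    acc = ((probs > 0.5) == (y_train == 1)).mean()
    auc = roc_auc_score(y_train, probs)
    print(f"epoch {epoch}: loss {loss.item():.4f}, acc {acc:.2%}, auc {auc:.4f}")
```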
Eval scripts are still running, and because of the increased supercell size and number of steps, things take a little longer to evaluate, but it's not terrible. Roughly 1 to 4 minutes to evaluate a single material, but we can parallelize pretty well.
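Since each material is independent, a simple way to parallelize the evaluation is process-level parallelism over materials. This is only a sketch with hypothetical function and directory names, not our actual eval script:

```python
# Sketch of process-level parallelism over materials. evaluate_material and
# the structures directory are placeholders, not the real evaluation script.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def evaluate_material(path: Path) -> dict:
    """Placeholder: run the temperature ramp + classifier for one material."""
    # ... load the structure, run MD, classify each frame, estimate Tc ...
    return {"material": path.stem, "predicted_tc_K": None}

if __name__ == "__main__":
    structures = sorted(Path("eval_structures").glob("*.cif"))  # hypothetical directory
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(evaluate_material, structures))
    for result in results:
        print(result)
```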
Overall, looking like a minor but clear improvement over the last attempt. Predictions are closer to actuals, and the transitions from superconductivity to non-superconductivity are getting sharper.
Here are some snapshots from the evaluation set that looked pretty good. The model has not seen any of these materials.
Spot-on prediction for Ca1Cu2Nd0.16Pb0.5Sr1.84Tl0.5O7-MP-mp-1173221-synth_doped with an error of just 1.4 K at 103.4 K.
There are still some really strange temperature-related effects; see around 5 to 20 K. They aren't present in most samples, so we assume they're something material-specific. Perhaps it's related to the number of atoms, or to something more fundamental.
For some of these higher-Tc materials, we do still see this kind of S-curve around the transition temperature instead of an L-curve, though the temperature range of the fuzziness between classifications seems to be getting smaller.
While underestimating, the prediction for Ba2Ca2Cu3Tl2O10-MP-mp-1228620 shows a pretty decent phase change, with relatively high certainty on both sides of Tc.
We'll have more specific numbers when the evaluation is finished, but in general it seems we are underestimating Tc slightly more frequently, whereas before we had a tendency to overestimate especially when it came to high-Tc materials. This is a good change as it's better to be conservative with Tc prediction.
Interestingly, it seems this S-curve shape we often see matches up well with experimental measurements of resistivity, where there is a 10 to 20 K range in which resistance falls to 0:
Screenshot of measured resistance of a YBCO sample, from https://www.depts.ttu.edu/phas/cees/Instruction/PHYS_5300-19/Activities/YBCO_superconducting.pdf
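To make the S-curve idea concrete, here's one way a Tc estimate could be read off a per-temperature probability curve: fit a sigmoid and take its midpoint as the transition, with the fitted width capturing the fuzziness. This is just an illustration on synthetic data, not necessarily how our evaluation derives Tc.

```python
# Illustration only: fit a sigmoid to a per-temperature P(superconducting)
# curve and take the midpoint as the Tc estimate. The data here is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def transition_curve(T, tc, width):
    # Probability falls from ~1 to ~0 around T = tc over roughly `width` kelvin.
    return 1.0 / (1.0 + np.exp((T - tc) / width))

temps = np.arange(1.0, 151.0)  # 1 K sampling, matching one frame per K
probs = transition_curve(temps, 92.0, 4.0) + 0.02 * np.random.randn(temps.size)

(tc_fit, width_fit), _ = curve_fit(transition_curve, temps, probs, p0=[75.0, 5.0])
print(f"estimated Tc ≈ {tc_fit:.1f} K, transition width ≈ {width_fit:.1f} K")
```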
I'm currently testing adding some new features to the dataset. The 256 features we get from the MLFF latent space are very powerful, but perhaps there are additions we can make to improve performance. Ideally, we want features that are independent of, or normalized with respect to, system size so that the supercell and number of atoms don't change how we understand the material.
Trying things like the following (see the code sketch after this list):
volume_per_atom
density
mass_per_atom
potential_energy_per_atom
total_energy_per_atom
degrees_of_freedom_per_atom
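Here's a rough sketch of how these size-normalized descriptors could be computed per frame with ASE. The exact definitions (especially degrees_of_freedom_per_atom) and the stand-in EMT calculator are assumptions, not necessarily how the pipeline computes them.

```python
# Rough sketch of size-normalized per-frame descriptors matching the names
# above. Exact definitions (especially degrees_of_freedom_per_atom) are
# assumptions; units are ASE defaults (Å, amu, eV).
from ase import Atoms

def size_normalized_features(atoms: Atoms) -> dict:
    n = len(atoms)
    volume = atoms.get_volume()             # Å^3
    total_mass = atoms.get_masses().sum()   # amu
    epot = atoms.get_potential_energy()     # eV (requires a calculator)
    etot = atoms.get_total_energy()         # eV, potential + kinetic
    # Count fixed atoms from constraints, if any, to adjust the DOF estimate.
    n_fixed = sum(len(c.get_indices()) for c in atoms.constraints
                  if hasattr(c, "get_indices"))
    return {
        "volume_per_atom": volume / n,
        "density": total_mass / volume,     # amu / Å^3
        "mass_per_atom": total_mass / n,
        "potential_energy_per_atom": epot / n,
        "total_energy_per_atom": etot / n,
        "degrees_of_freedom_per_atom": 3 * (n - n_fixed) / n,
    }

# Quick check with a stand-in structure and calculator:
from ase.build import bulk
from ase.calculators.emt import EMT

example = bulk("Cu", cubic=True).repeat((3, 3, 3))
example.calc = EMT()
print(size_normalized_features(example))
```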
I'm aware of the Magpie feature set and may explore adding those if we find some signal in this first test.
UPDATE: I'd say they didn't have any real effect. They added a ton of time to the training process as calculating some of these properties is non-trivial.