Open research towards the discovery of room-temperature superconductors.
Over the last week or so, I've been working on making some upgrades to the superconducting state classifier model.
See the first attempt here:
Learning from the first attempt, one of the most important updates was to the dataset we train on.
In the first round, we didn't use enough time steps during the temperature ramping, we were missing periodic boundary conditions, our supercell was likely too small, and we used a suboptimal MLFF model without D3 corrections.
In this updated model (and dataset), we've changed the following:
Increased the temperature range: the ramp still starts at 0 K but now extends to a higher maximum temperature.
Dynamic number of steps to run the simulation for
Instead of a fixed 2000 total steps regardless of temperature range, we now run 30 steps per 1 K increase, so the total number of steps scales with the temperature range (see the sketch after this list).
This will help fill in more data points for higher-Tc materials
Using a 3x3x3 supercell instead of 2x2x2
Enabled periodic boundary conditions for more accurate crystal simulations
Using the Orb model with D3 corrections
Simply a more accurate model with van der Waals force corrections (D3)
Increased frame sampling to every 30 steps (i.e., every 1 K), up from every 100 steps, which previously corresponded to a variable temperature increase per frame
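As a rough sketch of what this ramp setup looks like in practice, here's a minimal ASE example. The structure, calculator (EMT as a stand-in), timestep, friction, and temperature bounds are all placeholders — the actual pipeline uses the Orb MLFF with D3 corrections on the materials in our dataset.

```python
# Minimal sketch of the ramp: 3x3x3 supercell, PBC, 30 MD steps per 1 K,
# one sampled frame per K. EMT and bulk Cu are stand-ins for the real
# Orb (D3-corrected) MLFF and the actual crystal structures.
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.md.langevin import Langevin
from ase.io import write
from ase import units

atoms = bulk("Cu", cubic=True).repeat((3, 3, 3))  # 3x3x3 supercell (was 2x2x2)
atoms.pbc = True                                  # periodic boundary conditions
atoms.calc = EMT()                                # stand-in calculator

T_START, T_END = 1, 300   # K; the real ramp bounds differ
STEPS_PER_K = 30          # fixed number of steps per 1 K of temperature increase

dyn = Langevin(atoms, timestep=1.0 * units.fs,
               temperature_K=T_START, friction=0.02)

frames = []
for T in range(T_START, T_END + 1):
    dyn.set_temperature(temperature_K=T)  # ramp in 1 K increments
    dyn.run(STEPS_PER_K)                  # total steps now scale with the range
    frames.append(atoms.copy())           # sample one frame per K (every 30 steps)

write("ramp_frames.extxyz", frames)
```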
Because of the increased number of steps and supercell size, the dataset generation took ~5 days of T4 GPU time at 100% utilization. Pretty brutal, but it should be worth it.
These changes also gave us ~160,000 (up from ~60,000) datapoints to train on, with more datapoints around higher transition temperatures.
In the previous dataset, we had roughly a 2:1 ratio of the non-superconducting (0) to superconducting (1) class. Once again, the balance is around 2:1.
Training data shape: (126976, 256)
Class distribution: [83063 43913]
Validation data shape: (32640, 256)
Validation class distribution: [20759 11881]
Training convergence:
Train Loss: 0.2282, Train Acc: 89.49%, Train AUC: 0.9639
Val Loss: 0.3480, Val Acc: 87.08%, Val AUC: 0.9367
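For reference, here's a minimal sketch of how a classifier over these (N, 256) latent features could be trained. The architecture, optimizer, epoch count, and file names are assumptions for illustration, not necessarily what our actual model uses, but it shows the shape of the problem, the ~2:1 class weighting, and the accuracy/AUC metrics reported above.

```python
# Hypothetical training sketch for a binary superconducting-state classifier
# over the (N, 256) MLFF latent features. Architecture, optimizer, epochs,
# and file names are assumptions, not the actual model.
import numpy as np
import torch
from torch import nn
from sklearn.metrics import roc_auc_score

X_train = np.load("train_features.npy")  # expected shape (126976, 256); hypothetical path
y_train = np.load("train_labels.npy")    # 0 = not superconducting, 1 = superconducting

model = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),                    # single logit
)

# Weight the positive class to offset the roughly 2:1 class imbalance.
pos_weight = torch.tensor([(y_train == 0).sum() / max((y_train == 1).sum(), 1)],
                          dtype=torch.float32)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.tensor(X_train, dtype=torch.float32)
y = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

for epoch in range(20):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

    with torch.no_grad():
        probs = torch.sigmoid(logits).numpy().ravel()
    acc = ((probs > 0.5) == (y_train == 1)).mean()
    auc = roc_auc_score(y_train, probs)
    print(f"epoch {epoch}: loss {loss.item():.4f}, acc {acc:.2%}, auc {auc:.4f}")
```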
Eval scripts are still running, and because of the increased supercell size and number of steps, things take a little longer to evaluate, but it's not terrible. Roughly 1 to 4 minutes to evaluate a single material, but we can parallelize pretty well.
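Since each material is independent, a simple way to parallelize the evaluation is process-level parallelism over materials. This is only a sketch with hypothetical function and directory names, not our actual eval script:

```python
# Sketch of process-level parallelism over materials. evaluate_material and
# the structures directory are placeholders, not the real evaluation script.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def evaluate_material(path: Path) -> dict:
    """Placeholder: run the temperature ramp + classifier for one material."""
    # ... load the structure, run MD, classify each frame, estimate Tc ...
    return {"material": path.stem, "predicted_tc_K": None}

if __name__ == "__main__":
    structures = sorted(Path("eval_structures").glob("*.cif"))  # hypothetical directory
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(evaluate_material, structures))
    for result in results:
        print(result)
```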
Overall, looking like a minor but clear improvement over the last attempt. Predictions are closer to actuals, and the transitions from superconductivity to non-superconductivity are getting sharper.
Here are some snapshots from the evaluation set that looked pretty good. The model has not seen any of these materials.
Spot-on prediction for Ca1Cu2Nd0.16Pb0.5Sr1.84Tl0.5O7-MP-mp-1173221-synth_doped with an error of just 1.4 K at 103.4 K.
There are still some really strange temperature-related effects; see around 5 to 20 K. They aren't present in most samples, so we assume they're something material-specific. Perhaps it's related to the number of atoms, or to something more fundamental.
For some of these higher-Tc materials, we do still see this kind of S-curve around the transition temperature instead of an L-curve, though the temperature range of the fuzziness between classifications seems to be getting smaller.
While underestimating, the prediction for Ba2Ca2Cu3Tl2O10-MP-mp-1228620 shows a pretty decent phase change, with relatively high certainty on both sides of Tc.
We'll have more specific numbers when the evaluation is finished, but in general it seems we are underestimating Tc slightly more frequently, whereas before we had a tendency to overestimate especially when it came to high-Tc materials. This is a good change as it's better to be conservative with Tc prediction.
Interestingly, it seems this S-curve shape we often see matches up well with experimental measurements of resistivity, where there is a 10 to 20 K range in which resistance falls to 0:
Screenshot of measured resistance of a YBCO sample, from https://www.depts.ttu.edu/phas/cees/Instruction/PHYS_5300-19/Activities/YBCO_superconducting.pdf
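To make the S-curve idea concrete, here's one way a Tc estimate could be read off a per-temperature probability curve: fit a sigmoid and take its midpoint as the transition, with the fitted width capturing the fuzziness. This is just an illustration on synthetic data, not necessarily how our evaluation derives Tc.

```python
# Illustration only: fit a sigmoid to a per-temperature P(superconducting)
# curve and take the midpoint as the Tc estimate. The data here is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def transition_curve(T, tc, width):
    # Probability falls from ~1 to ~0 around T = tc over roughly `width` kelvin.
    return 1.0 / (1.0 + np.exp((T - tc) / width))

temps = np.arange(1.0, 151.0)  # 1 K sampling, matching one frame per K
probs = transition_curve(temps, 92.0, 4.0) + 0.02 * np.random.randn(temps.size)

(tc_fit, width_fit), _ = curve_fit(transition_curve, temps, probs, p0=[75.0, 5.0])
print(f"estimated Tc ≈ {tc_fit:.1f} K, transition width ≈ {width_fit:.1f} K")
```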
I'm currently testing adding some new features to the dataset. The 256 features we get from the MLFF latent space are very powerful, but perhaps there are additions we can make to improve performance. Ideally, we want features that are independent of, or normalized with respect to, system size so that the supercell and number of atoms don't change how we understand the material.
Trying things like the following (see the code sketch after this list):
volume_per_atom
density
mass_per_atom
potential_energy_per_atom
total_energy_per_atom
degrees_of_freedom_per_atom
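Here's a rough sketch of how these size-normalized descriptors could be computed per frame with ASE. The exact definitions (especially degrees_of_freedom_per_atom) and the stand-in EMT calculator are assumptions, not necessarily how the pipeline computes them.

```python
# Rough sketch of size-normalized per-frame descriptors matching the names
# above. Exact definitions (especially degrees_of_freedom_per_atom) are
# assumptions; units are ASE defaults (Å, amu, eV).
from ase import Atoms

def size_normalized_features(atoms: Atoms) -> dict:
    n = len(atoms)
    volume = atoms.get_volume()             # Å^3
    total_mass = atoms.get_masses().sum()   # amu
    epot = atoms.get_potential_energy()     # eV (requires a calculator)
    etot = atoms.get_total_energy()         # eV, potential + kinetic
    # Count fixed atoms from constraints, if any, to adjust the DOF estimate.
    n_fixed = sum(len(c.get_indices()) for c in atoms.constraints
                  if hasattr(c, "get_indices"))
    return {
        "volume_per_atom": volume / n,
        "density": total_mass / volume,     # amu / Å^3
        "mass_per_atom": total_mass / n,
        "potential_energy_per_atom": epot / n,
        "total_energy_per_atom": etot / n,
        "degrees_of_freedom_per_atom": 3 * (n - n_fixed) / n,
    }

# Quick check with a stand-in structure and calculator:
from ase.build import bulk
from ase.calculators.emt import EMT

example = bulk("Cu", cubic=True).repeat((3, 3, 3))
example.calc = EMT()
print(size_normalized_features(example))
```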
I'm aware of the Magpie feature set and may explore adding those if we find some signal in this first test.
UPDATE: I'd say they didn't have any real effect. They added a ton of time to the training process as calculating some of these properties is non-trivial.