Using the published 3DSC superconductor dataset, we fine-tuned MatterGen to generate 'S.U.N.' (stable, unique, novel) crystal structures conditioned on critical temperature (Tc).
The 3DSC dataset was intentionally deduplicated, but since re-entrant superconductors are a known class of materials, it could be interesting to fine-tune on the full merge of 3DSC and the alex_mp_20 set.
The fine-tuning job ran on 4 A10 GPUs with the trainer's accumulate_grad_batches parameter reduced from 4 to 2. We tried kicking this off on M-series Macs, given the recent change to the repo that added Apple Metal hardware support, but training times were far too long. Single-GPU training experiments led us to try reducing floating-point precision to 16 bits, but this introduced too many numerical instability issues.
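To make the effect of the accumulation change concrete, here is a minimal sketch of how the effective batch size scales with gradient accumulation. It assumes data-parallel training where each GPU processes its own batch; the per-device batch size of 32 is a placeholder for illustration, not the value from our run, and `effective_batch_size` is a helper we made up for this example.

```python
def effective_batch_size(per_device_batch_size: int,
                         num_devices: int,
                         accumulate_grad_batches: int) -> int:
    """Number of samples contributing to each optimizer step.

    Assumes data-parallel training: every device sees its own batch,
    and gradients are summed over `accumulate_grad_batches` steps
    before the optimizer updates the weights.
    """
    return per_device_batch_size * num_devices * accumulate_grad_batches


# Placeholder value for illustration only; not taken from our config.
PER_DEVICE_BATCH_SIZE = 32

with_accum_4 = effective_batch_size(PER_DEVICE_BATCH_SIZE, num_devices=4,
                                    accumulate_grad_batches=4)
with_accum_2 = effective_batch_size(PER_DEVICE_BATCH_SIZE, num_devices=4,
                                    accumulate_grad_batches=2)

print(with_accum_4, with_accum_2)  # 512 256
```

Halving the accumulation steps halves the effective batch size at a fixed device count, which is worth keeping in mind when comparing against runs that use the default settings.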
Below you'll find the trainer config for our fine-tuning run:
We let the fine-tune run for the maximum of 200 epochs defined by the MatterGen-provided script, then generated 15 new candidate structures conditioned on a Tc of 298.15 K. To be clear, this is the Tc we asked the model to target; it does not mean these structures actually have that Tc, and they likely don't.
Here they are:
gen_0.cif
gen_1.cif
gen_9.cif
gen_11.cif
And here are some high level attributes for each:
The model's bias toward 'uniqueness' helps us out a lot here. The chemical systems are unique within their own batch, so we can quickly scan for promising chemical systems and then refine the search with chemical-system-conditioned generation. We're also seeing coordination numbers characteristic of common, relatively simple crystal structures: simple cubic (6), BCC (8), and FCC (12). This is reassuring: despite fine-tuning on very sparse data, the output still reflects structures we know to exist in the physical world.
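As a sanity check on the coordination numbers quoted above, here is a small sketch that counts nearest neighbours for ideal simple cubic, BCC, and FCC lattices. It uses only the standard library and idealized unit cells with a lattice constant of 1; it is not how we computed the attributes for the generated structures, and the function name is our own for this illustration.

```python
import itertools
import math


def coordination_number(basis, reps=2, tol=1e-6):
    """Count nearest neighbours of the atom at the origin.

    `basis` holds fractional coordinates inside a cubic conventional
    cell with lattice constant 1. Neighbouring cells within `reps`
    repetitions in each direction are searched.
    """
    distances = []
    for offset in itertools.product(range(-reps, reps + 1), repeat=3):
        for site in basis:
            position = [o + s for o, s in zip(offset, site)]
            d = math.sqrt(sum(c * c for c in position))
            if d > tol:  # skip the reference atom itself
                distances.append(d)
    nearest = min(distances)
    return sum(1 for d in distances if abs(d - nearest) < tol)


SIMPLE_CUBIC = [(0.0, 0.0, 0.0)]
BCC = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
FCC = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]

for name, basis in [("simple cubic", SIMPLE_CUBIC), ("BCC", BCC), ("FCC", FCC)]:
    print(name, coordination_number(basis))
# simple cubic 6
# BCC 8
# FCC 12
```

Real generated structures are distorted and multi-element, so a neighbour-finding routine with a proper distance cutoff would be needed there; the ideal lattices just show where the 6, 8, and 12 come from.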
As for the evaluation metrics for these generated structures, there's a lot to be optimistic about:
{
"avg_energy_above_hull_per_atom": {
"value": 0.04149956944731237,
"description": "Average energy above hull per atom (eV/atom) of structures in sampled data."
},
"avg_rmsd_from_relaxation": {
"value": 0.06419174912084613,
"description": "root mean square displacements of atoms (Angstrom) from initial to final DFT relaxation steps in sampled data."
},
"frac_novel_unique_stable_structures": {
"value": 1.0,
"description": "Fraction of novel unique stable structures in sampled data within 0.1 (eV/atom) above convex hull of MP2020correction."
},
"frac_stable_structures": {
"value": 1.0,
"description": "Fraction of stable structures in sampled data within 0.1 (eV/atom) above convex hull of MP2020correction."
},
"frac_successful_jobs": {
"value": 1.0,
"description": "Fraction of structures whose jobs ran successfully."
},
"avg_comp_validity": {
"value": 1.0,
"description": "Average composition validity (according to smact) of structures in sampled data."
},
"avg_structure_comp_validity": {
"value": 1.0,
"description": "Average number of structures in sampled data that are both valid structures and have a valid smact compositions."
},
"avg_structure_validity": {
"value": 1.0,
"description": "Average structural validity of structures in sampled data. Any atom-atom distances less than 0.5 Angstroms or a volume less than 0.1 Angstrom**3 are considered invalid ."
},
"frac_novel_structures": {
"value": 1.0,
"description": "Fraction of novel structures in sampled data."
},
"frac_novel_systems": {
"value": 0.0,
"description": "Fraction of distinct chemical systems in sampled data and not in MP2020correction."
},
"frac_novel_unique_structures": {
"value": 1.0,
"description": "Fraction of novel unique structures in sampled data."
},
"frac_unique_structures": {
"value": 1.0,
"description": "Fraction of unique structures in sampled data."
},
"frac_unique_systems": {
"value": 1.0,
"description": "Fraction of structures in sampled data that have a unique chemical system within this set."
},
"precision": {
"value": 0.0,
"description": "Precision of structures in sampled data compared with MP2020correction. This is the fraction of structures in sampled data that have a matching structure in MP2020correction."
},
"recall": {
"value": 0.0,
"description": "Recall of structures in sampled data compared with structures in MP2020correction. This is the fraction of structures in sampled data that have a matching structure in MP2020correction."
}
}
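For readers unfamiliar with these metrics, here is a minimal sketch of how an average energy above hull and a stable fraction relate to per-structure values, using the 0.1 eV/atom threshold from the descriptions above. The per-structure energies are made-up placeholders, not our results; the function is hypothetical and not MatterGen's evaluation code; and treating the threshold as inclusive is an assumption on our part.

```python
# Threshold quoted in the metric descriptions (eV/atom above the convex hull).
STABILITY_THRESHOLD_EV_PER_ATOM = 0.1


def summarize_stability(e_above_hull, threshold=STABILITY_THRESHOLD_EV_PER_ATOM):
    """Return the mean energy above hull and the fraction of stable structures.

    A structure counts as stable when its energy above hull is at or
    below `threshold` (inclusive comparison is an assumption here).
    """
    if not e_above_hull:
        raise ValueError("need at least one structure")
    n = len(e_above_hull)
    return {
        "avg_energy_above_hull_per_atom": sum(e_above_hull) / n,
        "frac_stable_structures": sum(e <= threshold for e in e_above_hull) / n,
    }


# Placeholder per-structure values (eV/atom), for illustration only.
example_energies = [0.00, 0.02, 0.05, 0.09]
summary = summarize_stability(example_energies)
print(summary["frac_stable_structures"])  # 1.0
```

A stable fraction of 1.0 only says that every structure fell under the threshold; the average energy above hull gives a better sense of how close the batch sits to the hull.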
What I find particularly exciting is that yttrium barium copper oxide (YBCO) is a known high-temperature superconductor, and in gen_4.cif MatterGen is suggesting that YBSO could be a candidate as well.
Handing it over to you for some classification.
Building on 's work with the fine-tuned MatterGen model, we evaluated 400 candidate materials for superconductivity using the Tc classification model. Prior work on fine-tuning: