Open research towards the discovery of room-temperature superconductors.
So far, a really interesting paper, published in 2018. I'm adding some informal notes and interesting findings here, and trying to find out how much of the later literature builds on this study.
The framework highlights 35 compounds with predicted Tc’s above 20 K for experimental validation. Of these, some exhibit interesting chemical and structural similarities to cuprate superconductors, demonstrating the ability of the ML models to identify meaningful patterns in the data. In addition, most materials from the list share a peculiar feature in their electronic band structure: one (or more) flat/nearly-flat bands just below the energy of the highest occupied electronic state. The associated large peak in the density of states (infinitely large in the limit of truly flat bands) can lead to strong electronic instability, and has been discussed recently as one possible way to high-temperature superconductivity.33,34
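To see why a flat band produces a large density-of-states peak, here is a toy sketch of my own (not from the paper). A 1D tight-binding band E(k) = -2t cos(k) has bandwidth 4t. As the hopping t shrinks, the same number of states is squeezed into a narrower energy window, so the peak DOS keeps growing until the whole band fits inside one histogram bin. The model, the energy units and the bin width are all assumptions chosen for illustration.

```python
import numpy as np

# Energy bins of width 0.1 (arbitrary units), centred so that E = 0 is mid-bin.
EDGES = np.linspace(-3.05, 3.05, 62)
BIN_WIDTH = EDGES[1] - EDGES[0]

def peak_dos(t: float, n_k: int = 10_000) -> float:
    """Histogram estimate of the peak density of states (per state, per unit
    energy) for a toy 1D tight-binding band E(k) = -2 t cos(k)."""
    k = np.linspace(-np.pi, np.pi, n_k, endpoint=False)
    energies = -2.0 * t * np.cos(k)
    counts, _ = np.histogram(energies, bins=EDGES)
    return counts.max() / (n_k * BIN_WIDTH)

for t in (1.0, 0.1, 0.01):
    # Bandwidth is 4t; a narrower band piles the same states into fewer bins.
    print(f"t = {t:5.2f}  bandwidth = {4 * t:5.2f}  peak DOS ~ {peak_dos(t):.2f}")
```

In the limit t → 0 the band is perfectly flat and the true DOS becomes a delta function; the histogram can only show this as the peak saturating at 1 / BIN_WIDTH.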
We develop several ML methods modeling Tc from the complete list of reported (inorganic) superconductors.18 In their simplest form, these methods take as input a number of predictors generated from the elemental composition of each material. Models developed with these basic features are surprisingly accurate, despite lacking information about relevant properties, such as space group, electronic structure, and phonon energies. To further improve the predictive power of the models, as well as the ability to extract useful information out of them, another set of features is constructed based on crystallographic and electronic information taken from the AFLOW Online Repositories.
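The basic features are statistics of elemental properties weighted by composition. Here is a minimal sketch of that idea, assuming a hypothetical `composition_features` helper and a tiny hand-written table of standard atomic weights. The paper draws on a much larger set of elemental properties and statistics, which I don't reproduce here.

```python
import math

# Standard atomic weights (u) for a few elements; an illustrative subset only.
ATOMIC_WEIGHT = {
    "Mg": 24.305, "B": 10.81, "La": 138.905,
    "Sr": 87.62, "Cu": 63.546, "O": 15.999,
}

def composition_features(amounts: dict[str, float], prop: dict[str, float]) -> dict[str, float]:
    """Composition-weighted statistics of one elemental property (hypothetical helper).

    `amounts` maps element symbols to stoichiometric amounts, e.g. MgB2 -> {"Mg": 1, "B": 2}.
    """
    total = sum(amounts.values())
    weights = {el: n / total for el, n in amounts.items()}  # atomic fractions
    values = [prop[el] for el in weights]
    mean = sum(w * prop[el] for el, w in weights.items())
    var = sum(w * (prop[el] - mean) ** 2 for el, w in weights.items())
    return {"mean": mean, "std": math.sqrt(var), "range": max(values) - min(values)}

print(composition_features({"Mg": 1, "B": 2}, ATOMIC_WEIGHT))
```

The same function works for any elemental property table (valence electrons, electronegativity, and so on), so one composition yields several features per property.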
The models are all random forests; there's no mention of gradient-boosting models like XGBoost or CatBoost 💀. With better modeling, perhaps we could squeeze out performance gains that actually matter.
Once we have a list of relevant predictors, various ML models can be applied to the data.51,52 All ML algorithms in this work are variants of the random forest method.53
After reading more of the literature, I've found that this paper was really the starting point for ML prediction of Tc. Since its release in 2018, a number of groups have improved both the dataset and the modeling (including XGB and GNNs).
Through feature importance and interpretation, we can perhaps learn something about the mechanisms behind superconductivity. Even though a feature correlating with Tc does not imply causation, it does point toward directions for further research.
Differences in important predictors across the family-specific models reflect the fact that distinct mechanisms are responsible for driving superconductivity among these groups. The list is longest for the low-Tc superconductors, reflecting the eclectic nature of this group. Similar to the general regression model, different branches are likely created for distinct sub-groups. Nevertheless, some important predictors have straightforward interpretation. As illustrated in Fig. 5a, low average atomic weight is a necessary (albeit not sufficient) condition for achieving high Tc among the low-Tc group. In fact, the maximum Tc for a given average atomic weight $m_A$ roughly follows $1/\sqrt{m_A}$. Mass plays a significant role in conventional superconductors through the Debye frequency of phonons, leading to the well-known formula $T_c \sim 1/\sqrt{M}$, where $M$ is the ionic mass (see, for example, refs. 56,57,58). Other factors like density of states are also important, which explains the spread in Tc for a given $m_A$. Outlier materials clearly above the line include bismuthates and chloronitrates, suggesting the conventional electron-phonon mechanism is not driving superconductivity in these materials. Indeed, chloronitrates exhibit a very weak isotope effect,59 though some unconventional electron-phonon coupling could still be relevant for superconductivity.60 Another important feature for low-Tc materials is the average number of valence electrons. This recovers the empirical relation first discovered by Matthias more than 60 years ago.61 Such findings validate the ability of ML approaches to discover meaningful patterns that encode true physical phenomena.
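The Tc ~ 1/sqrt(M) relation is the ideal isotope effect. Here is a minimal sketch using a hypothetical helper and the general form Tc ∝ M^(-alpha). The value alpha = 0.5 is the ideal phonon-mediated case; a much smaller measured exponent, like the very weak isotope effect of the chloronitrates mentioned above, hints that phonons are not the whole story.

```python
def isotope_shifted_tc(tc: float, mass: float, new_mass: float, alpha: float = 0.5) -> float:
    """Tc after substituting an isotope, assuming Tc ∝ M^(-alpha) (hypothetical helper).

    alpha = 0.5 corresponds to Tc ~ 1/sqrt(M); measured exponents can be smaller,
    and alpha = 0 means no isotope effect at all.
    """
    return tc * (mass / new_mass) ** alpha

# Hypothetical numbers: a 20 K superconductor whose relevant ion goes from mass 10 to 11.
print(isotope_shifted_tc(20.0, 10.0, 11.0))
print(isotope_shifted_tc(20.0, 10.0, 11.0, alpha=0.0))
```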
Whereas previous investigations explored several hundred compounds at most, this work considers >16,000 different compositions. These are extracted from the SuperCon database, which contains an exhaustive list of superconductors, including many closely related materials varying only by small changes in stoichiometry (doping plays a significant role in optimizing Tc).
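Many SuperCon entries are doping variants that differ only slightly in stoichiometry, so the first step is turning formula strings into atomic fractions. Here is a minimal sketch with a hypothetical `parse_formula` helper. It assumes flat formulas like La1.85Sr0.15CuO4 and does not handle parentheses, hydrates or other notation.

```python
import re

# Element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)((?:\d+(?:\.\d*)?|\.\d+)?)")

def parse_formula(formula: str) -> dict[str, float]:
    """Parse a flat chemical formula into atomic fractions (hypothetical helper).

    Parentheses, hydrates and other notation are NOT handled.
    """
    amounts: dict[str, float] = {}
    for symbol, count in TOKEN.findall(formula):
        # A missing count means one atom, e.g. the "Cu" in "CuO4".
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    total = sum(amounts.values())
    return {el: x / total for el, x in amounts.items()}

# Two doping variants that differ only by a small change in stoichiometry.
print(parse_formula("La1.85Sr0.15CuO4"))
print(parse_formula("La1.9Sr0.1CuO4"))
```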
They added negative samples (known non-superconductors) to the training data to try to learn which features prevent superconductivity:
training a model only on superconductors can lead to significant selection bias that may render it ineffective when applied to new materials (N.B., a model suffering from selection bias can still provide valuable statistical information about known superconductors). Even if the model learns to correctly recognize factors promoting superconductivity, it may miss effects that strongly inhibit it. To mitigate the effect, we incorporate about 300 materials found by H. Hosono’s group not to display superconductivity.35
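Here is a minimal sketch of how negative samples might enter a classifier. It assumes non-superconductors are recorded with Tc = 0 K and that materials are labeled by whether Tc exceeds a threshold `T_SEP`; the 10 K value is an assumption for illustration. All data below is synthetic; only the size of the negative set loosely mirrors the "about 300 materials" in the quote.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

T_SEP = 10.0  # hypothetical class threshold in K (an assumption for this sketch)

rng = np.random.default_rng(1)
# Synthetic placeholder features: NOT real data.
X_sc = rng.normal(1.0, 1.0, size=(400, 4))    # known superconductors
tc_sc = np.clip(15 * X_sc[:, 0] + rng.normal(0, 3, 400), 0.1, None)
X_neg = rng.normal(-1.0, 1.0, size=(300, 4))  # confirmed non-superconductors
tc_neg = np.zeros(300)                        # recorded as Tc = 0 K

X = np.vstack([X_sc, X_neg])
# Label 1 if Tc is above the threshold; every negative sample lands in class 0.
y = (np.concatenate([tc_sc, tc_neg]) > T_SEP).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:5])[:, 1]  # P(Tc > T_SEP) for a few samples
print(proba)
```

Without the 300 negatives, class 0 would contain only low-Tc superconductors, so the model would never see what a material that fails to superconduct at all looks like.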
As I learn more about academic research, it's interesting to find notes like this. Industry definitely has this too, but I think there's much more emphasis there on letting the model do whatever work gives us an advantage. In DS, feature engineering and feature selection are core to the work, but they seem more of an afterthought in the materials world (a broad generalization).
Large sets of independent variables can be constructed and rigorously filtered by predictive power (rather than selecting them by intuition alone). These advances are crucial to uncovering insights into the emergence/suppression of superconductivity with composition.
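Filtering by predictive power can be as simple as ranking features by how much held-out performance drops when each one is shuffled. Here is a minimal sketch using scikit-learn's permutation importance on synthetic data where only three of twenty features carry signal. This is one common filtering method, not necessarily the paper's. The data, the forest size and keeping the top 3 are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_features = 20
X = rng.normal(size=(600, n_features))
# Synthetic target: only features 0, 1 and 2 carry signal; the rest are noise.
y = 4 * X[:, 0] + 2 * X[:, 1] ** 2 - 3 * X[:, 2] + rng.normal(0, 0.5, 600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Importance = mean drop in held-out R^2 when one feature's values are shuffled.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranked = np.argsort(result.importances_mean)[::-1]
keep = ranked[:3]
print("kept feature indices:", keep.tolist())
```

Scoring on held-out data (rather than the training set) keeps the filter honest about which features actually generalize.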