Data-driven methods, in particular machine learning, can help to speed up the discovery of new materials by finding hidden patterns in existing data and using them to identify promising candidate materials. In the case of superconductors, the use of data science tools has to date been slowed by a lack of accessible data. In this work, we present a new and publicly available superconductivity dataset (‘3DSC’), featuring the critical temperature Tc of superconducting materials in addition to tested non-superconductors.
Careful evaluation of the classifier model is important so that we can truly understand the capabilities and performance of a Tc predicting model. Particularly important to us is the ability for the m
After reading the MatterSim paper, the authors proposed the idea of using the MLFF's latent space as a direct property prediction feature set. Earlier, and I had been thinking about using a VAE (or s
Literature review of existing studies done on predicting with machine learning.
Literature review of databases with materials and . See literature review on ML models which utilize these datasets:
Superconductivity has been the focus of enormous research effort since its discovery more than a century ago. Yet, some features of this unique phenomenon remain poorly understood; prime among these is the connection between superconductivity and chemical/structural properties of materials. To bridge the gap, several machine learning schemes are developed herein to model the critical temperatures (Tc) of the 12,000+ known superconductors available via the SuperCon database. Materials are first divided into two classes based on their Tc values, above and below 10 K, and a classification model predicting this label is trained. The model uses coarse-grained features based only on the chemical compositions. It shows strong predictive power, with out-of-sample accuracy of about 92%. https://www.nature.com/articles/s41524-018-0085-8
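As a minimal sketch of the two-class setup described above, the following Python snippet labels materials by whether their Tc exceeds a 10 K threshold and derives one coarse-grained, composition-only feature (the composition-weighted mean atomic mass). The helper names (`parse_formula`, `mean_atomic_mass`, `label_by_tc`), the small element table, and the choice of feature are assumptions made for illustration; they are not the feature set or code used in the cited study.

```python
import re

# Standard atomic masses (u) for a few elements; an illustrative subset only.
ATOMIC_MASS = {"Mg": 24.305, "B": 10.81, "Nb": 92.906, "Sn": 118.71, "Al": 26.982}


def parse_formula(formula):
    """Parse a simple formula such as 'MgB2' into {element: amount}.

    Assumption: no parentheses or nested groups in the formula.
    """
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*(?:\.\d+)?)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def mean_atomic_mass(formula):
    """Composition-weighted mean atomic mass: one example of a coarse feature."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return sum(n * ATOMIC_MASS[el] for el, n in counts.items()) / total


def label_by_tc(tc_kelvin, threshold=10.0):
    """Return 1 if Tc is above the threshold (in K), else 0."""
    return 1 if tc_kelvin > threshold else 0


# Approximate textbook Tc values (K), used here only as toy inputs.
toy_materials = {"MgB2": 39.0, "Nb3Sn": 18.3, "Al": 1.2}

for formula, tc in toy_materials.items():
    print(formula, round(mean_atomic_mass(formula), 3), label_by_tc(tc))
```

A real pipeline would compute many such composition-derived features per material and pass them, together with the binary labels, to a classifier of choice.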
a) Histogram of materials categorized by Tc (bin size is 2 K, only those with finite Tc are counted). Blue, green, and red denote low-Tc, iron-based, and cuprate superconductors, respectively. In the inset: histogram of materials categorized by ln(Tc) restricted to those with Tc > 10 K. b) Performance of different classification models as a function of the threshold temperature (Tsep) that separates materials into two classes by Tc. Performance is measured by accuracy (gray), precision (red), recall (blue), and F1 score (purple). The scores are calculated from predictions on an independent test set, i.e., one separate from the dataset used to train the model. In the inset: the dashed red curve gives the proportion of materials in the above-Tsep set. c) Accuracy, precision, recall, and F1 score as a function of the size of the training set with a fixed test set. d) Accuracy, precision, recall, and F1 score as a function of the number of predictors.
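The four scores named in the caption can be computed from the counts of true/false positives and negatives. The sketch below is a plain-Python illustration, assuming binary labels where 1 means "Tc above Tsep"; the function names `classification_scores` and `fraction_above` are hypothetical, and the inputs are toy values rather than data from the figure.

```python
def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = above Tsep)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def fraction_above(tc_values, t_sep):
    """Proportion of materials with Tc above the separating temperature."""
    return sum(1 for tc in tc_values if tc > t_sep) / len(tc_values)


# Toy labels and predictions, for illustration only.
scores = classification_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
print(fraction_above([1.0, 5.0, 20.0, 40.0], t_sep=10.0))
```

Because the class balance shifts as Tsep changes, accuracy alone can be misleading; reporting precision, recall, and F1 alongside the above-Tsep proportion makes that shift visible.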
This study employs SuperCon, the largest dataset of superconducting materials. We first perform several data pre-processing steps to derive the clean DataG dataset, which contains 13,022 compounds. We then apply the CatBoost algorithm to predict the transition temperatures of novel superconducting materials. In addition, we developed a package called Jabir, which generates 322 atomic descriptors. We also designed an innovative hybrid method, the Soraya package, to select the most critical features from the feature space. Together, these yield R² and RMSE values (0.952 and 6.45 K, respectively) superior to those previously reported in the literature. Finally, as a novel contribution to the field, a web application was designed for predicting and determining the Tc values of superconducting materials.
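For reference, the two regression metrics quoted above follow the standard definitions RMSE = sqrt(mean((y - ŷ)²)) and R² = 1 - SS_res / SS_tot. The snippet below is a minimal plain-Python sketch of those definitions; the function names and the toy Tc values are assumptions for illustration and do not reproduce the study's data or its evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (here K)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measured vs. predicted Tc values (K), for illustration only.
tc_measured = [10.0, 20.0, 30.0, 40.0]
tc_predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(tc_measured, tc_predicted), 3))      # 2.121
print(round(r2_score(tc_measured, tc_predicted), 3))  # 0.964
```

RMSE is expressed in kelvin and penalizes large misses, while R² is dimensionless, so the two are usually reported together.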
Since the announcement in 2011 of the Materials Genome Initiative by the Obama administration, much attention has been given to the subject of materials design to accelerate the discovery of new materials that could have technological implications. Although the initiative has had its biggest impact on more applied materials such as batteries, there is increasing interest in applying these ideas to predict new superconductors. This is obviously a challenge, given that superconductivity is a many-body phenomenon, with whole classes of known superconductors lacking a quantitative theory. Given this caveat, various efforts to formulate materials design principles for superconductors are reviewed here, with a focus on surveying the periodic table in an attempt to identify cuprate analogues. https://arxiv.org/abs/1601.00709