This study employs the SuperCon dataset as the largest superconducting materials dataset. Then, we perform various data pre-processing steps to derive the clean DataG dataset, containing 13,022 compounds. In another stage of the study, we apply the novel CatBoost algorithm to predict the transition temperatures of novel superconducting materials. In addition, we developed a package called Jabir, which generates 322 atomic descriptors. We also designed an innovative hybrid method called the Soraya package to select the most critical features from the feature space. These yield R2 and RMSE values (0.952 and 6.45 K, respectively) superior to those previously reported in the literature. Finally, as a novel contribution to the field, a web application was designed for predicting and determining the Tc values of superconducting materials.
How this file is connected to other assets
Literature review of databases with materials and . See literature review on ML models which utilize these datasets:
So far this is the most recent paper I've found on ML prediction of , improving on both modeling (CatBoost) and dataset compared to Stanev et al.
Literature review of existing studies done on predicting with machine learning.
Discover other files like this one
Superconductivity has been the focus of enormous research effort since its discovery more than a century ago. Yet, some features of this unique phenomenon remain poorly understood; prime among these is the connection between superconductivity and chemical/structural properties of materials. To bridge the gap, several machine learning schemes are developed herein to model the critical temperatures (Tc) of the 12,000+ known superconductors available via the SuperCon database. Materials are first divided into two classes based on their Tc values, above and below 10 K, and a classification model predicting this label is trained. The model uses coarse-grained features based only on the chemical compositions. It shows strong predictive power, with out-of-sample accuracy of about 92%. https://www.nature.com/articles/s41524-018-0085-8
Data-driven methods, in particular machine learning, can help to speed up the discovery of new materials by finding hidden patterns in existing data and using them to identify promising candidate materials. In the case of superconductors, the use of data science tools is to date slowed down by a lack of accessible data. In this work, we present a new and publicly available superconductivity dataset (‘3DSC’), featuring the critical temperature Tc of superconducting materials additionally to tested non-superconductors.
a) Histogram of materials categorized by Tc (bin size is 2 K, only those with finite Tc are counted). Blue, green, and red denote low-Tc, iron-based, and cuprate superconductors, respectively. In the inset: histogram of materials categorized by ln(Tc) restricted to those with Tc > 10 K. b) Performance of different classification models as a function of the threshold temperature (Tsep) that separates materials in two classes by Tc. Performance is measured by accuracy (gray), precision (red), recall (blue), and F1 score (purple). The scores are calculated from predictions on an independent test set, i.e., one separate from the dataset used to train the model. In the inset: the dashed red curve gives the proportion of materials in the above-Tsep set. c) Accuracy, precision, recall, and F1 score as a function of the size of the training set with a fixed test set. d) Accuracy, precision, recall, and F1 as a function of the number of predictors
While underestimating, Ba2Ca2Cu3Tl2O10-MP-mp-1228620 has a pretty decent phase change which relatively high certainties on both sizes of Tc.