SuperCon dataset and classification model performance
a) Histogram of materials categorized by Tc (bin size 2 K; only materials with finite Tc are counted). Blue, green, and red denote low-Tc, iron-based, and cuprate superconductors, respectively. Inset: histogram of materials categorized by ln(Tc), restricted to those with Tc > 10 K.
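The binning behind panel (a) can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that "finite Tc" means a recorded, positive Tc, and that each 2 K bin is labeled by its lower edge. The helper names `tc_histogram` and `log_tc_values` are hypothetical.

```python
import math
from collections import Counter


def tc_histogram(tc_values, bin_width=2.0):
    """Count materials per Tc bin of width bin_width (in K).

    Assumption: only finite, positive Tc values are counted; missing,
    NaN, or zero entries are skipped. Returns {bin lower edge: count}.
    """
    counts = Counter()
    for tc in tc_values:
        if tc is None or not math.isfinite(tc) or tc <= 0:
            continue  # skip materials without a finite (positive) Tc
        lower_edge = math.floor(tc / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def log_tc_values(tc_values, min_tc=10.0):
    """Return ln(Tc) for materials with Tc > min_tc, as in the inset."""
    return [math.log(tc) for tc in tc_values
            if tc is not None and math.isfinite(tc) and tc > min_tc]


# Example with made-up Tc values (in K); 0.0 and NaN are excluded.
hist = tc_histogram([1.0, 3.5, 3.9, 0.0, float("nan"), 93.0])
logs = log_tc_values([5.0, 12.0, 93.0])
```

Here `hist` is `{0.0: 1, 2.0: 2, 92.0: 1}`, and `logs` holds ln(12) and ln(93). Plotting the counts per category (low-Tc, iron-based, cuprate) would additionally require a per-material family label, which this sketch does not model.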
b) Performance of different classification models as a function of the threshold temperature (Tsep) that separates materials into two classes by Tc. Performance is measured by accuracy (gray), precision (red), recall (blue), and F1 score (purple). The scores are calculated from predictions on an independent test set, i.e., one separate from the dataset used to train the model. Inset: the dashed red curve gives the proportion of materials in the above-Tsep set.
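For reference, the four scores in panel (b) follow from the confusion-matrix counts once each test-set material is labeled by whether its Tc exceeds Tsep. The sketch below assumes that "above Tsep" is the positive class and uses a strict `Tc > Tsep` cut. Both are assumptions, and the helper names are hypothetical. The classifier itself is not modeled: the predicted labels are taken as given.

```python
def binary_labels(tc_values, t_sep):
    """Label a material 1 if Tc > t_sep (assumed positive class), else 0."""
    return [1 if tc > t_sep else 0 for tc in tc_values]


def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from true vs. predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def fraction_above(tc_values, t_sep):
    """Proportion of materials in the above-Tsep class (cf. the inset)."""
    labels = binary_labels(tc_values, t_sep)
    return sum(labels) / len(labels) if labels else 0.0


# Made-up test-set Tc values (K) and hypothetical classifier output.
test_tc = [2.0, 8.0, 15.0, 40.0, 90.0]
y_true = binary_labels(test_tc, t_sep=10.0)   # [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1]
scores = classification_scores(y_true, y_pred)
share = fraction_above(test_tc, t_sep=10.0)
```

In this toy case, accuracy is 0.6, and precision, recall, and F1 are each 2/3. Sweeping `t_sep` and refitting the classifier at each value would trace out curves like those in panel (b).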
c) Accuracy, precision, recall, and F1 score as a function of the size of the training set with a fixed test set.
d) Accuracy, precision, recall, and F1 score as a function of the number of predictors.