We ended our last post with the intent of examining 18 Pareto front materials from the Sierepeklis and Cole thermoelectric materials dataset. These were the quality-filtered subset of the dataset, where reported and calculated ZT agree within 50%. Before diving into the actual materials, there's some more housekeeping to do.
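For reference, that quality filter can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the dataset authors' code: it assumes the calculated value comes from the standard relation ZT = PF·T/κ, and that "within 50%" means a relative difference measured against the calculated value. The function names and example numbers are hypothetical.

```python
def calculated_zt(power_factor, temperature_k, thermal_conductivity):
    """ZT = PF * T / kappa, with PF in W/(m*K^2), T in K, kappa in W/(m*K)."""
    return power_factor * temperature_k / thermal_conductivity


def passes_quality_filter(reported_zt, power_factor, temperature_k,
                          thermal_conductivity, tolerance=0.5):
    """Assumed reading of 'agree within 50%': relative gap vs the calculated ZT."""
    zt_calc = calculated_zt(power_factor, temperature_k, thermal_conductivity)
    return abs(reported_zt - zt_calc) / zt_calc <= tolerance


# Hypothetical entries, not taken from the dataset.
# Calculated ZT is about 1.2 in both cases.
print(passes_quality_filter(1.3, 4.0e-3, 300, 1.0))  # reported close to calculated
print(passes_quality_filter(2.4, 4.0e-3, 300, 1.0))  # reported double the calculated
```

A filter like this only checks that an entry is internally consistent. It says nothing about whether the numbers match the paper they were mined from.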
Three of the 18 entries are the same material: Bi₀.₅Sb₁.₅Te₃, appearing with identical ZT, PF and thermal conductivity values across three different papers. The dataset has picked up the same measurement from three different secondary sources. There's also a SnSe thin film with a recorded temperature of 42 K (-231°C). SnSe is a high-temperature material whose well-known results are in the 800-900 K range, so this is almost certainly a corrupted entry, likely 420 K or 423 K with a digit dropped, rather than a genuine cryogenic measurement. Drop all of those, filter to temperatures closer to ambient, and we're down to 12 materials.
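The housekeeping above can be sketched as a small cleaning pass. This is a rough sketch rather than the exact procedure used here: the field names, the 250-350 K window standing in for "closer to ambient", and the choice to keep the first copy of a repeated measurement are all assumptions, and the rows are placeholders, not dataset values.

```python
def clean_entries(entries, t_min=250.0, t_max=350.0):
    """Collapse repeated measurements, then keep near-ambient entries.

    The 250-350 K window is an assumed stand-in for "closer to ambient".
    """
    seen = set()
    kept = []
    for entry in entries:
        # Same formula with identical ZT, PF and kappa is treated as one
        # measurement picked up from several sources; keep the first copy.
        key = (entry["formula"], entry["zt"], entry["pf"], entry["kappa"])
        if key in seen:
            continue
        seen.add(key)
        # The window also removes implausible temperatures such as 42 K.
        if t_min <= entry["temperature_k"] <= t_max:
            kept.append(entry)
    return kept


# Illustrative rows only: the numbers are placeholders, not dataset values.
rows = [
    {"formula": "Bi0.5Sb1.5Te3", "zt": 1.0, "pf": 3.0e-3, "kappa": 1.0,
     "temperature_k": 300, "source": "paper A"},
    {"formula": "Bi0.5Sb1.5Te3", "zt": 1.0, "pf": 3.0e-3, "kappa": 1.0,
     "temperature_k": 300, "source": "paper B"},
    {"formula": "Bi0.5Sb1.5Te3", "zt": 1.0, "pf": 3.0e-3, "kappa": 1.0,
     "temperature_k": 300, "source": "paper C"},
    {"formula": "SnSe", "zt": 0.5, "pf": 1.0e-3, "kappa": 0.5,
     "temperature_k": 42, "source": "paper D"},
    {"formula": "PbTe", "zt": 0.8, "pf": 2.0e-3, "kappa": 2.0,
     "temperature_k": 800, "source": "paper E"},
]

cleaned = clean_entries(rows)
print([(r["formula"], r["source"]) for r in cleaned])
```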
12 quality-filtered Pareto front materials from the Sierepeklis & Cole 2022 thermoelectric dataset
PF vs thermal resistivity scatter of quality-filtered Pareto front materials, coloured by ZT
The top three all have ZT ≥ 2 at room temperature or close to it, which would be extraordinary if true. Let's take a closer look.
#1 — SbP₃, ZT = 2.60 at 300 K (10.1039/c9nr08679j)
This comes from a 2020 Nanoscale paper studying 2D triphosphide monolayers computationally — InP₃, GaP₃, SbP₃ and SnP₃. SbP₃ as a freestanding 2D monolayer has never been synthesised. The paper's own predicted ZT for SbP₃ is 1.9 at 300 K and 3.5 at 500 K — so the 2.6 in the dataset doesn't match either temperature point from the source.
Here's where it gets interesting. The dataset only picked up SbP₃ from this paper, which is the third best material of the four studied. It missed SnP₃ entirely, which the paper predicts has a ZT of 3.7 at 300 K and 6.1 at 500 K — driven by ultra-low thermal conductivity of 0.48 W/mK and a flat valence band structure that produces an exceptionally high Seebeck coefficient with p-type doping. If those numbers were realisable experimentally, SnP₃ would be one of the most exciting thermoelectric predictions in recent years. And it's made of tin and phosphorus — no rare earths, no tellurium, no critical minerals. The LLM miner found one of four materials from a single paper, got its value wrong, and didn't record the best result!
#2 — "LaOBiPbS₃", ZT = 2.45 at 295 K (10.1016/j.nanoen.2019.104283)
This one is a genuine LLM data mining failure. The material name is wrong. The actual paper is about a fluorinated Sn₂Bi monolayer, achieving ZT = 2.45 at 300 K with a lattice thermal conductivity of 0.19 W/mK. The ZT value and thermal conductivity were correctly extracted — just attributed to completely the wrong formula. Again, computational.
#3 — Nb-doped SrTiO₃, ZT = 2.40 at 295 K (10.1016/j.jallcom.2010.03.049)
The abstract of this paper reports a maximum ZT of 0.165 at 900 K. The dataset has it at 2.40 at room temperature. That's off by a factor of about 14.5 and at completely the wrong temperature. Corrupted entry, no further comment needed.
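The three spot checks above boil down to comparing a dataset value against what the source paper reports. A minimal sketch of that comparison follows, using only the numbers quoted in this post. The `SpotCheck` class, the `compare` helper and the tolerances (5% on ZT, 10 K on temperature) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpotCheck:
    label: str
    dataset_zt: float
    dataset_t: float  # temperature recorded in the dataset, in K
    source_zt: float
    source_t: float   # temperature reported in the paper, in K


def compare(check, zt_tol=0.05, t_tol=10.0):
    """Return (ratio, zt_ok, t_ok) for one dataset-vs-paper comparison."""
    ratio = check.dataset_zt / check.source_zt
    zt_ok = abs(ratio - 1.0) <= zt_tol
    t_ok = abs(check.dataset_t - check.source_t) <= t_tol
    return ratio, zt_ok, t_ok


# Values as quoted in the text; tolerances above are assumptions.
checks = [
    SpotCheck("SbP3", 2.60, 300, 1.9, 300),
    SpotCheck("fluorinated Sn2Bi", 2.45, 295, 2.45, 300),
    SpotCheck("Nb-doped SrTiO3", 2.40, 295, 0.165, 900),
]

for check in checks:
    ratio, zt_ok, t_ok = compare(check)
    print(f"{check.label}: ratio {ratio:.2f}, ZT ok: {zt_ok}, T ok: {t_ok}")
```

Note that the second entry passes both numeric checks. A wrong formula attached to correct values can only be caught by comparing the material name against the paper itself.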
So the top three are: one unverifiable computational prediction with a mismatched value that missed a better result from the same paper, one correctly-valued entry attached to the wrong material entirely, and one that's just wrong. Welcome back to LLM-mined data! This is one more reminder of why Ouro and platforms like it are the future: the raw data is present and available to agents via API instead of locked away in inaccessible PDFs. But we work with what we have and tell ourselves this is the worst it will ever be.
Working down the remaining 9, a separate issue emerges. Sn, Se, Bi, Te, Sb, Ge, Cd, Tl — look up any of these on a critical minerals list and you won't be surprised. There isn't a single material in the quality-filtered Pareto front that's made exclusively from abundant, non-critical elements. The closest thing is Nb-doped SrTiO₃ — which, as we've just established, doesn't actually perform as advertised. Ironically, the SnP₃ that the dataset missed would have been the one exception.
The most honest result in the whole list is probably the Bi₀.₅Sb₁.₅Te₃ cluster sitting at ZT ~ 1.86 near room temperature. It's well-studied, experimentally verified across multiple groups, and it's the material that's already in commercial Peltier coolers. Not a discovery — a confirmation that the dataset can find things we already know.
To be fair, this dataset is from 2022, and a lot of progress has happened since then. The next post will look at a new thermoelectrics dataset from 2025, this time with the promise of stringent quality control, and we will see how things have progressed.
Co-researched with Claude Code
Digging into the quality-filtered Pareto front of the Sierepeklis & Cole 2022 thermoelectric dataset — data corruption, LLM mining failures, and the one that got away.