Can ML models handle common minerals? Testing UniFFBench's findings with Orb v3 and ALIGNN · Posts on Ouro

UniFFBench (Mannan et al., arXiv:2508.05762) made a splash in August 2025 by benchmarking six universal machine learning force fields (CHGNet, M3GNet, MACE, MatterSim, SevenNet, Orb) against ~1,500 experimentally characterized mineral structures. Their headline finding: models trained to near-DFT energy accuracy still fail to reproduce experimental properties. Orb v2 achieved 100% MD simulation completion, but its elastic tensor predictions were catastrophically wrong (C66 MAPE 100%, R² = -0.898).

We ran six of their benchmark minerals through Ouro's Orb v3 relaxation and ALIGNN prediction routes to see whether the next model version and a different ML architecture fare any better. The results extend UniFFBench's training-evaluation circularity thesis in a direction they didn't test.

What we ran

Six experimental mineral structures from the MinX benchmark, generated from crystallographic parameters:

Mineral	Formula	Space group	Exp. density (g/cm³)
Calcite	CaCO₃	R-3c (167)	2.71
α-Quartz	SiO₂	P3₂21 (154)	2.65
Galena	PbS	Fm-3m (225)	7.60
Halite	NaCl	Fm-3m (225)	2.16
Fluorite	CaF₂	Fm-3m (225)	3.18
Corundum	Al₂O₃	R-3c (167)	4.00
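Because these structures were generated from crystallographic parameters, one cheap sanity check before any relaxation is to recompute the crystallographic density ρ = Z·M / (N_A·V) and compare it with the experimental value. The sketch below is a minimal version of that check. The lattice parameters are approximate room-temperature literature values that we supply as an assumption; they are not necessarily the parameters used to build the MinX structures. A generated cell whose density is far from the experimental value is a red flag before the force field ever sees it.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol
ANGSTROM3_TO_CM3 = 1e-24  # 1 Å^3 = 1e-24 cm^3


def cell_volume(a, c=None, system="cubic"):
    """Unit-cell volume in Å^3 for cubic cells or trigonal cells in the hexagonal setting."""
    if system == "cubic":
        return a ** 3
    if system == "hexagonal":
        if c is None:
            raise ValueError("hexagonal setting needs both a and c")
        # Trigonal R-3c and P3_221 cells are described here in the hexagonal setting
        return math.sqrt(3) / 2 * a ** 2 * c
    raise ValueError(f"unsupported system: {system}")


def density(z, molar_mass, volume_a3):
    """Crystallographic density in g/cm^3: Z formula units of mass M in a cell of volume V."""
    return z * molar_mass / (AVOGADRO * volume_a3 * ANGSTROM3_TO_CM3)


# Assumed literature values: (Z, molar mass g/mol, a in Å, c in Å or None, setting)
MINERALS = {
    "Calcite": (6, 100.09, 4.990, 17.06, "hexagonal"),
    "Quartz": (3, 60.08, 4.913, 5.405, "hexagonal"),
    "Galena": (4, 239.27, 5.936, None, "cubic"),
    "Halite": (4, 58.44, 5.640, None, "cubic"),
    "Fluorite": (4, 78.07, 5.463, None, "cubic"),
    "Corundum": (6, 101.96, 4.759, 12.99, "hexagonal"),
}

for name, (z, m, a, c, system) in MINERALS.items():
    rho = density(z, m, cell_volume(a, c, system))
    print(f"{name:9s} {rho:.2f} g/cm^3")
```

With these parameters every mineral lands within about 0.02 g/cm³ of its accepted experimental density; α-quartz, for example, comes out near 2.65 g/cm³.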

Each structure went through two pipelines:

Orb v3 relaxation (conservative inf MPA, cell+ionic, fmax=0.03 eV/Å) — checking symmetry preservation and volume change
ALIGNN predictions — formation energy (MP dataset) and energy above convex hull

ALIGNN: four stable minerals flagged as unstable

The ALIGNN hull energy predictions are the most immediately striking result. All six minerals are the thermodynamic ground state of their composition, yet ALIGNN predicts four of them to be unstable:

Mineral	ALIGNN hull (eV/atom)	Reality
Calcite	2.246	Stable (most common CaCO₃ polymorph)
Quartz	1.623	Stable (most common SiO₂ polymorph)

The two it gets right, halite and fluorite, are both simple Fm-3m structures with high symmetry and predominantly ionic bonding. Symmetry alone does not explain the split, since galena is also Fm-3m and still fails. The four failures instead share covalent bonding character (Si-O, C-O, Al-O) or a heavy element (Pb). This extends the systematic ALIGNN bias we documented previously.

UniFFBench's "training-evaluation circularity" finding anticipates this, even though ALIGNN was not among the models they benchmarked. They showed that MatBench Discovery formation energy R² correlates poorly with experimental property R² across all models. Our result makes the consequence concrete: some of the most common minerals in the Earth's crust get flagged as thermodynamically unstable by a model that performs well on computational benchmarks.
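To see why these hull values are disqualifying, consider the kind of energy-above-hull filter many screening pipelines apply. The sketch below is hypothetical: the 0.1 eV/atom cutoff and the helper names are our own illustrative choices, the ALIGNN values are the two reported in the table above, and the reference value of 0 eV/atom encodes the post's premise that both minerals are observed ground states.

```python
# Hypothetical stability screen; the cutoff is an illustrative assumption.
STABILITY_THRESHOLD = 0.1  # eV/atom

# ALIGNN energy-above-hull predictions reported in this post (eV/atom)
alignn_hull = {"Calcite": 2.246, "Quartz": 1.623}

# Both minerals are observed ground-state polymorphs, so we take their
# true energy above hull to be 0 eV/atom.
reference_hull = {"Calcite": 0.0, "Quartz": 0.0}


def passes_filter(e_hull, threshold=STABILITY_THRESHOLD):
    """A structure passes if it sits within `threshold` eV/atom of the hull."""
    return e_hull <= threshold


def cross_check(predicted, reference, threshold=STABILITY_THRESHOLD):
    """Return {name: predicted - reference} wherever the ML filter and the reference disagree."""
    disagreements = {}
    for name, e_pred in predicted.items():
        if passes_filter(e_pred, threshold) != passes_filter(reference[name], threshold):
            disagreements[name] = e_pred - reference[name]
    return disagreements


print(cross_check(alignn_hull, reference_hull))
```

Both minerals are rejected, and the outcome is not sensitive to the cutoff: even a generous 0.5 eV/atom threshold would still reject them.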

Orb v3: quartz collapses, everything else holds

Orb v3 preserved the input space group for 5 of 6 minerals:

Mineral	SG in → out	Steps	ΔE (eV)	Verdict
Calcite	R-3c → R-3c	5	-0.24	✓

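The cubic Fm-3m structures (NaCl, PbS, CaF₂) are trivially easy: 2 steps, minimal energy change, symmetry preserved. In all three, every atom sits on a special Wyckoff position with no free coordinates, so the only degree of freedom that symmetry leaves to relax is the cubic lattice parameter. The R-3c structures (calcite, corundum) survive intact with small energy adjustments. This is consistent with what we've seen on the magnetic materials side: high-symmetry cubic structures are robust, while lower-symmetry trigonal and hexagonal structures are at risk.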
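The verdicts above come down to comparing space-group numbers before and after relaxation. In practice the output space group comes from a symmetry finder such as spglib at some tolerance, and we do not know which tolerance the route uses. The sketch below covers only the bookkeeping step afterward: it maps international space-group numbers to crystal systems and labels each relaxation. The function names are hypothetical, and the (in, out) pairs are the ones stated in this post.

```python
def crystal_system(sg):
    """Map an international space-group number (1-230) to its crystal system."""
    if not 1 <= sg <= 230:
        raise ValueError(f"invalid space-group number: {sg}")
    # Upper bound of each crystal system's space-group number range
    bounds = [(2, "triclinic"), (15, "monoclinic"), (74, "orthorhombic"),
              (142, "tetragonal"), (167, "trigonal"), (194, "hexagonal"),
              (230, "cubic")]
    for upper, name in bounds:
        if sg <= upper:
            return name


def verdict(sg_in, sg_out):
    """Label a relaxation by what happened to its space group."""
    if sg_out == sg_in:
        return "preserved"
    if sg_out == 1:
        return "collapsed to P1"
    return f"changed ({crystal_system(sg_in)} -> {crystal_system(sg_out)})"


# (space group in, space group out) as reported in this post
results = {
    "Calcite": (167, 167), "Quartz": (154, 1), "Galena": (225, 225),
    "Halite": (225, 225), "Fluorite": (225, 225), "Corundum": (167, 167),
}
for name, (sg_in, sg_out) in results.items():
    print(f"{name:9s} {crystal_system(sg_in):9s} {verdict(sg_in, sg_out)}")
```

Grouping by crystal system rather than by individual space group is what makes the cubic-versus-trigonal pattern visible across both the mineral and magnetic-materials tests.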

The quartz collapse is a new finding. P3₂21 → P1 with a 31.3 eV energy drop over 294 steps is not a subtle symmetry erosion. The structure fundamentally rearranged. α-quartz is the most common SiO₂ polymorph and one of the most well-characterized crystal structures in existence. If Orb v3 cannot relax it without destroying its symmetry, that has implications for any screening pipeline that uses Orb v3 as a relaxation step before property prediction.
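A total energy change is only interpretable relative to cell size. The post does not state how many atoms the quartz cell contained, so the sketch below assumes the standard α-quartz cell with three SiO₂ units (9 atoms). Under that assumption, the 31.3 eV drop is about 3.5 eV/atom, far larger than the energy differences between known SiO₂ polymorphs, which are well under 1 eV/atom. The 0.2 eV/atom flagging threshold and the helper names are illustrative assumptions, not part of the Ouro route.

```python
QUARTZ_DELTA_E = -31.3  # eV, total energy change over the relaxation (from this post)
QUARTZ_N_ATOMS = 9      # assumption: standard α-quartz cell, Z = 3 SiO2 units


def per_atom(delta_e, n_atoms):
    """Normalize a total energy change (eV) to eV/atom."""
    return delta_e / n_atoms


def flag_suspicious(delta_e, n_atoms, limit=0.2):
    """Flag a relaxation whose |ΔE| per atom exceeds `limit` eV/atom (illustrative cutoff)."""
    return abs(per_atom(delta_e, n_atoms)) > limit


e = per_atom(QUARTZ_DELTA_E, QUARTZ_N_ATOMS)
print(f"{e:.2f} eV/atom, suspicious={flag_suspicious(QUARTZ_DELTA_E, QUARTZ_N_ATOMS)}")
```

If the route used a larger supercell, the per-atom figure shrinks proportionally, so the atom count should be read from the relaxed CIF before drawing conclusions.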

UniFFBench tested Orb v2 in molecular dynamics (50 ps NPT simulations), not structural relaxation, and reported 100% MD completion. Our test uses Orb v3 in energy minimization (FrechetCellFilter with BFGS). The quartz collapse suggests that Orb v3's energy landscape for SiO₂ contains a spurious low-energy P1 basin that the minimizer descends into. A finite-temperature MD run, which samples the landscape dynamically rather than descending to the nearest minimum, might not reach such a basin within 50 ps. Because our test changes both the model version and the protocol, we cannot yet say which change exposes the failure. Either way, this is a different failure mode from the ones UniFFBench documented, and it is specific to the relaxation workflow that most materials screening pipelines actually use.

What this means

Three takeaways for anyone using these models in practice:

1. ALIGNN hull energies are unreliable for anything beyond simple ionic structures. The 1.6-2.2 eV/atom hull overestimates for calcite, quartz, and corundum would immediately reject these compounds in any automated stability screening. If your pipeline uses ALIGNN hull energy as a stability filter, you are filtering out some of the most common minerals on Earth. Use Materials Project hull calculations as a cross-check, as we've recommended before.

2. Orb v3 symmetry collapse extends beyond magnetic intermetallics. We previously documented P1 collapse on C14 Laves phases, Cu₂Sb-type structures, and GPSK-generated magnets. The quartz collapse shows the same failure mode reaches common oxide minerals. The pattern in this small set: high-symmetry cubic (Fm-3m) structures relaxed cleanly, both R-3c structures held, and the lower-symmetry trigonal P3₂21 structure collapsed, so trigonal structures should be treated as at risk.

3. UniFFBench's circularity thesis has real consequences. Their finding that computational benchmark performance doesn't predict experimental accuracy is not abstract. When ALIGNN can't recognize quartz as stable and Orb v3 can't relax it without destroying its symmetry, the gap between benchmark performance and real-world reliability becomes a concrete blocker for automated materials discovery.

Files and route results

All six experimental CIFs, relaxed structures, and route results are linked below. The quartz collapse is particularly worth inspecting — the 31.3 eV energy drop and 294-step trajectory tell you something is deeply wrong with the energy landscape.

We're sharing these results with the UniFFBench team (Krishnan group at IIT Delhi, Miret at Intel Labs). Their benchmark used Orb v2 and MD only. Our results suggest that Orb v3, whatever its gains on other properties, may have symmetry failure modes in relaxation that an MD-only protocol would not expose. If they add structural relaxation (not just MD) to the UniFFBench framework, the quartz collapse would be a natural test case.

CIFs and relaxed structures:

Calcite CaCO₃ - experimental | relaxed
