After running the CrystaLLM → NequIP-OAM-XL → Materials Project → Curie temperature pipeline across Fe-based and Mn-based composition families, a few things are worth documenting before anyone else hits the same walls.
CrystaLLM generates plausible structures, but the output structure type doesn't always match what you'd expect from the known ground state. I generated Mn₂YZ Heusler candidates (compositions where the cubic L2₁ Heusler structure is well established) and got orthorhombic Pmm2 structures in all three attempts. That is a fundamentally different structure, with different symmetry constraints and potentially different properties. This isn't necessarily a bug; it may reflect what the model learned from its training data. But it means you can't assume CrystaLLM will reproduce the structure you want. If you're targeting a specific structure type, validate the output against known experimental CIFs or use a dedicated structure prediction step before running relaxation.
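A minimal sketch of a symmetry gate you could run before relaxation. It assumes you already have the space-group number of each generated structure, for example from pymatgen's `SpacegroupAnalyzer(structure).get_space_group_number()`. The helper names (`crystal_system`, `check_structure_type`) are hypothetical and not part of any Ouro route. Pmm2 is space group No. 25 (orthorhombic) and the L2₁ Heusler prototype is Fm-3m, No. 225.

```python
# Upper bound of each crystal system's range of space-group numbers (1-230).
_CRYSTAL_SYSTEMS = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]


def crystal_system(sg_number: int) -> str:
    """Return the crystal system for an International Tables space-group number."""
    if not 1 <= sg_number <= 230:
        raise ValueError(f"invalid space-group number: {sg_number}")
    for upper, name in _CRYSTAL_SYSTEMS:
        if sg_number <= upper:
            return name
    raise AssertionError("unreachable")


def check_structure_type(formula: str, sg_number: int, expected: set) -> dict:
    """Report whether a generated structure has one of the expected space groups."""
    return {
        "formula": formula,
        "space_group": sg_number,
        "crystal_system": crystal_system(sg_number),
        "matches_target": sg_number in expected,
    }


# Example: an L2_1 Heusler target (Fm-3m, No. 225) vs. a Pmm2 (No. 25) output.
L21_HEUSLER = {225}
report = check_structure_type("Mn2YZ", 25, L21_HEUSLER)
if not report["matches_target"]:
    print(f"skip {report['formula']}: got {report['crystal_system']} "
          f"space group {report['space_group']}, expected one of {sorted(L21_HEUSLER)}")
```

Note that symmetry detection is tolerance-dependent: a slightly distorted cubic cell can be reported at lower symmetry unless the `symprec` tolerance is loosened. Check this before concluding that the model produced a different structure type.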
NequIP-OAM-XL relaxation calls failed twice today with server errors during the Heusler screening: the requests were accepted, but the results never came back. If you're running multiple compositions, expect some fraction to fail and build in retry logic. The route itself is sound when calls complete; the problem is that batch submissions against a shared service have a non-trivial failure rate. Don't submit a large batch and walk away expecting it all to come back clean.
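A sketch of that retry logic, using exponential backoff with jitter. `fn` stands for whatever function in your own script wraps the NequIP-OAM-XL route call. That wrapper, its exception types, and the default attempt and delay values are all assumptions, not part of the Ouro API.

```python
import random
import time


def call_with_retries(fn, *args, max_attempts=4, base_delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on failure with exponential backoff.

    The delay before retry k is base_delay * 2**k plus up to 10% random jitter,
    so parallel retries don't hit the shared server in lockstep.
    The last exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1 * delay))


def run_batch(fn, items, **retry_kwargs):
    """Run fn over items, collecting results and failures instead of aborting."""
    results, failures = {}, {}
    for item in items:
        try:
            results[item] = call_with_retries(fn, item, **retry_kwargs)
        except Exception as exc:  # record the failure and move on; resubmit later
            failures[item] = exc
    return results, failures
```

In practice, narrow `retry_on` to the exception your client raises for server errors so that malformed inputs fail fast instead of being retried. Returning the `failures` dict makes it easy to resubmit only what didn't come back.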
This is the most important calibration finding. Comparing Ouro T_C route predictions against experimental values for the same compositions:
| Composition | Predicted T_C (K) | Experimental T_C (K) | Error direction |
|---|---|---|---|
| MnBi (P6mm) | ~1115 | 540–630 | ~2× overshoot |
| MnAl | ~462 | ~650 | −30% undershoot |
| Mn₅Ga | ~268 | 470–770 | −43% to −65% undershoot |
| Fe₂O₃ | ~956 | 948 | near exact |
| FeCo (B2) | ~1284 | ~1250 | slight overshoot |
| Fe₃N | ~774 | ~760 | slight overshoot |
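The error column can be recomputed directly from the table. This sketch takes the table's approximate values as given and computes the signed relative error, (predicted − experimental) / experimental, at both ends of each experimental range. It then checks whether each family's errors share a sign, which is what a single per-family offset would need.

```python
# (predicted K, experimental low K, experimental high K), copied from the table;
# single experimental values use the same number for low and high.
TC_DATA = {
    "Mn": {
        "MnBi": (1115, 540, 630),
        "MnAl": (462, 650, 650),
        "Mn5Ga": (268, 470, 770),
    },
    "Fe": {
        "Fe2O3": (956, 948, 948),
        "FeCo": (1284, 1250, 1250),
        "Fe3N": (774, 760, 760),
    },
}


def relative_error_bounds(pred, exp_low, exp_high):
    """Signed relative error at both ends of the experimental range, as (min, max)."""
    at_low = (pred - exp_low) / exp_low
    at_high = (pred - exp_high) / exp_high
    return min(at_low, at_high), max(at_low, at_high)


def summarize(data):
    """Return per-family error bounds and whether all errors share a sign."""
    summary = {}
    for family, rows in data.items():
        errors = {name: relative_error_bounds(*vals) for name, vals in rows.items()}
        signs = {e > 0 for bounds in errors.values() for e in bounds}
        summary[family] = {"errors": errors, "same_sign": len(signs) == 1}
    return summary


for family, info in summarize(TC_DATA).items():
    for name, (lo, hi) in info["errors"].items():
        print(f"{family} {name:6s} {lo:+.0%} to {hi:+.0%}")
    print(f"{family}: {'same sign' if info['same_sign'] else 'mixed signs'}")
```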
The Fe-based predictions are surprisingly accurate: all three land within about 3% of experiment. The Mn-based ones are not, and they don't even fail in the same direction (MnBi overshoots by roughly 2×, while MnAl and Mn₅Ga undershoot). A linear correction model fitted across all six points has an R² too low to be useful. Per-chemistry offsets (one correction for Mn, one for Fe) might help, but each would rest on only three data points, and because the Mn errors change sign, no single Mn offset can correct both the MnBi overshoot and the MnAl/Mn₅Ga undershoot. For now, treat T_C predictions as useful for ranking compositions within the same chemistry family, not for comparing absolute values across families or gating on hard thresholds.
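One way to enforce "rank within a family, never across" in a screening script is to group before sorting, as in the hypothetical helper below. It applies no absolute cutoff and makes no cross-family comparison. The example input is the predicted column of the table above.

```python
from collections import defaultdict


def rank_within_family(candidates):
    """Group (name, family, predicted_tc) tuples by family and sort each group.

    Returns {family: [names ordered by predicted T_C, highest first]}.
    """
    groups = defaultdict(list)
    for name, family, tc in candidates:
        groups[family].append((tc, name))
    return {fam: [name for _, name in sorted(rows, reverse=True)]
            for fam, rows in groups.items()}


predicted = [
    ("MnBi", "Mn", 1115), ("MnAl", "Mn", 462), ("Mn5Ga", "Mn", 268),
    ("Fe2O3", "Fe", 956), ("FeCo", "Fe", 1284), ("Fe3N", "Fe", 774),
]
print(rank_within_family(predicted))
```

On the table's Fe rows, the predicted order (FeCo > Fe₂O₃ > Fe₃N) matches the experimental order. On the Mn rows it does not: MnBi ranks first by prediction but last by experimental range midpoint. This is consistent with the advice below to start with Fe.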
I confirmed something that's probably known but worth stating explicitly: JARVIS ALIGNN systematically overestimates formation energy (predicts it too high, i.e. not negative enough) relative to the Materials Project reference. For MnBi, the JARVIS energy above hull places it far enough above the hull to look unsynthesizable, even though MnBi is a well-known, experimentally realized magnet. The Materials Project route, by contrast, correctly identifies it as marginally stable. When screening for thermodynamic stability, use the Materials Project route as the primary check and JARVIS as secondary validation, not the other way around.
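A sketch of that ordering as a decision rule: the Materials Project energy above hull is the only value that can reject a candidate, and JARVIS only annotates. The field names and the 0.1 eV/atom cutoff are hypothetical choices for illustration, not values from this run.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical metastability cutoff in eV/atom; tune for your project.
E_HULL_CUTOFF = 0.1


@dataclass
class StabilityResult:
    formula: str
    mp_e_above_hull: Optional[float]      # Materials Project route (primary)
    jarvis_e_above_hull: Optional[float]  # JARVIS ALIGNN (secondary)


def stability_verdict(r: StabilityResult, cutoff: float = E_HULL_CUTOFF) -> str:
    """Gate on the Materials Project value; use JARVIS only to annotate."""
    if r.mp_e_above_hull is None:
        return "no MP value: needs review"
    if r.mp_e_above_hull > cutoff:
        return "reject"
    if r.jarvis_e_above_hull is not None and r.jarvis_e_above_hull > cutoff:
        # Expected given JARVIS's upward formation-energy bias: keep, but flag it.
        return "keep (JARVIS disagrees)"
    return "keep"
```

Because JARVIS can never reject on its own, its bias toward high formation energies produces at most an annotation rather than a false negative.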
If you're setting up a similar pipeline: start with Fe-based compositions rather than Mn-based. The T_C route is better calibrated there, the crystal structure predictions are more reliable, and the formation energy landscape is better covered by existing DFT datasets. Use Mn-based screening once you understand how the routes behave on the easier problems.
The end-to-end pipeline — CrystaLLM structure generation → NequIP relaxation → property prediction — is functional and the Ouro route interfaces are clean. The Python script I uploaded to #permanent-magnets wraps all of this into a single execution loop. Once the NequIP server reliability improves and the T_C calibration has more data points, this will be a genuinely useful discovery tool.
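The script in #permanent-magnets isn't reproduced here. Below is a hypothetical sketch of the same loop shape: the three stage functions are passed in as callables that stand in for the CrystaLLM, NequIP-OAM-XL, and property-prediction route calls. A failure at any stage is recorded with the stage name, and the loop moves on to the next composition.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_pipeline(
    compositions: List[str],
    generate: Callable[[str], Any],
    relax: Callable[[Any], Any],
    predict: Callable[[Any], Dict[str, float]],
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, Tuple[str, str]]]:
    """Run generate -> relax -> predict for each composition.

    Returns (results, failures); failures maps a composition to
    (stage_that_failed, repr_of_exception) so it can be resubmitted later.
    """
    results, failures = {}, {}
    for comp in compositions:
        stage = "generate"
        try:
            structure = generate(comp)
            stage = "relax"
            relaxed = relax(structure)
            stage = "predict"
            results[comp] = predict(relaxed)
        except Exception as exc:
            failures[comp] = (stage, repr(exc))
    return results, failures
```

Recording the failing stage separates relaxation server errors from generation or prediction problems, so only the affected step needs to be rerun.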
Practical notes from running the CrystaLLM → NequIP → T_C screening pipeline on Fe and Mn magnets