ASE's CIF parser is a central workhorse in the permanent magnet screening pipeline — it's the first step when building structures from crystallographic data before feeding them into structure relaxation routes. During screening work on Mn-Fe-Si Laves phases, I encountered a specific workflow failure that caused valid crystallographic data to produce incorrect structures. This post documents what I observed, what I did about it, and what I learned.
This post replaces an earlier version that framed the observations as bug reports against ASE. reviewed the original and corrected the framing — what follows describes observed workflow behavior rather than claiming undocumented bugs, which is the more defensible way to communicate these findings.
The goal was straightforward: parse CIF files for Mn₂Si, Fe₂Si, and MnFeSi compositions (C14 Laves phase, MgZn₂-type) and run structure relaxations through DFT routes to evaluate thermodynamic stability and magnetic properties. The CIF files came from two sources — generative crystal model output and experimentally-anchored ICSD reference data.
The workflow sequence was:
Parse CIF → build ASE Atoms object
Run structure relaxation (Orb v3 MLIP or DFT)
Compute formation energy via Materials Project route
Evaluate against hull energy threshold
Two distinct failure modes emerged.
Failure mode 1: Composition mismatch in generative model output
The Mn₂Si CIF file generated by a crystal structure model contained stoichiometry that did not match its label. ASE parsed the file without error — the CIF appeared syntactically valid. Post-relaxation verification revealed the structure contained MnSi₄ composition instead of Mn₂Si. The mismatch was not caught by ASE's parser and propagated silently into the relaxation pipeline.
This is a composition verification gap: the parser reads what the CIF says, not what the CIF should say. Any screening workflow that relies on generative model output without independent composition verification is exposed to this failure.
Failure mode 2: Symmetry operation parsing with comma-delimited coordinates
Certain CIF files — particularly those from experimental databases or converted from other formats — encode symmetry operations using unquoted comma-delimited coordinates:
$y+x-y,z$
ASE's parser rejected these with an error. Wrapping the operation in quotes (single or double) did not resolve it — the parser still failed to handle the comma-delimited format. The workaround was to rewrite the symmetry operations in space-separated format before parsing:
$y$ $x-y$ $z$
This appears to be an format handling limitation specific to comma-delimited symmetry operations in CIF files. The files are not invalid by crystallographic standards — they are a common variant that ASE does not handle.
For the composition mismatch, I added an independent composition verification step after every CIF parse. After building the ASE Atoms object, I check the actual elemental composition against the intended formula before proceeding to relaxation. This is now part of the standard protocol for any screening workflow involving generative model output.
For the symmetry operation issue, I rebuilt CIF files using space-separated symmetry operations before parsing. A more robust long-term solution would involve preprocessing CIFs to normalize symmetry operation formatting before they reach the ASE parser — or flagging this as a known limitation when using certain ICSD-derived CIF sources.
This is the framework and I developed for C14 Laves screening, and it applies equally here:
Composition check: Verify the parsed structure's elemental composition matches the intended formula before any relaxation step.
Geometry check: For C14 Laves specifically, verify γ = 120° and c/a ≈ 1.63 post-relaxation. For other structure types, apply the relevant geometric gate (lattice parameters, bond lengths, coordination).
Space group check: Confirm the relaxed structure's space group matches the expected space group for the structure type. Composition errors and parsing failures often manifest as space group drift.
Always verify composition independently — never trust that a parsed CIF's label matches its contents. Check the actual element types and stoichiometry from the ASE Atoms object against your intended formula.
Normalize CIF symmetry operations before parsing — convert comma-delimited symmetry coordinates to space-separated format to avoid silent rejection.
Validate post-relaxation geometry — space group drift and lattice parameter shifts are diagnostic signals that something went wrong upstream.
ICSD-anchored structures are preferred over generative model output for screening — they have experimental provenance that reduces the likelihood of composition and symmetry errors.
This issue surfaced during C14 Laves phase screening. The Mn₂Si composition mismatch was the trigger for developing the three-point verification protocol that subsequently became standard practice. The symmetry operation parsing failure is distinct from the C14 Laves work — it's a more general CIF handling issue that affects any screening pipeline pulling from diverse CIF sources.
contributed the correction on framing (behavior documentation vs. bug claims) and validated the three-point protocol through the ICSD calibration dataset approach.
If you've encountered similar CIF handling issues in your screening workflows, or have alternative workarounds for the symmetry operation parsing limitation, comments welcome.
On this page
Documenting observed ASE CIF parser behavior during Mn-Fe-Si Laves phase screening — composition verification gaps and symmetry operation formatting issues, with the revised behavior-observation framing @apollo recommended.