Findings from the first pass at tree searching
A couple of weeks back I posted about a new approach we cooked up for permanent magnet searching (rare-earth free, of course), detailing our open experimentation with SakanaAI's TreeQuest algorithm, AB-MCTS, and its potential applicability in rare-earth-free permanent magnet discovery.
If you were watching the data section closely, you would have seen a deluge of Heusler alloys, cubic 4-element systems, and desperate attempts at 5-element crystals. A few promising candidates emerged, and we down-sampled the best of them with the MAE predictor.
After running the pipeline, we are left with a handful of our best candidates to continue validating. The next filter they need to pass is a decent magnetocrystalline anisotropy energy. Check out Will's
Our FeCoNiPt and FeCoPt systems reign supreme, but their cost-prohibitive, heavy platinum concentration prevents any physical exploration.
As a whole, the approach still seems promising. We see glimpses of average score increases as the tree depth and iteration count increase. All of our best candidates were child structures of failed parents (that sounds dark), and o3 consistently provided rigorous reasoning for the changes it made to composition and space group.
As we messed around with new prompts, updated child context to contain the genealogy back to the root nodes, and implemented caching strategies to prevent the search from covering the same system twice, we started to notice that regardless of the space group o3 requested, CrystalLLM would only return cubic or otherwise highly symmetric systems: Pmmm, P4/mmm, Fm-3m, P4mm, and the occasional P3m1 or P-6m2. While this is speculation at the moment, CrystalLLM likely sacrifices stability in favor of generating highly symmetric systems.
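To make the "don't cover the same system twice" idea concrete, here is a minimal sketch of one way such a cache could look. It is not our actual implementation: `canonical_formula` and `SearchCache` are hypothetical names, and keying on a sorted composition plus a whitespace-stripped space group symbol is just one assumption about what counts as "the same system".

```python
import re
from typing import Dict, Optional, Tuple


def canonical_formula(formula: str) -> str:
    """Sort elements alphabetically so 'FeCoPt' and 'CoFePt' share one key."""
    counts: Dict[str, float] = {}
    # Each match is (element symbol, optional amount); a missing amount means 1.
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return "".join(f"{el}{counts[el]:g}" for el in sorted(counts))


class SearchCache:
    """Hypothetical cache: remembers the score of every system already evaluated."""

    def __init__(self) -> None:
        self._scores: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def _key(formula: str, space_group: str) -> Tuple[str, str]:
        # Assumption: composition + space group symbol identifies a system.
        return canonical_formula(formula), space_group.replace(" ", "")

    def get(self, formula: str, space_group: str) -> Optional[float]:
        """Return the cached score, or None if this system is new."""
        return self._scores.get(self._key(formula, space_group))

    def put(self, formula: str, space_group: str, score: float) -> None:
        """Store the score for a system so it is not evaluated again."""
        self._scores[self._key(formula, space_group)] = score
```

Before expanding a node, the search would call `get` and skip generation and scoring whenever it returns a value. A stricter key (for example, one that also hashes lattice parameters) would treat fewer structures as duplicates.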
These findings sparked a broader conversation about whether a generative model (o3) really needs another generative model (CrystalLLM, MatterGen, Chemeleon, etc.) to create structures. In a way it feels like we're getting in the way of o3's reasoning capabilities by forcing it to be limited by the capabilities of CrystalLLM. As the tree got deeper we started to see almost 100% discrepancy between the space group o3 requested and the one CrystalLLM provided. Global scoring would plateau around 0.8 (NdFeB scores a 0.95), with some runs resulting in o3 seemingly getting frustrated and introducing a known high-performance magnet like the MP-registered FeN systems.
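For anyone who wants to measure this kind of discrepancy in their own runs, here is a rough sketch of how the requested-vs-returned mismatch could be tallied per tree depth. The record layout `(depth, requested, returned)` and both function names are assumptions made for illustration, not something our pipeline exposes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def normalize_space_group(symbol: str) -> str:
    """Strip spaces and underscores so 'P 4/m m m' and 'P4/mmm' compare equal.

    Dashes are kept on purpose: P-1 and P1 are different space groups.
    """
    return symbol.replace(" ", "").replace("_", "")


def mismatch_rate_by_depth(
    records: Iterable[Tuple[int, str, str]],
) -> Dict[int, float]:
    """Return, per depth, the fraction of nodes whose returned space group
    differs from the requested one.

    Each record is (tree depth, requested symbol, returned symbol).
    """
    totals: Dict[int, int] = defaultdict(int)
    misses: Dict[int, int] = defaultdict(int)
    for depth, requested, returned in records:
        totals[depth] += 1
        if normalize_space_group(requested) != normalize_space_group(returned):
            misses[depth] += 1
    return {depth: misses[depth] / totals[depth] for depth in sorted(totals)}
```

A rate that climbs toward 1.0 at deeper levels would match what we saw by eye.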
Unfortunately, this means it's the end of the road for CrystalLLM, and the end of the first iteration of this TreeQuest-based approach.
Iteration 2 will experiment with allowing o3 (or the driving LLM, whatever it might be) to generate the CIFs directly. Early experimentation with Gemini 2.5 Pro shows promise... Stay tuned.
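If the LLM writes CIFs itself, one cheap first check is whether the file even declares the space group that was asked for. The sketch below is only an illustration of that idea, not part of our pipeline: the function names are hypothetical, and it reads only the Hermann-Mauguin tag from the CIF text. It says nothing about whether the atomic positions actually obey that symmetry, which would need a real symmetry analysis (for example pymatgen's `SpacegroupAnalyzer`).

```python
from typing import Optional

# Standard CIF data names that carry the Hermann-Mauguin symbol.
_SPACE_GROUP_TAGS = (
    "_symmetry_space_group_name_H-M",
    "_space_group_name_H-M_alt",
)


def declared_space_group(cif_text: str) -> Optional[str]:
    """Return the space group symbol the CIF declares, with spaces removed."""
    for line in cif_text.splitlines():
        stripped = line.strip()
        for tag in _SPACE_GROUP_TAGS:
            if stripped.lower().startswith(tag.lower()):
                # The value follows the tag and is usually quoted: 'P 4/m m m'
                value = stripped[len(tag):].strip().strip("'\"")
                return value.replace(" ", "") or None
    return None


def matches_request(cif_text: str, requested: str) -> bool:
    """True if the CIF declares the space group that was requested."""
    declared = declared_space_group(cif_text)
    return declared is not None and declared == requested.replace(" ", "")
```

A node whose CIF fails `matches_request` could be sent back to the model with the mismatch spelled out, rather than scored as-is.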