Detailing our open experimentation with SakanaAI's TreeQuest library and its AB-MCTS algorithm, and their potential applicability to rare-earth-free permanent magnet discovery.
Building off of the recent and incredible work hosting various evals and supporting a Python SDK, we wanted to explore some ideas around genetic algorithms, optimization, and similar techniques beyond the existing corpus of generative models like CrystalLLM and MatterGen. Our goal here is still to find a rare-earth-free competitor to NdFeB.
We came across SakanaAI's recent work, TreeQuest, and wanted to explore its potential for navigating these new compounds. Their work shows that this approach (and intelligently increasing test-time compute more generally) can yield significant performance gains on common LLM benchmarks. With evals readily available on Ouro, it seemed easy enough to create a node scoring function and extend TreeQuest's capabilities to our magnet hunt.
TreeQuest implements an algorithm called AB-MCTS, which lets us abstract away the complex tree traversal logic. AB-MCTS implements:
- UCB (Upper Confidence Bound) node selection for exploration/exploitation balance
- Monte Carlo Tree Search to balance tree exploration and node exploitation
- Asymmetric bandit strategies: different node types use different exploration approaches
- Tree expansion and depth management: TreeQuest handles tree structure natively
- Visit count tracking and backpropagation: automatic score propagation through the tree hierarchy
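
As a rough illustration of the selection and backpropagation ideas in the list above, here is a minimal UCB1 sketch. It is not TreeQuest's internal code: the node dictionaries, their `visits` and `total_score` keys, and the exploration constant `c` are all assumptions made for the example.

```python
import math


def ucb1(mean_score: float, visits: int, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 value of a child: its mean score plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_score + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children: list) -> dict:
    """Return the child with the highest UCB1 value.

    Each child is a dict with the hypothetical keys 'visits' and 'total_score'.
    """
    parent_visits = sum(child["visits"] for child in children)

    def value(child: dict) -> float:
        if child["visits"] == 0:
            return math.inf
        mean = child["total_score"] / child["visits"]
        return ucb1(mean, child["visits"], parent_visits)

    return max(children, key=value)


def backpropagate(path: list, score: float) -> None:
    """Add one visit and the new score to every node on the path."""
    for node in path:
        node["visits"] += 1
        node["total_score"] += score
```

With equal visit counts the child with the higher mean wins; the bonus term only changes the ranking when one child has been visited much less often than its siblings.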
With the heavy lifting handled by TreeQuest, all we needed to implement was our node generation logic.
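
To make "node generation logic" concrete, here is a minimal sketch of the shape such a function can take. As we read the TreeQuest quickstart, the search calls a function that receives a parent state (or nothing for a new root) and returns a new state together with a score between 0 and 1; treat that contract as an assumption here. `NodeState`, `make_generate_fn`, and the three injected callables are hypothetical names, not part of TreeQuest or Ouro.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class NodeState:
    candidate: dict  # e.g. {"composition": ..., "spacegroup": ...}
    metrics: dict    # raw evaluation results for the candidate
    score: float     # scalar score, clamped to [0, 1]


def make_generate_fn(
    propose: Callable[[Optional[NodeState]], dict],
    evaluate: Callable[[dict], dict],
    score_fn: Callable[[dict], float],
) -> Callable[[Optional[NodeState]], Tuple[NodeState, float]]:
    """Wire proposal, evaluation, and scoring into one generation function."""

    def generate(parent: Optional[NodeState] = None) -> Tuple[NodeState, float]:
        candidate = propose(parent)    # LLM call in practice; stubbed here
        metrics = evaluate(candidate)  # structure generation + evals in practice
        score = min(max(score_fn(metrics), 0.0), 1.0)
        return NodeState(candidate, metrics, score), score

    return generate
```

Injecting the three steps as callables keeps the search-facing function tiny and lets each step be swapped or stubbed in tests.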
We lean on Ouro's routes and OpenAI's o3 with 'high' reasoning effort. o3 is instructed to generate a JSON object with a composition and spacegroup that define a potential rare-earth-free permanent magnet. We pass this object to the CrystalLLM route to generate the structure. The structure is then relaxed, its energy above hull (e_hull) is computed, its Curie temperature is predicted, and its magnetic properties are evaluated.
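
Before a model-generated JSON object is handed to a structure generator, it helps to validate it. The sketch below assumes the object uses the keys `composition` and `spacegroup`, and that the space group is given as an international number between 1 and 230; both are assumptions about the format rather than a description of our actual prompt. The formula in the usage note is purely illustrative.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    composition: str
    spacegroup: int


def parse_candidate(raw: str) -> Candidate:
    """Parse and sanity-check a JSON candidate produced by the LLM."""
    data = json.loads(raw)
    composition = str(data["composition"]).strip()
    spacegroup = int(data["spacegroup"])
    if not composition:
        raise ValueError("composition must be a non-empty formula string")
    if not 1 <= spacegroup <= 230:
        raise ValueError(f"spacegroup must be in 1..230, got {spacegroup}")
    return Candidate(composition, spacegroup)
```

For example, `parse_candidate('{"composition": "Fe2MnAl", "spacegroup": 225}')` returns a `Candidate`, while a space group of 0 or 231 raises `ValueError` before any compute is spent on the structure.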
Following the node generation logic in the TreeQuest quickstart, we roll the stability, Curie temperature, and magnetic property results into a single score for that candidate. The score is effectively a measure of relative performance when compared to NdFeB magnets. The material has to be stable (this carries the heaviest weight in scoring) while maintaining a competitive Curie temperature and magnetic properties.
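
One way to combine those results into a single number is a weighted sum of per-property terms, each measured relative to NdFeB. The sketch below is not our production scoring function: the weights, the 0.1 eV/atom stability tolerance, and the reference values (roughly those commonly reported for Nd2Fe14B) are placeholders chosen for illustration.

```python
# Placeholder reference values, roughly those reported for Nd2Fe14B (assumptions).
NDFEB_CURIE_K = 585.0
NDFEB_MAGNETIZATION_T = 1.6

# Hypothetical weights; stability carries the most weight.
WEIGHTS = {"stability": 0.5, "curie": 0.25, "magnetization": 0.25}


def stability_term(e_hull: float, tolerance: float = 0.1) -> float:
    """1.0 on the convex hull, falling linearly to 0.0 at `tolerance` eV/atom."""
    return max(0.0, 1.0 - max(e_hull, 0.0) / tolerance)


def relative_term(value: float, reference: float) -> float:
    """Ratio to the NdFeB reference value, clipped to [0, 1]."""
    return min(max(value / reference, 0.0), 1.0)


def score_candidate(e_hull: float, curie_k: float, magnetization_t: float) -> float:
    """Weighted score in [0, 1]; higher means closer to NdFeB-like performance."""
    terms = {
        "stability": stability_term(e_hull),
        "curie": relative_term(curie_k, NDFEB_CURIE_K),
        "magnetization": relative_term(magnetization_t, NDFEB_MAGNETIZATION_T),
    }
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)
```

Because the stability term carries half the weight in this sketch, an unstable candidate cannot score above 0.5 no matter how good its magnetic properties look.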
AB-MCTS and its many features take it from here. The algorithm can generate new root nodes, or choose to expand or exploit an existing node. Exploration and exploitation of existing nodes employ the following 'strategies'.
We leverage 12 distinct exploration strategies selected based on tree state and parent performance:
| Strategy | Purpose | When Used |
|---|---|---|
| | Initial diverse exploration | First generation |
| | Fine-tune excellent materials | Parent score >0.7 |
| | Moderate compositional changes | Parent score 0.3-0.7 |
| | Crystal system optimization | High-performing materials |
| | Major structural changes | Low-scoring parents |
| | Maximize magnetic properties | Poor magnetic performance |
| | Multi-component alloys | Diversity enhancement |
| | Ordered compounds | Stability improvement |
| | Al, Si, Ga incorporation | Novel chemistry |
| | 2D-like structures | Anisotropy enhancement |
| | Exotic symmetries | Deep exploration |
| | Multi-objective optimization | Mature tree |
Example strategy selection conditions:
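```python
def select_strategy(parent_score, depth, tree_state, parent_stability, parent_curie):
    """Pick a child-generation strategy from the parent's eval results.

    parent_stability and parent_curie are passed explicitly so the function
    runs as written; tree_state is not used by these example conditions.
    """
    if depth == 0:
        return "root_generation"
    elif parent_score > 0.7:
        return "element_substitution"  # Fine-tune excellent materials
    elif parent_score > 0.3:
        return "composition_variation"  # Moderate improvements
    elif parent_stability > 0.15:
        return "crystal_system_change"  # Fix stability issues
    elif parent_curie < 500:
        return "magnetic_enhancement"  # Boost magnetic properties
    else:
        return "strategy_diversification"  # Explore broadly
```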
[Figure: Workflow overview]
[Figure: Example event sequence]
In our first few experiments we saw promising results: some of our best candidates were child structures produced by different expansion strategies. There seems to be a large bias towards generating Heusler alloys with the current prompt, so some of the immediate future work will revolve around prompt updates to encourage a broader composition space. Two of the most performant candidates from early experiments contained platinum, which isn't necessarily a bad thing, but if our end goal is to unseat a relatively cheap incumbent material, including elements like platinum doesn't help. The same can be said for other rare or 'dirty' elements like cobalt, rhodium, osmium, and so on.
Updates to some performance monitoring visuals and other charted insights are coming soon. The Data tab on Ouro is constantly being inundated with new candidates as we test, so feel free to check some of them out!
Update 1: We made prompting and scoring updates. The new prompt effectively prohibited the use of expensive, 'dirty' (just cobalt, really), and otherwise trace earth elements. The scoring update effectively puts NdFeB at 0.95 out of 1; anything that doesn't 'compete' with NdFeB will score under this threshold.
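
A simple way to get this kind of anchoring is to scale each property ratio so that matching NdFeB yields exactly 0.95, leaving a little headroom for candidates that beat it. This is a sketch of the idea, not our actual scoring code; `calibrated_term` and its parameters are hypothetical, and the Curie temperature in the usage note is only an approximate placeholder.

```python
NDFEB_ANCHOR = 0.95  # score assigned to a property that exactly matches NdFeB


def calibrated_term(value: float, reference: float, anchor: float = NDFEB_ANCHOR) -> float:
    """Scale value/reference so the reference maps to `anchor`, capped at 1.0."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    ratio = max(value, 0.0) / reference
    return min(anchor * ratio, 1.0)
```

For example, with a placeholder NdFeB Curie temperature of 585 K, `calibrated_term(585.0, 585.0)` gives 0.95, anything below 585 K scores lower, and only candidates that exceed the reference can climb toward 1.0.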
The scoring changes are good, but the prompt is likely too restrictive. Future experiments will remove the restrictions on cobalt and platinum, but trace element restrictions will remain.
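
Prompt-level restrictions can also be checked after the fact, which makes it easy to loosen them between experiments. The sketch below extracts element symbols from a formula string with a simple regex and compares them to a configurable disallowed set. The two sets shown are hypothetical examples, not our real lists, and the regex does not check symbols against the periodic table.

```python
import re

# Matches element-like tokens: one uppercase letter plus an optional lowercase letter.
ELEMENT_PATTERN = re.compile(r"[A-Z][a-z]?")

# Hypothetical restriction sets for two experiment configurations.
STRICT_DISALLOWED = frozenset({"Co", "Pt", "Rh", "Os"})
RELAXED_DISALLOWED = STRICT_DISALLOWED - {"Co", "Pt"}


def elements_in(formula: str) -> set:
    """Return the set of element symbols that appear in a formula string."""
    return set(ELEMENT_PATTERN.findall(formula))


def violations(formula: str, disallowed) -> set:
    """Return the disallowed elements present in the formula (empty if none)."""
    return elements_in(formula) & set(disallowed)
```

A candidate can then be rejected or penalized whenever `violations(...)` is non-empty, and relaxing the rules is a one-line change to the set rather than a prompt rewrite.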
Conditionally selecting 'strategies' for child generation might also be too restrictive and overly prescriptive. A model like o3 is capable of making targeted changes given the eval results of the parent, the end goal, and the eval results for NdFeB. The next experiment will test nearly unrestricted child node generation. Existing strategies will be passed to the LLM as 'here are changes you might consider' suggestions.
The most exciting piece so far is that the compositions that are scoring well seem to be novel. Our current criterion for novelty is simply asking o3 to search for any and all available research or database entries for composition X with space group Y. This is not exhaustive, and our current presumed novelty percentage is more likely to decrease than not.
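
A sketch of what that looser prompt could look like: the parent's candidate and eval results, the NdFeB reference results, and the existing strategies listed as optional suggestions. The function name, the wording, and the dictionary keys are all hypothetical; the only strategy names used are ones from the selection example earlier in this post.

```python
import json


def build_child_prompt(parent_candidate: dict, parent_metrics: dict,
                       reference_metrics: dict, suggestions: list) -> str:
    """Assemble a child-generation prompt with strategies as optional hints."""
    lines = [
        "Goal: propose an improved rare-earth-free permanent magnet candidate.",
        "Parent candidate: " + json.dumps(parent_candidate, sort_keys=True),
        "Parent eval results: " + json.dumps(parent_metrics, sort_keys=True),
        "NdFeB eval results: " + json.dumps(reference_metrics, sort_keys=True),
        "Here are changes you might consider (none are required):",
    ]
    lines.extend(f"- {name}" for name in suggestions)
    lines.append("Respond with a JSON object containing composition and spacegroup.")
    return "\n".join(lines)
```

The key difference from conditional strategy selection is that nothing here branches on the parent's score; the model sees the same context every time and decides which change, if any, to apply.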
Findings from the first pass at tree searching
After running the pipeline, we are left with a handful of our best candidates to continue validating. The next filter they need to pass is a decent magnetocrystalline anisotropy energy. Check out Will's