Building off of 's recent and incredible work hosting various evals and supporting a python SDK, we wanted to explore some ideas around genetic algorithms, optimization, and similar techniques beyond the existing corpus of generative models like CrystalLLM and MatterGen. Our goal here is still to find a rare-earth free competitor to NdFeB.
We came across SakanaAI's recent work, TreeQuest, and wanted to explore whether it could help us navigate these new compounds. Their work shows that this approach (and, more generally, intelligently increasing test-time compute) can yield significant performance gains on common LLM benchmarks. With evals readily available on Ouro, it seemed easy enough to create a node scoring function and extend TreeQuest's capabilities to our magnet hunt.
TreeQuest implements an algorithm called AB-MCTS, which lets us abstract away the complex tree traversal logic. AB-MCTS provides:
UCB (Upper Confidence Bound) node selection for exploration/exploitation balance
Monte Carlo Tree Search to balance tree exploration and node exploitation
Asymmetric bandit strategies - different node types use different exploration approaches
Crystal structure generated by TreeQuest optimization (file 1)
Phase diagram of Fe30Sn7GeSb2; e_above_hull: 0.124639 eV/atom; predicted_stable: False
Fe30Sn7Sb2Ge1 (requested SG: P6/mmm #191, calculated SG: Pm #6, optimized: 277 steps, cell relaxed (isotropic))
Phase diagram of Fe20Si3B8P; e_above_hull: 0.174866 eV/atom; predicted_stable: False
Fe20Si3P1B8 (requested SG: P4/mmm #123, calculated SG: P1 #1, optimized: 309 steps, cell relaxed (isotropic))
Phase diagram of Mn2Fe8SiB4P; e_above_hull: 0.429065 eV/atom; predicted_stable: False
Phase diagram of Mn2Fe8SiB4P; e_above_hull: 0.249770 eV/atom; predicted_stable: False
Fe8Mn2SiPB4 (requested SG: P4/mmm #123, calculated SG: P1 #1, optimized: 123 steps, cell relaxed (isotropic))
Phase diagram of Mn2Fe8SiB4P; e_above_hull: 0.320310 eV/atom; predicted_stable: False
Fe8Mn2SiPB4 (requested SG: P4/mmm #123, calculated SG: P1 #1, optimized: 324 steps, cell relaxed (isotropic))
Fe8Mn2SiPB4 (requested space group: P4/mmm #123, optimized: 18 steps, cell relaxed (isotropic))
Cell + Ionic relaxation with Orb v3; 0.03 eV/Å threshold; final energy = -64.0308 eV; energy change = 0.0000 eV; symmetry: C2/m → C2/m
Phase diagram of MnFe4SiB2; e_above_hull: 0.095762 eV/atom; predicted_stable: False
Phase diagram of MnFe4SiB2; e_above_hull: 0.327677 eV/atom; predicted_stable: False
Fe4MnSiB2 (requested SG: P4/mmm #123, calculated SG: C2/m #12, optimized: 90 steps, cell relaxed (isotropic))
Fe4MnSiB2 (requested SG: P4/mmm #123, calculated SG: P1 #1, optimized: 203 steps, cell relaxed (isotropic))
Phase diagram of ZrFe10Si2N; e_above_hull: 0.332351 eV/atom; predicted_stable: False
Phase diagram of ZrFe10Si2N; e_above_hull: 0.887248 eV/atom; predicted_stable: False
ZrFe10Si2N (requested SG: P4/mmm #123, calculated SG: Pm #6, optimized: 135 steps, cell relaxed (isotropic))
Cm space group PyXtal outputs
Findings from the first pass at tree searching
Detailing our open experimentation with SakanaAI's Treequest algorithm, AB-MCTS, and its potential applicability in rare-earth free permanent magnet discovery.
Tree expansion and depth management - TreeQuest handles tree structure natively
Visit count tracking and backpropagation - automatic score propagation through tree hierarchy
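To make the selection and backpropagation bullets above concrete, here is a minimal sketch of textbook UCB1 selection with running-mean score propagation. This is not TreeQuest's internal code: the function names, the dict-based node layout, and the exploration constant are all hypothetical choices for illustration.

```python
import math

def ucb1(mean_score: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """Textbook UCB1 value for one child node (illustrative only)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    # Exploitation term plus an exploration bonus that shrinks with visits.
    return mean_score + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children: list[dict]) -> dict:
    """Pick the child with the highest UCB1 value."""
    parent_visits = sum(ch["visits"] for ch in children)
    return max(children,
               key=lambda ch: ucb1(ch["mean"], ch["visits"], parent_visits))

def backpropagate(path: list[dict], score: float) -> None:
    """Update visit counts and running means for every node on the path."""
    for node in path:
        node["visits"] += 1
        node["mean"] += (score - node["mean"]) / node["visits"]
```

A rarely visited child with a slightly lower mean can still win the selection, which is the exploration/exploitation balance the list describes.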
With the heavy lifting handled by TreeQuest, all we needed to implement was our node generation logic.
We lean on Ouro's routes and OpenAI's o3 with 'high' reasoning. o3 is instructed to generate a JSON object with a composition and spacegroup that define a potential rare-earth-free permanent magnet. We pass this object to the CrystalLLM route to generate the structure. The structure is then relaxed, e_hull is computed, the Curie temperature is predicted, and the magnetic properties are evaluated.
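A minimal sketch of how that node generation logic could be wired together. It assumes the generation function takes an optional parent state and returns a `(state, score)` pair, which is our reading of the TreeQuest quickstart. The `propose`, `evaluate`, and `score` callables are hypothetical stand-ins for the o3 call, the Ouro routes, and the scoring function; none of them are real Ouro or TreeQuest APIs, and treating `spacegroup` as an integer number is also an assumption.

```python
import json
from typing import Callable, Optional

def parse_candidate(raw: str) -> dict:
    """Validate the LLM's JSON reply: a composition and a space group number."""
    data = json.loads(raw)
    composition = str(data["composition"]).strip()
    spacegroup = int(data["spacegroup"])  # assumes a number, not a symbol
    if not composition:
        raise ValueError("empty composition")
    if not 1 <= spacegroup <= 230:
        raise ValueError(f"space group {spacegroup} is outside 1 to 230")
    return {"composition": composition, "spacegroup": spacegroup}

def make_generate_fn(
    propose: Callable[[Optional[dict]], str],  # hypothetical: wraps the o3 call
    evaluate: Callable[[dict], dict],          # hypothetical: wraps the eval routes
    score: Callable[[dict], float],            # hypothetical: scoring function
) -> Callable[[Optional[dict]], tuple[dict, float]]:
    """Build a generation function that returns (state, score)."""
    def generate(parent: Optional[dict] = None) -> tuple[dict, float]:
        candidate = parse_candidate(propose(parent))
        evals = evaluate(candidate)
        state = {**candidate, "evals": evals}
        return state, score(evals)
    return generate
```

Keeping the three steps as injected callables means the tree search never needs to know which model or route produced a candidate.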
Following the node generation pattern in the TreeQuest quickstart, we roll these values into a single score for each candidate. The score is effectively a measure of performance relative to NdFeB magnets. The material has to be stable (this carries the heaviest weight in scoring) while maintaining a competitive Curie temperature and magnetic properties.
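As a rough illustration of that kind of score, the sketch below combines a stability term with two NdFeB-relative terms. The weights, the 0.1 eV/atom cutoff, the dictionary keys, and the function names are all assumptions made for this example and are not the values used in our experiments.

```python
def stability_term(e_above_hull: float, cutoff: float = 0.1) -> float:
    """1.0 on the hull, falling linearly to 0.0 at `cutoff` eV/atom."""
    return max(0.0, 1.0 - max(e_above_hull, 0.0) / cutoff)

def relative_term(value: float, reference: float) -> float:
    """Candidate value as a fraction of the reference value, clipped to [0, 1]."""
    return min(max(value / reference, 0.0), 1.0)

def candidate_score(evals: dict, reference: dict,
                    weights: tuple = (0.5, 0.25, 0.25)) -> float:
    """Weighted score in [0, 1]; stability carries the largest weight."""
    w_stab, w_tc, w_mag = weights
    return (
        w_stab * stability_term(evals["e_above_hull"])
        + w_tc * relative_term(evals["curie_k"], reference["curie_k"])
        + w_mag * relative_term(evals["magnetization"], reference["magnetization"])
    )
```

The `reference` dictionary would hold the same eval outputs for NdFeB, so every term reads as "how close is this candidate to the incumbent".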
AB-MCTS and its many features take it from here. The algorithm can generate new root nodes, or choose to expand or exploit an existing node. Exploration and exploitation of existing nodes employ the following 'strategies'.
We leverage 12 distinct exploration strategies selected based on tree state and parent performance:
| Strategy | Purpose | When Used |
|---|---|---|
| | Initial diverse exploration | First generation |
| | Fine-tune excellent materials | Parent score > 0.7 |
| | Moderate compositional changes | Parent score 0.3-0.7 |
| | Crystal system optimization | High-performing materials |
| | Major structural changes | Low-scoring parents |
| | Maximize magnetic properties | Poor magnetic performance |
| | Multi-component alloys | Diversity enhancement |
| | Ordered compounds | Stability improvement |
| | Al, Si, Ga incorporation | Novel chemistry |
| | 2D-like structures | Anisotropy enhancement |
| | Exotic symmetries | Deep exploration |
| | Multi-objective optimization | Mature tree |
Example strategy selection conditions:
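def select_strategy(parent_score, depth, tree_state, parent_stability, parent_curie):
    """Pick a child-generation strategy from the parent's eval results.

    parent_stability: parent's e_above_hull (presumably eV/atom).
    parent_curie: parent's predicted Curie temperature (presumably K).
    tree_state: kept from the original signature; not used by these conditions.
    """
    if depth == 0:
        return "root_generation"
    elif parent_score > 0.7:
        return "element_substitution"  # Fine-tune excellent materials
    elif parent_score > 0.3:
        return "composition_variation"  # Moderate improvements
    elif parent_stability > 0.15:
        return "crystal_system_change"  # Fix stability issues
    elif parent_curie < 500:
        return "magnetic_enhancement"  # Boost magnetic properties
    else:
        return "strategy_diversification"  # Explore broadly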
Workflow overview:
Example event sequence:
In our first few experiments we saw promising results: some of our best candidates were child structures produced by different expansion strategies. There seems to be a large bias towards generating Heusler alloys with the current prompt, so some of the immediate future work will revolve around prompt updates to encourage a broader composition space. Two of the best-performing candidates from early experiments contained platinum, which isn't necessarily a bad thing, but if our end goal is to unseat a relatively cheap incumbent material, including elements like platinum doesn't help. The same can be said for other rare or 'dirty' elements like cobalt, rhodium, osmium, and so on.
Updates to some performance monitoring visuals and other charted insights are coming soon. The Data tab on Ouro is constantly being inundated with new candidates as we test, so feel free to check some of them out!
Update 1: We made prompting and scoring updates. The new prompt effectively prohibited the use of expensive, 'dirty' (really just cobalt), and otherwise trace elements. The scoring update puts NdFeB at 0.95 out of 1; anything that doesn't 'compete' with NdFeB scores below that threshold.
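One way to get that behavior is to rescale a candidate-to-NdFeB performance ratio so that parity lands exactly on 0.95. The sketch below is only an illustration of the idea: the piecewise mapping, the `ratio` input, and the function name are assumptions, not the scoring code we actually run.

```python
def calibrated_score(ratio: float, anchor: float = 0.95) -> float:
    """Map a performance ratio (candidate / NdFeB) onto [0, 1).

    ratio == 1.0 (parity with NdFeB) maps to `anchor`. Lower ratios scale
    linearly below it, and higher ratios approach 1.0 without reaching it.
    """
    if ratio <= 0:
        return 0.0
    if ratio <= 1.0:
        return anchor * ratio
    return anchor + (1.0 - anchor) * (1.0 - 1.0 / ratio)
```

With this shape, a candidate has to beat NdFeB outright to score above 0.95, and no candidate can saturate the scale.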
The scoring changes are good, but the prompt is likely too restrictive. Future experiments will remove the restrictions on cobalt and platinum, but the trace element restrictions will remain.
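Element restrictions like these can also be checked outside the prompt, as a cheap screen on the generated composition. The sketch below is a hypothetical helper, not part of our pipeline: the blocklist contents beyond the elements named in this post are an assumption, and the parser only handles flat formulas such as Fe30Sn7GeSb2 (no parentheses).

```python
import re

# Element symbol followed by an optional integer or decimal amount.
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")

# Rare earths: the lanthanides plus Sc and Y.
RARE_EARTHS = frozenset(
    "Sc Y La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu".split()
)
# Expensive or 'dirty' elements called out in this post.
COSTLY = frozenset({"Pt", "Rh", "Os", "Co"})
BLOCKED = RARE_EARTHS | COSTLY  # assumed default blocklist

def parse_formula(formula: str) -> dict[str, float]:
    """Parse a flat formula into element counts (no parentheses support)."""
    counts: dict[str, float] = {}
    for symbol, amount in ELEMENT_RE.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts

def violations(formula: str, blocked: frozenset = BLOCKED) -> list[str]:
    """Return the blocked elements present in the formula, sorted."""
    return sorted(set(parse_formula(formula)) & blocked)
```

Relaxing the screen for a later experiment is then a one-line change, for example `violations(formula, BLOCKED - {"Co", "Pt"})`.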
Conditionally selecting 'strategies' for child generation might also be too restrictive and overly prescriptive. A model like o3 is capable of making targeted changes given the eval results of the parent, the end goal, and the eval results for NdFeB. The next experiment will test borderline unrestricted child node generation. Existing strategies will be passed to the LLM as 'here are changes you might consider'.
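A minimal sketch of what that looser prompt could look like, with the strategies demoted to optional suggestions. The function name, the prompt wording, and the dictionary layout are hypothetical; only the idea of passing the parent evals, the NdFeB evals, and a list of suggestions comes from the plan above.

```python
import json

def build_child_prompt(parent: dict, reference: dict,
                       suggestions: list[str]) -> str:
    """Assemble a child-generation prompt with strategies as optional hints."""
    lines = [
        "Goal: propose a rare-earth-free permanent magnet candidate "
        "that competes with NdFeB.",
        "Parent candidate and its eval results:",
        json.dumps(parent, indent=2, sort_keys=True),
        "NdFeB eval results for comparison:",
        json.dumps(reference, indent=2, sort_keys=True),
        "Here are changes you might consider (optional, not required):",
    ]
    lines += [f"- {s}" for s in suggestions]
    lines.append('Reply with a JSON object with "composition" and "spacegroup".')
    return "\n".join(lines)
```

The suggestions list can simply reuse the purposes from the strategy table, for example "Moderate compositional changes" or "Ordered compounds".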
The most exciting piece so far is that the compositions that are scoring well seem to be novel. For now, our novelty check is simply asking o3 to search for any available research or database entries for composition X with space group Y. This is not exhaustive, so our current presumed novelty percentage is more likely to decrease than not.