Ouro
@will

Tree Searching for New Magnets

Building on 's recent and incredible work hosting various evals and supporting a Python SDK, we wanted to explore ideas around genetic algorithms, optimization, and similar techniques beyond the existing corpus of generative models like CrystalLLM and MatterGen. Our goal is still to find a rare-earth-free competitor to NdFeB.

We came across SakanaAI's recent work, TreeQuest, and wanted to explore its applicability to navigating these new compounds. Their work shows that this approach (and intelligently increasing test-time compute in general) can yield significant performance gains on common LLM benchmarks. With evals readily available on Ouro, it seemed straightforward to create a node-scoring function and extend TreeQuest's capabilities to our magnet hunt.

TreeQuest implements AB-MCTS (Adaptive Branching Monte Carlo Tree Search), which lets us abstract away the complex tree-traversal logic. It provides:

  • UCB (Upper Confidence Bound) node selection for exploration/exploitation balance

  • Monte Carlo Tree Search to balance tree exploration and node exploitation

  • Asymmetric bandit strategies - different node types use different exploration approaches
  • Tree expansion and depth management - TreeQuest handles the tree structure natively

  • Visit count tracking and backpropagation - scores propagate automatically through the tree hierarchy

With the heavy lifting handled by TreeQuest, all we needed to implement was our node-generation logic.
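TreeQuest handles selection and backpropagation for us, but for intuition, here is a minimal, self-contained sketch of the textbook UCB1 selection rule and score backpropagation named in the list above. It is a generic MCTS illustration, not TreeQuest's internals (AB-MCTS uses its own adaptive-branching selection), and the `Node` class and the exploration constant `c` are our own assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality avoids recursing through parent/children
class Node:
    """Toy search-tree node holding the statistics MCTS tracks."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total_score: float = 0.0

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1: mean score (exploitation) plus an uncertainty bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node) -> Node:
    """Pick the child with the highest UCB1 value."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))


def backpropagate(node: Optional[Node], score: float) -> None:
    """Push a new evaluation score from a leaf up to the root."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent
```

For example, after backpropagating scores of 0.9 and 0.2 through two children of a root, `select_child` favors the 0.9 child; adding a third, unvisited child makes UCB1 pick that one next.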

We lean on Ouro's routes and OpenAI's o3 with 'high' reasoning effort. o3 is instructed to generate a JSON object containing a composition and space group that define a potential rare-earth-free permanent magnet. We pass this object to the CrystalLLM route to generate a structure. The structure is then relaxed, its energy above the convex hull (e_hull) is computed, its Curie temperature is predicted, and its magnetic properties are evaluated.
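Below is a minimal sketch of how these steps could be wired into one node-generation function. It assumes TreeQuest's quickstart contract as we understand it: a function that receives the parent's state (or `None` at the root) and returns a `(state, score)` pair. Every callable here (`propose`, `build_structure`, `relax`, and so on) and the `Candidate` fields are hypothetical stand-ins for the o3 call and the Ouro routes. They are injected as arguments so the pipeline can be tested with stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical node state: the proposal plus its eval results."""
    composition: str
    spacegroup: int
    e_hull: float    # eV/atom above the convex hull
    curie_k: float   # predicted Curie temperature (K)
    magnetic: dict   # e.g. saturation magnetization, anisotropy
    score: float


def make_generate_fn(
    propose: Callable[[Optional[Candidate]], dict],  # LLM: parent -> {"composition", "spacegroup"}
    build_structure: Callable[[dict], str],          # structure route: spec -> CIF text
    relax: Callable[[str], str],                     # relaxation: CIF -> relaxed CIF
    e_hull: Callable[[str], float],
    curie: Callable[[str], float],
    magnetic_props: Callable[[str], dict],
    score: Callable[[float, float, dict], float],
) -> Callable[[Optional[Candidate]], Tuple[Candidate, float]]:
    """Chain the evaluation steps into a single node-generation function."""

    def generate(parent: Optional[Candidate]) -> Tuple[Candidate, float]:
        spec = propose(parent)  # parent is None at the root, so propose from scratch
        cif = relax(build_structure(spec))
        eh, tc, mag = e_hull(cif), curie(cif), magnetic_props(cif)
        s = score(eh, tc, mag)
        node = Candidate(spec["composition"], spec["spacegroup"], eh, tc, mag, s)
        return node, s

    return generate
```

Injecting each stage keeps the wiring separate from the services. In practice each callable would wrap the corresponding model call or Ouro route, and the returned `generate` could then be registered with TreeQuest as a generation action.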

Following the node-generation pattern in the TreeQuest quickstart, we roll these values into a single score for the candidate. The score is effectively a measure of performance relative to NdFeB magnets. A candidate has to be stable (stability carries the heaviest weight in scoring) while maintaining a competitive Curie temperature and magnetic properties.
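The post doesn't spell out the exact formula, so the following is only a sketch of a stability-weighted score relative to NdFeB. The weights, the 0.2 eV/atom hull cutoff, and the use of saturation polarization as the single magnetic metric are all assumptions. The reference values are approximate literature figures for Nd2Fe14B.

```python
# Approximate literature values for Nd2Fe14B, used as the reference point.
NDFEB_CURIE_K = 585.0
NDFEB_JS_T = 1.61  # saturation polarization (T)

# Hypothetical weights: stability dominates, as described in the post.
W_STABILITY, W_CURIE, W_MAGNETIC = 0.5, 0.25, 0.25
E_HULL_CUTOFF = 0.2  # eV/atom; hypothetical point where stability credit reaches zero


def stability_term(e_hull: float) -> float:
    """1.0 on the hull, falling linearly to 0.0 at E_HULL_CUTOFF."""
    return min(1.0, max(0.0, 1.0 - e_hull / E_HULL_CUTOFF))


def ratio_term(value: float, reference: float) -> float:
    """Fraction of the NdFeB reference achieved, capped at 1.0."""
    return min(value / reference, 1.0) if reference > 0 else 0.0


def score_candidate(e_hull: float, curie_k: float, js_t: float) -> float:
    """Weighted sum in [0, 1]; a candidate matching NdFeB on every term scores 1.0."""
    return (W_STABILITY * stability_term(e_hull)
            + W_CURIE * ratio_term(curie_k, NDFEB_CURIE_K)
            + W_MAGNETIC * ratio_term(js_t, NDFEB_JS_T))
```

Capping each ratio at 1.0 keeps a single outstanding property (say, a very high Curie temperature) from compensating for poor stability.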

AB-MCTS and its many features take it from here. The algorithm can generate new root-level candidates or choose to expand or exploit an existing node. Exploring and exploiting existing nodes draws on the following 'strategies'.

We use 12 distinct exploration strategies, selected based on tree state and parent performance:

| Strategy | Purpose | When used |
|---|---|---|
| root_generation | Initial diverse exploration | First generation |
| element_substitution | Fine-tune excellent materials | Parent score > 0.7 |
| composition_variation | Moderate compositional changes | Parent score 0.3 to 0.7 |
| spacegroup_exploration | Crystal system optimization | High-performing materials |
| crystal_system_change | Major structural changes | Low-scoring parents |
| magnetic_enhancement | Maximize magnetic properties | Poor magnetic performance |
| high_entropy_exploration | Multi-component alloys | Diversity enhancement |
| intermetallic_design | Ordered compounds | Stability improvement |
| p_block_integration | Al, Si, Ga incorporation | Novel chemistry |
| layered_structure_design | 2D-like structures | Anisotropy enhancement |
| novel_spacegroup_search | Exotic symmetries | Deep exploration |
| strategy_diversification | Multi-objective optimization | Mature tree |

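Example strategy selection conditions (simplified; `parent_stability` is assumed here to be energy above hull in eV/atom, so larger values mean less stable):

```python
def select_strategy(parent_score, depth, parent_stability, parent_curie, tree_state=None):
    """Choose a child-generation strategy from the parent's eval results.

    parent_stability: assumed energy above hull (eV/atom); larger = less stable.
    parent_curie: predicted Curie temperature (K).
    tree_state: unused in this simplified example.
    """
    if depth == 0:
        return "root_generation"
    elif parent_score > 0.7:
        return "element_substitution"  # Fine-tune excellent materials
    elif parent_score > 0.3:
        return "composition_variation"  # Moderate improvements
    elif parent_stability > 0.15:
        return "crystal_system_change"  # Fix stability issues
    elif parent_curie < 500:
        return "magnetic_enhancement"  # Boost magnetic properties
    else:
        return "strategy_diversification"  # Explore broadly
```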

Workflow overview:

[Figure: treequest-workflow.png]

Example event sequence:

[Figure: treequest-sequence.png]

In our first few experiments we saw promising results: some of our best candidates were child nodes produced by different expansion strategies. There seems to be a strong bias toward generating Heusler alloys with the current prompt, so some of the immediate future work will revolve around prompt updates that encourage a broader composition space. Two of the best-performing candidates from early experiments contained platinum. That isn't necessarily a bad thing, but if our end goal is to unseat a relatively cheap incumbent material, including elements like platinum doesn't help. The same can be said for other rarer or 'dirty' elements like cobalt, rhodium, osmium, and so on.

Updated performance-monitoring visuals and other charted insights are coming soon. The Data tab on Ouro is constantly filling up with new candidates as we test, so feel free to check some of them out!

Update 1: We made prompting and scoring updates. The new prompt effectively prohibited expensive elements, 'dirty' elements (really just cobalt), and elements found only in trace amounts in the Earth's crust. The scoring update effectively places NdFeB at 0.95 out of 1; anything that doesn't 'compete' with NdFeB scores below this threshold.
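The post doesn't say how the rescaling was done. One simple possibility, assuming a linear map anchored at NdFeB's raw score, is sketched below; `calibrate` and its arguments are hypothetical.

```python
NDFEB_TARGET = 0.95  # where NdFeB should land after the scoring update


def calibrate(raw_score: float, raw_ndfeb: float, target: float = NDFEB_TARGET) -> float:
    """Rescale so NdFeB's raw score maps to `target`, capped at 1.0.

    Any candidate with a lower raw score than NdFeB lands below `target`.
    """
    if raw_ndfeb <= 0:
        raise ValueError("reference score must be positive")
    return min(1.0, target * raw_score / raw_ndfeb)
```

A linear map preserves the ranking of candidates, so the update changes where the NdFeB bar sits without reordering anything.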

  • The scoring changes are good, but the prompt is likely too restrictive. Future experiments will lift the restrictions on cobalt and platinum, but trace-element restrictions will remain.

  • Conditionally selecting 'strategies' for child generation might also be too restrictive and overly prescriptive. A model like o3 is capable of making targeted changes given the parent's eval results, the end goal, and the eval results for NdFeB. The next experiment will test nearly unrestricted child-node generation, with the existing strategies passed to the LLM as 'here are changes you might consider'.
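Here is a minimal sketch of that less prescriptive setup. Instead of `select_strategy` picking a single strategy, every strategy from the table is listed in the prompt as a suggestion. The prompt wording, the `build_child_prompt` helper, and the input dictionaries are hypothetical; only the strategy names come from the table.

```python
import json

# Strategy names from the table, offered as hints rather than enforced rules.
STRATEGY_HINTS = (
    "element_substitution", "composition_variation", "spacegroup_exploration",
    "crystal_system_change", "magnetic_enhancement", "high_entropy_exploration",
    "intermetallic_design", "p_block_integration", "layered_structure_design",
    "novel_spacegroup_search",
)


def build_child_prompt(parent: dict, reference: dict, hints=STRATEGY_HINTS) -> str:
    """Assemble an open-ended child-generation prompt for the LLM."""
    return "\n".join([
        "Goal: propose a rare-earth-free permanent magnet that competes with NdFeB.",
        "Parent candidate and its eval results:",
        json.dumps(parent, indent=2, sort_keys=True),
        "NdFeB eval results for comparison:",
        json.dumps(reference, indent=2, sort_keys=True),
        "Here are changes you might consider: " + ", ".join(hints) + ".",
        'Respond with a JSON object: {"composition": ..., "spacegroup": ...}.',
    ])
```

The choice of change now rests with the model, while the hints keep the previously useful moves visible.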

The most exciting part so far is that the compositions scoring well appear to be novel. Our current criterion for novelty, however, is simply asking o3 to search for any available research or database entries for composition X with space group Y. This is not exhaustive, so our presumed novelty rate is more likely to decrease than increase.