General materials discovery pipeline · Posts on Ouro

General materials discovery pipeline

I wanted to formalize in writing the idea I keep coming back to for end-to-end materials discovery. The hardest part of this project has been optimizing toward materials with a given property in a way that is at least roughly differentiable.

If we could optimize (maximize) the critical temperature $T_c$, we could find a room-temperature superconductor easily.
If we could optimize the maximum energy product $BH_{max}$, we could discover better magnets quickly.

This is a simplification, but the idea holds. There is a set of properties that need to be balanced in this process, but it should be possible to include those in the optimization too.
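
One simple way to write the balanced version is a weighted sum. This is an assumed formulation, not something fixed by the pipeline: $x$ is a candidate material, $f_i$ is a predictor for property $i$, and the weights $w_i$ are hand-chosen.

$$
\max_{x} \; \sum_{i} w_i \, f_i(x)
$$

With a single property and $w_1 = 1$, this reduces to maximizing $T_c$ or $BH_{max}$ alone.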

So far it's been very hard to direct material exploration with property values in mind, and especially hard to discover out-of-distribution property values.

MatterGen has not been very fruitful on that front yet.

Let's see the general structure.

Materials Discovery Using MLIP Latent Space Optimization

A framework for discovering new materials with targeted properties using machine learning:

Foundation Model Training: Train a Machine Learning Interatomic Potential (MLIP) model on a diverse set of crystal structures to learn a meaningful latent space representation.
Property Prediction Layer: Build multiple specialized models that predict various material properties based on the latent representation vectors.
Interpretability Analysis: Apply SHAP analysis and other interpretability techniques to identify which latent space features drive specific material properties.
Decoder Development: Create a decoder model that can transform points in the latent space back into valid crystal structures.
Latent Space Optimization: Optimize latent vectors to maximize desired properties, using the property prediction models as objective functions.
Material Generation: Decode the optimized latent vectors to generate novel crystal structures with enhanced properties.
Validation: Verify the predicted properties through simulation and eventually experimental synthesis.
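
The latent space optimization step can be sketched as plain gradient ascent on a latent vector. Everything here is a toy stand-in: `predict_property` is a hypothetical quadratic surrogate with a hand-written gradient, not a trained property model, and a real pipeline would likely get its gradients from an autodiff framework instead.

```python
import random


def predict_property(z, weights, center):
    """Toy differentiable surrogate: a linear trend minus a quadratic penalty.

    Hypothetical stand-in for a trained property-prediction model.
    """
    linear = sum(w * zi for w, zi in zip(weights, z))
    penalty = 0.5 * sum((zi - ci) ** 2 for zi, ci in zip(z, center))
    return linear - penalty


def property_gradient(z, weights, center):
    """Analytic gradient of predict_property with respect to z."""
    return [w - (zi - ci) for w, zi, ci in zip(weights, z, center)]


def optimize_latent(z0, weights, center, lr=0.1, steps=200):
    """Gradient ascent: repeatedly step the latent vector uphill."""
    z = list(z0)
    for _ in range(steps):
        grad = property_gradient(z, weights, center)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


rng = random.Random(0)
weights = [0.5, -0.2, 0.1, 0.3]   # made-up surrogate parameters
center = [0.0, 0.0, 0.0, 0.0]     # made-up surrogate parameters
z_start = [rng.uniform(-1.0, 1.0) for _ in weights]
z_opt = optimize_latent(z_start, weights, center)

print("predicted property at start:", predict_property(z_start, weights, center))
print("predicted property after optimization:", predict_property(z_opt, weights, center))
```

Because this surrogate is concave, gradient ascent converges to its single maximum at `center + weights`. A real property model gives no such guarantee, which is where the challenges below come in.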

The aim is to use a learned representation to search the vast materials design space efficiently while keeping candidates physically realizable.

How to improve the pipeline

Using an MLIP model is not strictly necessary. We use one because training it to predict energies, forces, and stresses (and magnetic moments) creates a latent space we know to have predictive power for the behavior of the system. We also know it's not a complete representation of the material.

Related post: Comparing MLIP and MLFF, aggregation methods
"Extending the comparison to a different model CHGNet, this time a proper MLIP. Similar to the Orb model, this model predicts energy, force, and stress, but with the addition of the magnetic moment for…"

As we attempt to build more property prediction models, we'll learn more about the holes in our latent representation. From there, we can improve the latent representation by concatenating additional features or training our own encoder.
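
A minimal sketch of the concatenation idea, assuming the MLIP exposes one embedding vector per atom and that we pool those into a single structure-level vector first. Mean pooling is only one possible aggregation, and the embedding values and extra descriptors below are made up for illustration.

```python
def mean_pool(atom_embeddings):
    """Average per-atom vectors into one structure-level vector."""
    n_atoms = len(atom_embeddings)
    dim = len(atom_embeddings[0])
    return [sum(atom[i] for atom in atom_embeddings) / n_atoms for i in range(dim)]


def augment(structure_vector, extra_features):
    """Concatenate extra descriptors onto the pooled latent vector."""
    return list(structure_vector) + list(extra_features)


# Made-up embeddings for a two-atom structure (3 latent dimensions each).
atom_embeddings = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
# Hypothetical extra descriptors the latent space might be missing.
extra_features = [5.2, 0.25]

pooled = mean_pool(atom_embeddings)
features = augment(pooled, extra_features)
print(features)
```

The property models would then be trained on `features` instead of the pooled vector alone, so any gain in accuracy points at information the latent space was missing.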

Theoretical Challenges

Latent Space Discontinuity: The latent space of MLIP (Machine Learning Interatomic Potential) models may not be continuous with respect to crystal structures. Tuning a vector in latent space might produce a point that doesn't correspond to a physically realizable material.
Property Correlation Trade-offs: Materials properties often have fundamental trade-offs (e.g., strength vs. ductility). Optimizing for one property may unavoidably degrade others, making multi-objective optimization challenging.
Extrapolation Reliability: The property prediction models will only be reliable within the domain of your training data. When you optimize in the latent space, you may drift into regions where predictions become unreliable.
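
One hedge against the extrapolation problem is to penalize latent vectors that sit far from anything the property model was trained on. The sketch below is an assumption, not part of the pipeline as written: it uses Euclidean distance to the nearest training latent as a crude reliability proxy, and `radius`, `strength`, and the toy `predict` function are all hypothetical.

```python
import math


def nearest_distance(z, training_latents):
    """Euclidean distance from z to its closest training latent."""
    return min(math.dist(z, t) for t in training_latents)


def penalized_objective(z, predict, training_latents, radius=1.0, strength=10.0):
    """Predicted property minus a penalty once z leaves the trusted radius."""
    excess = max(0.0, nearest_distance(z, training_latents) - radius)
    return predict(z) - strength * excess ** 2


def predict(z):
    """Toy property model that keeps rewarding larger z[0] forever."""
    return z[0]


training_latents = [[0.0, 0.0], [1.0, 0.0]]

print(penalized_objective([1.5, 0.0], predict, training_latents))  # inside the trusted radius
print(penalized_objective([4.0, 0.0], predict, training_latents))  # far outside it
```

Without the penalty the optimizer would push `z[0]` upward indefinitely. With it, the objective drops once the vector drifts away from the training set, so the tension with finding out-of-distribution property values becomes an explicit knob (`radius`) rather than something hidden.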

Technical Implementation Challenges

Decoder Fidelity: Developing a high-quality decoder from latent space back to crystal structures is extremely difficult. Crystal structures have strict physical constraints (charge neutrality, coordination preferences, bond angles) that are hard to capture in generative models.
Symmetry and Periodicity: Crystal structures have symmetry and periodicity constraints that might not be preserved during latent vector manipulation and decoding.
Local vs. Global Minima: The optimization process in latent space might get stuck in local minima, missing potentially better solutions.
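
For the local-optimum problem, a common mitigation is to restart the optimizer from many random points and keep the best result. The sketch below assumes a toy multimodal objective with a hand-written gradient; the function, the sampling range, and the step size are all illustrative choices, not values from this project.

```python
import math
import random


def objective(z):
    """Toy multimodal objective: several local maxima per dimension."""
    return sum(math.sin(3.0 * zi) - 0.1 * zi ** 2 for zi in z)


def gradient(z):
    """Analytic gradient of the toy objective."""
    return [3.0 * math.cos(3.0 * zi) - 0.2 * zi for zi in z]


def ascend(z0, lr=0.05, steps=300):
    """Gradient ascent from a single starting point."""
    z = list(z0)
    for _ in range(steps):
        z = [zi + lr * gi for zi, gi in zip(z, gradient(z))]
    return z


def multi_start(n_starts, dim, seed=0):
    """Run gradient ascent from several random starts and collect the results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        z0 = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        z = ascend(z0)
        results.append((objective(z), z))
    return results


results = multi_start(n_starts=20, dim=2)
best_value, best_z = max(results, key=lambda r: r[0])
print("first start reached:", results[0][0])
print("best of 20 starts:  ", best_value)
```

Restarts do not guarantee the global optimum, but the spread of values across starts gives a cheap read on how rugged the objective is.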

The other approach we're looking at is centered around MatterGen and building physics-informed reward functions to incentivize generating materials with specific properties.
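
As a rough illustration of what a physics-informed reward could look like, the sketch below combines a term for closeness to a target property with a stability penalty. None of this is MatterGen's API: the function names, the energy-above-hull stability term, and every threshold and weight are hypothetical choices made for the example.

```python
import math


def property_reward(predicted, target, tolerance):
    """Gaussian-shaped reward: 1.0 at the target, decaying with distance."""
    return math.exp(-((predicted - target) / tolerance) ** 2)


def stability_penalty(energy_above_hull, threshold=0.1):
    """Zero below the threshold (eV/atom), growing linearly above it."""
    return max(0.0, energy_above_hull - threshold)


def reward(predicted, target, energy_above_hull, tolerance=10.0, penalty_weight=5.0):
    """Reward candidates near the target property, penalize unstable ones."""
    return (property_reward(predicted, target, tolerance)
            - penalty_weight * stability_penalty(energy_above_hull))


# Hypothetical candidates: (predicted property, energy above hull in eV/atom).
candidates = [(300.0, 0.05), (300.0, 0.30), (250.0, 0.05)]
for predicted, e_hull in candidates:
    print(predicted, e_hull, reward(predicted, target=300.0, energy_above_hull=e_hull))
```

The point of the stability term is that a candidate hitting the target property exactly can still score poorly if it is predicted to be unstable.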

Check out the work so far:

Related post: Experimenting with MatterGen and New Denoising Rewards

"MatterGen employs a diffusion-based approach for crystal structure generation, utilizing classifier-free guidance to steer the generation process. The core of our modifications centers on the Property…"
