Ouro
8mo

General materials discovery pipeline

I wanted to formalize in writing the idea that I keep coming back to for end-to-end material discovery. The hardest part of this project has been optimizing toward materials with a target property in a way that is at least roughly differentiable.

  • If we could optimize (maximize) $T_c$, we could find a room temperature superconductor easily.

  • If we could optimize $BH_{max}$, we could discover better magnets quickly.

This is a simplification, but the idea holds. In practice a set of properties has to be balanced in this process, but those trade-offs should be expressible in the optimization objective too.
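One common way to fold several properties into a single objective is a weighted sum with penalty terms. This is only a sketch under assumptions: the weights $w_i$, property predictors $f_i$, constraint functions $g_j$, and penalty strengths $\lambda_j$ are illustrative symbols, not something defined elsewhere in this post.

```latex
\max_{x} \; J(x) \;=\; \sum_{i} w_i \, f_i(x) \;-\; \sum_{j} \lambda_j \, \max\bigl(0,\; g_j(x)\bigr)
```

Here $x$ is a candidate material, each $f_i(x)$ is a predicted property to maximize (for example $T_c$), and each $g_j(x) \le 0$ is a requirement that should hold (for example, a stability criterion staying below some threshold). A violated requirement lowers $J$ in proportion to how badly it is violated.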

So far it's been very hard to direct material exploration with property values in mind, and especially hard to discover out-of-distribution property values.

MatterGen has not been very fruitful on that front yet.

Let's see the general structure.

Materials Discovery Using MLIP Latent Space Optimization

A framework for discovering new materials with targeted properties using machine learning:

  1. Foundation Model Training: Train a Machine Learning Interatomic Potential (MLIP) model on a diverse set of crystal structures to learn a meaningful latent space representation.

  2. Property Prediction Layer: Build multiple specialized models that predict various material properties based on the latent representation vectors.

  3. Interpretability Analysis: Apply SHAP analysis and other interpretability techniques to identify which latent space features drive specific material properties.

  4. Decoder Development: Create a decoder model that can transform points in the latent space back into valid crystal structures.

  5. Latent Space Optimization: Fine-tune latent vectors to maximize desired properties using the property prediction models as objective functions.

  6. Material Generation: Decode the optimized latent vectors to generate novel crystal structures with enhanced properties.

  7. Validation: Verify the predicted properties through simulation and eventually experimental synthesis.

This approach leverages the power of representation learning to navigate the vast materials design space efficiently while maintaining physical realizability.
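Steps 2 and 5 can be illustrated with a toy example. This is a minimal sketch, not the actual pipeline: the latent vectors are random stand-ins for MLIP embeddings, the property head is a plain ridge regression, and the names `fit_linear_head`, `optimize_latent`, and the proximity weight `lam` are hypothetical. The penalty term keeps the optimized vector close to its starting point, since a linear head on its own would push the vector off to infinity.

```python
import numpy as np

def fit_linear_head(Z, y, alpha=1e-3):
    """Ridge-regress a property y on latent vectors Z (step 2, simplified)."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append a bias column
    reg = alpha * np.eye(Zb.shape[1])
    reg[-1, -1] = 0.0  # do not regularize the bias term
    wb = np.linalg.solve(Zb.T @ Zb + reg, Zb.T @ y)
    return wb[:-1], wb[-1]

def optimize_latent(z0, w, lam=0.5, lr=0.1, steps=200):
    """Gradient ascent on J(z) = w.z - lam * ||z - z0||^2 (step 5, simplified)."""
    z = z0.copy()
    for _ in range(steps):
        grad = w - 2.0 * lam * (z - z0)  # analytic gradient of J
        z = z + lr * grad
    return z

# Synthetic stand-in data: 200 "materials" with 8-dimensional latent vectors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = Z @ true_w + 0.01 * rng.normal(size=200)

w, b = fit_linear_head(Z, y)
z0 = Z[0]
z_opt = optimize_latent(z0, w)

pred_before = z0 @ w + b
pred_after = z_opt @ w + b
print(f"predicted property: {pred_before:.3f} -> {pred_after:.3f}")
```

In a real setting the head would likely be nonlinear and the gradient would come from automatic differentiation, and `z_opt` would then be handed to the decoder from step 4.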

How to improve the pipeline

  • Using an MLIP model is not strictly necessary. We use one because the task of predicting energies, stresses, and forces (and, for some models, magnetic moments) creates a latent space we know to have predictive power for the behavior of the system. We also know it's not a complete representation of the material.

    Comparing MLIP and MLFF, aggregation methods

    post

    Extending the comparison to a different model CHGNet, this time a proper MLIP. Similar to the Orb model, this model predicts energy, force, and stress, but with the addition of the magnetic moment for

    9mo

    As we attempt to build more property prediction models, we'll learn more about the holes in our latent representation. From there, we can improve the latent representation by concatenating additional features or training our own encoder.
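As a rough illustration of aggregation and feature concatenation, the sketch below pools per-atom features into one structure-level vector and appends extra features. It assumes the model exposes per-atom feature vectors as an `(n_atoms, d)` array; the function names and the extra "density-like" feature are hypothetical.

```python
import numpy as np

def pool_atoms(atom_feats, method="mean"):
    """Aggregate per-atom latent features (n_atoms, d) into one (d,) vector."""
    if method == "mean":
        return atom_feats.mean(axis=0)
    if method == "sum":
        return atom_feats.sum(axis=0)
    if method == "max":
        return atom_feats.max(axis=0)
    raise ValueError(f"unknown pooling method: {method}")

def structure_descriptor(atom_feats, extra_feats, methods=("mean", "max")):
    """Concatenate several pooled views with additional hand-picked features."""
    pooled = [pool_atoms(atom_feats, m) for m in methods]
    return np.concatenate(pooled + [np.asarray(extra_feats, dtype=float)])

# Two atoms with 2-dimensional features, plus one hypothetical extra feature.
atom_feats = np.array([[1.0, 2.0], [3.0, 6.0]])
extra = [0.5]
desc = structure_descriptor(atom_feats, extra)
print(desc)
```

Mean pooling does not change when the cell is repeated, while sum pooling grows with the number of atoms, so the choice depends on whether the target property is intensive or extensive.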

Theoretical Challenges

  1. Latent Space Discontinuity: The latent space of MLIP (Machine Learning Interatomic Potential) models may not be continuous with respect to crystal structures. Tuning a vector in latent space might produce a point that doesn't correspond to a physically realizable material.

  2. Property Correlation Trade-offs: Materials properties often have fundamental trade-offs (e.g., strength vs. ductility). Optimizing for one property may unavoidably degrade others, making multi-objective optimization challenging.

  3. Extrapolation Reliability: The property prediction models will only be reliable within the domain of their training data. Optimizing in the latent space can drift into regions where predictions become unreliable.
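One simple guard against the extrapolation problem is to check how far an optimized vector sits from the training latents. The sketch below is an assumption-laden illustration, not part of the pipeline as described: it uses mean k-nearest-neighbor distance as the measure, a 95th-percentile cutoff, and random vectors in place of real embeddings. The names `knn_distance` and `in_domain` are hypothetical.

```python
import numpy as np

def knn_distance(z, Z_ref, k=5):
    """Mean Euclidean distance from z to its k nearest reference latents."""
    d = np.linalg.norm(Z_ref - z, axis=1)
    return np.sort(d)[:k].mean()

def in_domain(z, Z_train, k=5, quantile=0.95):
    """True if z's k-NN distance is typical of the training set itself."""
    # Reference distribution: each training point's k-NN distance to the others.
    ref = [
        knn_distance(Z_train[i], np.delete(Z_train, i, axis=0), k)
        for i in range(len(Z_train))
    ]
    threshold = np.quantile(ref, quantile)
    return bool(knn_distance(z, Z_train, k) <= threshold)

rng = np.random.default_rng(0)
Z_train = rng.normal(size=(300, 8))  # stand-in for training embeddings

near = Z_train.mean(axis=0)  # a point in the dense centre of the data
far = near + 10.0            # a point far outside the training cloud

print(in_domain(near, Z_train), in_domain(far, Z_train))
```

A check like this could be used to reject candidates after optimization or turned into a penalty during it. Ensemble disagreement between several property models is another commonly used signal.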

Technical Implementation Challenges

  1. Decoder Fidelity: Developing a high-quality decoder from latent space back to crystal structures is extremely difficult. Crystal structures have strict physical constraints (charge neutrality, coordination preferences, bond angles) that are hard to capture in generative models.

  2. Symmetry and Periodicity: Crystal structures have symmetry and periodicity constraints that might not be preserved during latent vector manipulation and decoding.

  3. Local vs. Global Minima: The optimization process in latent space might get stuck in local minima, missing potentially better solutions.
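The local-optimum issue can be shown on a toy objective with two peaks, where a single gradient ascent run lands on the weaker peak and a multi-start search finds the stronger one. Everything here is illustrative: the two-bump objective, the peak locations `A` and `B`, and the grid of starting points are made up for the example, and a real property model would not have an analytic gradient like this.

```python
import numpy as np

A = np.array([-2.0, 0.0])  # centre of the weaker (local) optimum
B = np.array([2.0, 0.0])   # centre of the stronger (global) optimum

def objective(z):
    """Two Gaussian bumps: height 1 at A, height 2 at B."""
    return np.exp(-np.sum((z - A) ** 2)) + 2.0 * np.exp(-np.sum((z - B) ** 2))

def gradient(z):
    """Analytic gradient of the objective."""
    return (-2.0 * (z - A) * np.exp(-np.sum((z - A) ** 2))
            - 4.0 * (z - B) * np.exp(-np.sum((z - B) ** 2)))

def ascend(z0, lr=0.2, steps=500):
    """Plain gradient ascent from a single starting point."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z + lr * gradient(z)
    return z

def multi_start(starts):
    """Run ascent from every start and keep the best final point."""
    finals = [ascend(s) for s in starts]
    return max(finals, key=objective)

# A single run started near A gets stuck on the weaker peak.
local_z = ascend([-1.5, 0.0])

# A coarse 5x5 grid of starts covers both basins.
grid = [(x, y) for x in np.linspace(-3, 3, 5) for y in np.linspace(-3, 3, 5)]
best_z = multi_start(grid)

print(objective(local_z), objective(best_z))
```

In a high-dimensional latent space a grid is not practical, so starts would more likely be sampled at random or taken from the embeddings of known materials.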


The other approach we're looking at is centered around MatterGen and building physics-informed reward functions to incentivize generating materials with specific properties.

Check out the work so far:

Experimenting with MatterGen and New Denoising Rewards

post

MatterGen employs a diffusion-based approach for crystal structure generation, utilizing classifier-free guidance to steer the generation process. The core of our modifications centers on the Property

8mo