Open research towards the discovery of room-temperature superconductors.
I collaborated with Claude on these notes, but they still capture the essence of my thoughts on the paper.
The authors recognized a critical limitation in existing materials databases: they mostly contain near-equilibrium structures, which makes them poor at predicting behavior under extreme conditions. Their solution was a two-part data generation strategy:
Ground-state explorer: Focuses on equilibrium structures
Off-equilibrium explorer: Samples configurations at high temperatures (up to 5000 K) and pressures (up to 1000 GPa)
They used active learning with ensemble uncertainty to guide the data collection, resulting in ~17M structures. This is far more comprehensive than previous databases, especially for extreme conditions.
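The active learning pipeline itself isn't published, but the core idea of ensemble-uncertainty-driven selection can be sketched as follows. This is a minimal toy, not the authors' code: the function names, the use of predicted energies as the uncertainty signal, and the selection rule are all my assumptions.

```python
from statistics import pstdev

def ensemble_uncertainty(predictions):
    """Disagreement across ensemble members: std-dev per structure.

    predictions: list of lists, where predictions[m][i] is model m's
    predicted energy for candidate structure i (hypothetical layout).
    """
    n = len(predictions[0])
    return [pstdev([model[i] for model in predictions]) for i in range(n)]

def select_for_labeling(candidates, predictions, k):
    """Pick the k structures the ensemble disagrees on most; in an
    active-learning loop these would be labeled (e.g. with DFT) and
    added to the training set before retraining the ensemble."""
    scores = ensemble_uncertainty(predictions)
    ranked = sorted(range(len(candidates)),
                    key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]

# Toy example: 3 "models" score 4 candidate structures.
preds = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.5, 3.0, 4.2],
    [0.9, 1.5, 3.0, 3.8],
]
picked = select_for_labeling(["s0", "s1", "s2", "s3"], preds, 2)
# "s1" and "s3" have the largest spread, so they get labeled next.
```

Repeating this select-label-retrain loop is what lets the off-equilibrium explorer cover extreme conditions without labeling every sampled configuration.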
When compared to leading models like CHGNet and MACE-MP-0, MatterSim shows dramatic improvements:
Up to 10x better accuracy on structures under extreme conditions
Significantly better performance on the MPF-TP and Random-TP benchmark sets
Superior ability to handle complex molecular dynamics simulations
The Graphormer version of MatterSim is particularly interesting for direct property prediction.
From S15.2:
After the message passing in M3GNet or structure encoder in Graphormer, we obtain a global representation of a structure using scatter operation to aggregate the node feature... With two different reduction methods, mean or summation, we obtain the readout vectors of a given material, which will be subsequently sent to a multi-layer perceptron (MLP) to make direct property predictions
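The quoted readout can be sketched in plain Python. This is a shape-level illustration only; the actual implementation uses learned MLP weights and tensor scatter ops, and everything below (names, feature sizes) is hypothetical.

```python
def scatter_readout(node_features, batch_index, reduce="mean"):
    """Aggregate per-node (per-atom) features into one vector per
    structure, mimicking a scatter operation.

    node_features: list of feature vectors, one per atom.
    batch_index:   structure id for each atom, e.g. [0, 0, 1].
    reduce:        "mean" or "sum", the two reductions the paper mentions.
    """
    groups = {}
    for feat, b in zip(node_features, batch_index):
        groups.setdefault(b, []).append(feat)
    out = []
    for b in sorted(groups):
        cols = list(zip(*groups[b]))  # transpose: values per feature dim
        if reduce == "sum":
            out.append([sum(c) for c in cols])
        else:
            out.append([sum(c) / len(c) for c in cols])
    return out

# Two structures: atoms 0-1 belong to structure 0, atom 2 to structure 1.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
readout = scatter_readout(feats, [0, 0, 1], reduce="mean")
# readout -> [[2.0, 3.0], [5.0, 6.0]]
```

Each resulting per-structure vector would then be fed to the MLP head for the direct property prediction.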
On the MatBench tasks, it achieved state-of-the-art performance across multiple properties.
This is a very recent release (May 2024), and the open source implementation is still incomplete. While the base M3GNet version is available, several key components are missing:
The Graphormer version of the model isn't available yet
Fine-tuning scripts haven't been released
The full active learning pipeline isn't published
The data generation tools aren't included
This means that while the research results are impressive, practitioners currently can't access the full capabilities described in the paper. The available M3GNet version, while still powerful, doesn't match the performance of the Graphormer version on end-to-end property prediction tasks.
The missing Graphormer implementation is particularly significant because it's the version that achieved the best results on many tasks, especially direct property prediction. For instance, in predicting phonon properties, the Graphormer version achieved an MAE of 26.02 cm⁻¹ compared to M3GNet's 56.04 cm⁻¹, roughly half the error.
The fine-tuning capabilities are also crucial, as they enable adaptation to new tasks with minimal data. For example, the paper demonstrates adapting to liquid water simulation with only 3% of the data needed for training from scratch, but these tools aren't yet available to the community.
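Since the fine-tuning scripts aren't released, here is only the generic recipe such adaptation typically follows: freeze the pretrained backbone and retrain a small head on the new system's data. Everything below is a hypothetical stand-in, not MatterSim's API.

```python
def frozen_backbone(x):
    """Stand-in for the pretrained feature extractor (weights frozen)."""
    return [x, x * x]  # two fixed, hand-picked features

def finetune_head(data, lr=0.01, steps=2000):
    """Fit a linear head w·features + b on a small target dataset with
    plain gradient descent; only the head's parameters are updated."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in data:
            f = frozen_backbone(x)
            err = w[0] * f[0] + w[1] * f[1] + b - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

# Tiny toy target (y = 2x + 1) standing in for the small
# task-specific dataset; only a handful of labeled points needed.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = finetune_head(data)
```

Because the expensive general-purpose representation is reused, only a few head parameters need data, which is the intuition behind adapting with ~3% of the from-scratch training set.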
There's also a clear incentive to use Microsoft's Azure environment for more advanced functionality. From the repo:
More advanced and fully-supported pretrained versions of MatterSim, and additional materials capabilities are available in Azure Quantum Elements.
As impressive as this contribution is, I don't think it's worth pursuing further until the repo matures. I'll be evaluating ORB as an alternative.