MatGL and CHGNet represent a strong foundation for Phase 1, but they have specific limitations that will become apparent as users deploy them. This analysis identifies gaps in capability, accuracy, and coverage that Phase 2 should address. The gaps fall into three categories: scope limitations (what types of materials they handle), accuracy limitations (where predictions become unreliable), and workflow limitations (what researchers actually need to do their work).
CHGNet was trained primarily on oxide systems. While it generalizes reasonably to some other chemistries, accuracy degrades significantly for:
Intermetallics (e.g., Al₃Sc, NiAl): CHGNet extrapolates poorly to metals without ionic character. Force predictions can be 2-3x less accurate.
Sulfides, phosphides, halides: Training distribution is sparse for these materials. Predictions are less reliable than for oxides.
Organic frameworks, MOFs: Not in training set. CHGNet will likely fail or give nonsensical forces.
High-entropy alloys: Combinatorial explosion of compositions—CHGNet hasn't seen most of them.
Researchers working on these material classes will quickly hit accuracy walls and will need alternatives.
Phase 2 solution: Develop specialized CHGNet models for 3-4 key non-oxide chemistries (sulfides, intermetallics, halides), or integrate complementary machine-learned potentials (e.g., MACE for broader chemical coverage, Allegro for fast, strictly local equivariant models). Clearly document accuracy expectations per material class.
MatGL predicts a fixed set of properties (formation energy, band gap, elastic constants). But researchers often need:
Magnetic properties (magnetic moments, magnetization, Curie temperature)
Transport properties (thermal conductivity, electrical conductivity)
Stability metrics (decomposition temperature, oxygen vacancy formation energy)
Defect properties (defect formation energies, migration barriers)
A researcher optimizing for superconductivity or permanent magnets needs magnetic moment predictions. A thermal materials researcher needs thermal conductivity. MatGL doesn't provide these.
Phase 2 solution: Develop specialized MatGL variants for high-priority properties (e.g., magnetic GNN, thermal GNN), integrate existing property-prediction models (e.g., MEGNet for specific properties), and create a registry of "property prediction plugins" so users know what's available.
Both MatGL and CHGNet return point estimates with no confidence intervals. Users don't know if a prediction is "probably right" or "probably wrong."
In materials discovery, you filter candidates based on predictions. If you don't know which predictions are unreliable, you might discard good candidates or pursue bad ones. Suppose MatGL predicts a band gap of 2.5 eV with no error bar; the true value could plausibly lie anywhere from 2.0 to 3.0 eV. Should you trust it?
Phase 2 solution: Implement ensemble approaches (multiple model variants) for uncertainty estimation, develop calibration curves per material class (e.g., "MatGL band gap errors are typically ±0.3 eV for oxides"), and add per-prediction confidence scores based on structural features.
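The ensemble approach can be sketched in a few lines: run several independently trained model variants and report the spread of their predictions as an uncertainty. The `ensemble_predict` interface below is a hypothetical illustration, not the actual MatGL or CHGNet API:

```python
import statistics

def ensemble_predict(models, structure):
    """Run every model variant and summarize disagreement as uncertainty.

    `models` is any iterable of callables structure -> float; in practice
    these would be independently trained MatGL/CHGNet variants.
    """
    preds = [m(structure) for m in models]
    mean = statistics.fmean(preds)
    std = statistics.stdev(preds) if len(preds) > 1 else 0.0
    return {"value": mean, "uncertainty": std, "n_models": len(preds)}

# Toy ensemble: three variants that disagree slightly on a band gap (eV).
ensemble = [lambda s: 2.4, lambda s: 2.5, lambda s: 2.6]
result = ensemble_predict(ensemble, structure=None)
print(result)  # mean 2.5 eV with ~0.1 eV spread
```

A large `uncertainty` relative to the per-class calibration (e.g., the ±0.3 eV typical for oxide band gaps) would flag the prediction as out-of-distribution.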
CHGNet's training data are from relaxed structures and MD trajectories near equilibrium. It's much less accurate on highly distorted or strained structures, structures very far from equilibrium, and partially unrelaxed structures with large residual forces.
In generative workflows (flow matching, genetic algorithms), candidate structures are often far from equilibrium and high in energy. CHGNet will struggle with force accuracy on these, leading to poor relaxation trajectories.
Phase 2 solution: Retrain or fine-tune CHGNet on high-energy configurations, document accuracy as a function of structure energy, and pair CHGNet with a coarse-grained or rapid relaxation method for initial rough optimization.
MatGL's band gap predictions have larger errors than formation energies, and the model was trained on limited band gap data. Error distributions are not well characterized.
Band gap is often a primary screening criterion in optoelectronics research (solar cells, LEDs, etc.). Inaccurate band gaps lead to poor candidate screening.
Phase 2 solution: Develop a specialized band gap prediction model (smaller model, task-specific, optimized for accuracy), cross-validate MatGL band gaps against experiment for common semiconductors, and provide a band gap "reliability index" per material type.
MatGL and CHGNet are frozen models. You can't fine-tune them on your own data (e.g., to improve predictions for your specific material class).
A superconductor researcher might have 50 experimental relaxed structures and their DFT-calculated properties. They'd love to fine-tune MatGL on these to get better predictions for nearby compositional space. Neither model supports this out of the box.
Phase 2 solution: Provide fine-tuning infrastructure (lightweight retraining on user data), build transfer learning pipelines, and maintain user-contributed models as community resources.
Users often want to know: "Is this structure similar to anything in the training set?" or "What's the closest relaxed structure in the Materials Project database?"
If your structure is very similar to training data, predictions are likely accurate. If it's novel and far from training data, accuracy is uncertain.
Phase 2 solution: Implement structure fingerprinting (e.g., SOAP features, graph kernels), add nearest-neighbor search against training database, and return "structural similarity" as metadata alongside predictions.
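A minimal sketch of the nearest-neighbor similarity check, using cosine similarity over plain fingerprint vectors. A real system would use SOAP descriptors or graph kernels as noted above; the function names and `mp-*` IDs here are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_training_structure(query_fp, training_fps):
    """Return (id, similarity) of the most similar training fingerprint."""
    best_id, best_sim = None, -1.0
    for sid, fp in training_fps.items():
        sim = cosine_similarity(query_fp, fp)
        if sim > best_sim:
            best_id, best_sim = sid, sim
    return best_id, best_sim

# Toy fingerprints standing in for a SOAP-featurized training database.
train = {"mp-1": [1.0, 0.0, 0.0], "mp-2": [0.0, 1.0, 0.0]}
sid, sim = nearest_training_structure([0.9, 0.1, 0.0], train)
print(sid, round(sim, 3))
```

Returning `sid` and `sim` as metadata alongside each prediction gives users a cheap proxy for "is this inside the training distribution?"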
MatGL + CHGNet work well for validating structures, but they don't generate new materials. They need to be paired with a generative model (flow matching, VAE, diffusion model) to close the discovery loop.
A complete discovery pipeline is: generate candidates → predict properties → filter → validate. MatGL and CHGNet handle validation, but Ouro lacks generation.
Phase 2 solution: Integrate flow matching / diffusion models for crystal generation, create end-to-end workflows, and document recommended pipelines for different discovery objectives.
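The generate → predict → filter → validate loop above can be sketched end to end. Both stage functions below are stand-ins (a real pipeline would call a generative model and MatGL, respectively); only the wiring is the point:

```python
import random

def generate_candidates(n, seed=0):
    """Stand-in for a generative model (flow matching / diffusion)."""
    rng = random.Random(seed)
    return [{"id": i, "band_gap": rng.uniform(0.0, 5.0)} for i in range(n)]

def predict_properties(candidates):
    """Stand-in for MatGL property prediction; toy properties already set."""
    return candidates

def screen(candidates, lo=1.0, hi=3.0):
    """Filter step: keep candidates inside the target band-gap window (eV)."""
    return [c for c in candidates if lo <= c["band_gap"] <= hi]

# generate -> predict -> filter; validation (DFT/experiment) would follow.
survivors = screen(predict_properties(generate_candidates(100)))
print(len(survivors), "candidates pass the band-gap filter")
```

Each stage is a plain function over lists of candidate dicts, so swapping in a real generator or predictor doesn't change the pipeline's shape.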
MatGL predicts individual properties independently. Real discovery often requires trading off multiple properties (e.g., high thermal conductivity + low cost + easy synthesis).
A researcher might want to filter for: (formation_energy < -5 eV/atom) AND (band_gap in 1-3 eV range) AND (elastic_modulus > 100 GPa). The API doesn't support complex query logic.
Phase 2 solution: Add filtering/query layer that can express multi-property constraints, integrate with Pareto front analysis (for multi-objective optimization), and provide visualization of trade-offs.
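The multi-property constraint from the example above can be expressed as a small predicate map. This is a sketch of what such a query layer might look like, not an existing API; Pareto-front analysis would sit on top of the same candidate dicts:

```python
def passes(candidate, constraints):
    """Check a candidate dict against {property: predicate} constraints."""
    return all(pred(candidate[prop]) for prop, pred in constraints.items())

# The example query from the text: stable, semiconducting, and stiff.
constraints = {
    "formation_energy": lambda e: e < -5.0,    # eV/atom
    "band_gap": lambda g: 1.0 <= g <= 3.0,     # eV
    "elastic_modulus": lambda k: k > 100.0,    # GPa
}

candidates = [
    {"formation_energy": -5.4, "band_gap": 2.1, "elastic_modulus": 150.0},
    {"formation_energy": -4.0, "band_gap": 2.5, "elastic_modulus": 200.0},
]
hits = [c for c in candidates if passes(c, constraints)]
print(len(hits))  # prints 1: the second candidate fails the energy cut
```

Expressing constraints as data (rather than hard-coded filters) is what makes them composable into AND/OR query logic and multi-objective trade-off views.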
MatGL and CHGNet are trained on DFT data. But users want to know: "How do these ML predictions compare to experiment?" or "What's the typical error vs. VASP?"
Trust is built on validation. Without public benchmark comparisons, users are hesitant to rely on predictions for important decisions.
Phase 2 solution: Maintain benchmark dataset of experimental properties and DFT calculations, regularly evaluate MatGL/CHGNet against benchmarks and publish results, and surface accuracy metrics in the API responses.
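The regular evaluation step reduces to computing error metrics over (predicted, reference) pairs. A minimal sketch with made-up toy numbers, not real benchmark results:

```python
def mean_absolute_error(pairs):
    """MAE between predicted and reference values, e.g. ML vs. DFT/experiment."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

# Toy benchmark: (predicted, reference) band gaps in eV. Illustrative only.
benchmark = [(2.5, 2.2), (1.1, 1.3), (3.0, 3.4)]
mae = mean_absolute_error(benchmark)
print(f"band gap MAE vs. reference: {mae:.2f} eV")  # prints 0.30 eV
```

Running this per material class and per property, then surfacing the numbers in API responses, turns "trust us" into a checkable claim.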
Based on impact to users and feasibility:
High Priority (Months 1-3):
1. Uncertainty quantification — Enables informed decision-making immediately
2. Non-oxide CHGNet models — Expands applicability beyond oxides
3. Band gap specialist model — Solves a common use case with known pain point

Medium Priority (Months 3-6):
4. Fine-tuning/transfer learning — Empowers users with custom models
5. Multi-property filtering/optimization — Improves discovery workflow
6. Benchmarking and validation reports — Builds trust and documents accuracy

Lower Priority (Months 6+):
7. Structure similarity search — Nice-to-have but not blocking
8. Generative model integration — Requires collaboration with separate team
9. Advanced visualization and post-processing — Community can build on API
When reaching out to Ceder, Chen, and others, prioritize these questions:
Beyond oxides: "What non-oxide materials are most critical for your work? Where do you expect ML models to fail?"
Uncertainty: "How would uncertainty quantification change your workflow? What confidence level would you need?"
Multi-property: "In your discovery, how many properties do you typically optimize simultaneously?"
Generative + predictive: "What's your current approach to generating candidate structures? How would Ouro's pipelines fit in?"
Fine-tuning: "If you could fine-tune MatGL on your own data, would that increase your confidence in predictions?"
Answers to these will sharpen Phase 2 priorities significantly.
MatGL and CHGNet are strong Phase 1 models, but they're not a complete solution for materials discovery. The gaps identified here—scope limitations, accuracy gaps, workflow bottlenecks, and knowledge uncertainties—will become clearer as real users start deploying them. The Phase 2 roadmap should be driven by early user feedback and systematic benchmarking, with uncertainty quantification and non-oxide chemistry as the top two priorities.
The goal isn't to build a perfect universal model—it's to build a collaborative platform where researchers can contribute specialized models, validate each other's results, and iteratively improve the models that matter most for their work.