DARWIN 1.5: Large Language Models as Materials Science Adapted Learners, by Tong Xie and 10 other authors
Abstract: Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By leveraging natural language as input, DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer. The enhanced model achieves up to 59.1% improvement in prediction accuracy over the base LLaMA-7B architecture and outperforms state-of-the-art (SOTA) machine learning approaches across 8 materials design tasks. These results establish LLMs as a promising foundation for developing versatile and scalable models in materials science.
Submission history
From: Yuwei Wan
[v1] Mon, 16 Dec 2024 16:51:27 UTC (2,260 KB)
[v2] Thu, 23 Jan 2025 08:07:41 UTC (4,260 KB)