[Submitted on 24 Jul 2024]
A Large Encoder-Decoder Family of Foundation Models For Chemical Language
Eduardo Soares and 5 other authors
Abstract: Large-scale pre-training methodologies for chemical language models represent a breakthrough in cheminformatics. These methods excel in tasks such as property prediction and molecule generation by learning contextualized representations of input tokens through self-supervised learning on large unlabeled corpora. Typically, this involves pre-training on unlabeled data followed by fine-tuning on specific tasks, reducing dependence on annotated datasets and broadening chemical language representation understanding. This paper introduces a large encoder-decoder chemical…