DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings



By Zhihan Zhou and 7 other authors


Abstract: We introduce DNABERT-S, a tailored genome model that develops species-aware embeddings to naturally cluster and segregate DNA sequences of different species in the embedding space. Differentiating species from genomic sequences (i.e., DNA and RNA) is vital yet challenging, since many real-world species remain uncharacterized and lack known genomes for reference. Embedding-based methods are therefore used to differentiate species in an unsupervised manner. DNABERT-S builds upon a pre-trained genome foundation model named DNABERT-2. To encourage effective embeddings for error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C$^2$LR) strategy. Empirical results on 23 diverse datasets show DNABERT-S’s effectiveness, especially in realistic label-scarce scenarios. For example, it identifies twice as many species from a mixture of unlabeled genomic sequences, doubles the Adjusted Rand Index (ARI) in species clustering, and, with only 2-shot training, outperforms the top baseline’s 10-shot species classification performance. Model, code, and data are publicly available at this https URL.
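
The abstract reports clustering quality with the Adjusted Rand Index (ARI). For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the standard ARI formula computed from a contingency table of pair counts. It is not the authors' evaluation code; the function name adjusted_rand_index and the toy species labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings of the same items (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    # Count items per (true cluster, predicted cluster) cell and per cluster.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that fall together in each cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy example (hypothetical labels): cluster IDs need not match species names,
# only the grouping matters.
species = ["A", "A", "B", "B"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(species, clusters)
```

An ARI of 1 means the predicted clusters reproduce the true grouping exactly, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.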

Submission history

From: Zhihan Zhou [view email]
[v1]
Tue, 13 Feb 2024 20:21:29 UTC (5,620 KB)
[v2]
Thu, 15 Feb 2024 04:55:23 UTC (5,620 KB)
[v3]
Tue, 22 Oct 2024 04:14:08 UTC (6,007 KB)




By stp2y
