SHAP zero Explains Genomic Models with Near-zero Marginal Cost for Future Queried Sequences, by Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas, Justin Singh Kang, and Amirali Aghazadeh
Abstract: With the rapid growth of large-scale machine learning models in genomics, Shapley values have emerged as a popular method for model explanations due to their theoretical guarantees. While Shapley values explain model predictions locally for an individual input query sequence, extracting biological knowledge requires global explanation across thousands of input sequences. This demands a number of model evaluations per sequence that is exponential in the sequence length, resulting in significant computational cost and carbon footprint. Herein, we develop SHAP zero, a method that estimates Shapley values and interactions with a near-zero marginal cost for future queried sequences after paying a one-time fee for model sketching. SHAP zero achieves this by establishing a surprisingly underexplored connection between the Shapley values and interactions and the Fourier transform of the model. Explaining two genomic models, one trained to predict guide RNA binding and the other to predict DNA repair outcome, we demonstrate that SHAP zero achieves orders of magnitude reduction in amortized computational cost compared to state-of-the-art algorithms, revealing almost all predictive motifs, a finding previously inaccessible due to the combinatorial space of possible interactions.
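The exponential per-sequence cost mentioned in the abstract can be illustrated with a minimal sketch of the textbook exact Shapley computation. This is not the SHAP zero algorithm; it is the brute-force baseline whose cost motivates it. The names `exact_shapley` and `toy_model` are hypothetical, and the toy scoring function (which treats a position as either "present" or "absent") is an assumption made purely for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n):
    """Exact Shapley values for a set function `value` over positions 0..n-1.

    Enumerates every subset of the other positions for each position,
    so the number of calls to `value` grows as n * 2**n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the subset that excludes position i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of position i when added to subset s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(present):
    """Hypothetical toy score: two additive effects plus one pairwise interaction."""
    score = 0.0
    if 0 in present:
        score += 1.0
    if 1 in present:
        score += 2.0
    if 0 in present and 2 in present:
        score += 4.0  # interaction between positions 0 and 2
    return score


phi = exact_shapley(toy_model, 3)
```

For three positions this takes only 24 calls to `toy_model`, but the count doubles with every added position, which is why the exact computation becomes impractical for realistic sequence lengths. The interaction term of 4.0 is split evenly between positions 0 and 2, and the values sum to the difference between the full and empty scores.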
Submission history
From: Amirali Aghazadeh
[v1] Fri, 25 Oct 2024 00:58:31 UTC (8,060 KB)
[v2] Fri, 20 Dec 2024 18:13:20 UTC (8,070 KB)