A Survey on Self-Supervised Learning for Non-Sequential Tabular Data, by Wei-Yao Wang and 4 other authors
Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has become a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging because tabular data lacks explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its relation to related studies. Then, these approaches are categorized into three groups: predictive learning, contrastive learning, and hybrid learning, along with the motivations and strengths of representative methods in each direction. Moreover, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to analyze the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in encouraging more research on lowering the barrier to entry for SSL in the tabular domain, and in improving the foundations for implicit tabular data.
Submission history
From: Wei-Yao Wang
[v1] Fri, 2 Feb 2024 08:17:41 UTC (295 KB)
[v2] Mon, 5 Feb 2024 05:35:16 UTC (295 KB)
[v3] Mon, 9 Sep 2024 00:42:16 UTC (397 KB)