Viral News

Bootstrap3D: Improving 3D Content Creation with Synthetic Data

Read More
Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports

Read More
Deep Learning for Computing Convergence Rates of Markov Chains

arXiv:2405.20435v1 Announce Type: new Abstract: Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive…
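The truncated abstract stops before the DCDC algorithm itself, so the following is only a rough sketch of the general idea of measuring Markov chain convergence to stationarity in Wasserstein distance from samples. The AR(1) kernel, sample sizes, and use of scipy.stats.wasserstein_distance are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: track the empirical Wasserstein-1 distance between two
# copies of a simple contractive Markov chain started from different initial laws.
# This is NOT the DCDC algorithm; the AR(1) kernel and constants are placeholders.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def ar1_step(x, rho=0.9, sigma=0.5):
    """One transition of a contractive AR(1) kernel: X' = rho*X + sigma*N(0,1)."""
    return rho * x + sigma * rng.standard_normal(x.shape)

n = 5000                          # particles per copy of the chain
x = rng.normal(loc=10.0, size=n)  # copy started far from stationarity
y = rng.normal(loc=0.0, size=n)   # copy started near the stationary mean

for t in range(1, 31):
    x, y = ar1_step(x), ar1_step(y)
    if t % 5 == 0:
        # For a contractive kernel this empirical W1 distance shrinks geometrically.
        print(t, wasserstein_distance(x, y))
```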
Read More
Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation

arXiv:2405.20596v1 Announce Type: new Abstract: Traditional semi-supervised learning (SSL) assumes that the feature distributions of labeled and unlabeled data are consistent, which rarely holds in realistic scenarios. In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples. Under this setting, previous SSL methods tend to predict wrong pseudo-labels with the model fitted on labeled data, resulting in noise accumulation. To tackle this issue, we propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from…
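For context, here is a minimal sketch of the standard confidence-thresholded pseudo-labeling loop that, per the abstract, breaks down when unlabeled data come from a shifted, mixed distribution. This is the baseline being criticised, not the SSFA method; the model, threshold, and loss weighting are placeholder assumptions.

```python
# Standard pseudo-labeling SSL step (baseline sketch, not SSFA).
import torch
import torch.nn.functional as F

def ssl_step(model, x_labeled, y_labeled, x_unlabeled, optimizer,
             threshold=0.95, lambda_u=1.0):
    # Supervised loss on labeled data.
    loss_sup = F.cross_entropy(model(x_labeled), y_labeled)

    # Pseudo-labels from the current model; keep only confident ones.
    with torch.no_grad():
        probs_u = F.softmax(model(x_unlabeled), dim=1)
        conf, pseudo = probs_u.max(dim=1)
        mask = conf >= threshold

    logits_u = model(x_unlabeled)
    loss_unsup = (F.cross_entropy(logits_u, pseudo, reduction="none") * mask).mean()

    loss = loss_sup + lambda_u * loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

When the unlabeled distribution deviates from the labeled one, the confident pseudo-labels above can be systematically wrong, which is the noise-accumulation problem the paper targets.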
Read More
How Multilingual Are Large Language Models Fine-Tuned for Translation?

arXiv:2405.20512v1 Announce Type: new Abstract: A new paradigm for machine translation has recently emerged: fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data (Xu et al., 2024a; Alves et al., 2024). However, it remains unclear whether this paradigm can enable massively multilingual machine translation or whether it requires fine-tuning dedicated models for a small number of language pairs. How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve…
Read More
Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective

arXiv:2405.20431v1 Announce Type: new Abstract: Federated Learning (FL) is a promising paradigm that offers significant advancements in privacy-preserving, decentralized machine learning by enabling collaborative training of models across distributed devices without centralizing data. However, the practical deployment of FL systems faces a significant bottleneck: the communication overhead caused by frequently exchanging large model updates between numerous devices and a central server. This communication inefficiency can hinder training speed, model performance, and the overall feasibility of real-world FL applications. In this survey, we investigate various strategies and advancements made in communication-efficient FL, highlighting their impact and potential to overcome the communication…
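As one concrete illustration of the communication-efficiency strategies such surveys cover, here is a sketch of top-k sparsification of a client's model update before upload. The survey discusses many schemes; this particular choice and the k value are assumptions for demonstration only.

```python
# Top-k sparsification of a federated model update (generic sketch).
import torch

def topk_sparsify(update: torch.Tensor, k: int):
    """Keep only the k largest-magnitude entries of a flattened update.

    The client transmits (indices, values) instead of the dense tensor,
    cutting upload size from O(d) to O(k)."""
    flat = update.flatten()
    idx = flat.abs().topk(k).indices
    return idx, flat[idx]

def densify(indices, values, shape):
    """Server-side reconstruction of the sparse update as a dense tensor."""
    flat = torch.zeros(int(torch.tensor(shape).prod()))
    flat[indices] = values
    return flat.reshape(shape)

update = torch.randn(1000, 100)            # a stand-in layer update
idx, vals = topk_sparsify(update, k=1000)  # transmit ~1% of the entries
recovered = densify(idx, vals, update.shape)
```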
Read More
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

arXiv:2405.20584v1 Announce Type: new Abstract: With the development of diffusion-based customization methods like DreamBooth, individuals can now train models that generate personalized images of themselves. Despite the convenience, malicious users have misused these techniques to create fake images, triggering a privacy and security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into…
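The general idea of a protective adversarial perturbation, as described in the abstract, can be sketched with a textbook PGD-style loop that perturbs an image within an L-infinity ball to maximise some surrogate loss of the customization model. This is not the DisDiff token-level attention erasure method; loss_fn, epsilon, and the step sizes below are placeholders.

```python
# Generic PGD-style "protective" perturbation sketch (not the DisDiff method).
import torch

def protect_image(image, loss_fn, epsilon=8/255, alpha=2/255, steps=40):
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(adv)                # e.g. the model's reconstruction loss
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                     # ascend the loss
            adv = image + (adv - image).clamp(-epsilon, epsilon)  # stay in the ball
            adv = adv.clamp(0, 1)                               # keep a valid image
    return adv.detach()
```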
Read More
SPOT: Text Source Prediction from Originality Score Thresholding

arXiv:2405.20505v1 Announce Type: new Abstract: The wide acceptance of large language models (LLMs) has unlocked new applications and social risks. Popular countermeasures aimed at detecting misinformation usually involve domain-specific models trained to recognize the relevance of any information. Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust. In this study, we define trust as the ability to know whether an input text was generated by an LLM or a human. To do so, we design SPOT, an efficient method that classifies the source of any standalone text input…
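The truncated abstract does not define the originality score itself, so the following is only a sketch of the generic idea named in the title: compute a scalar score for a standalone text and threshold it to decide LLM versus human. The score used here (average per-token log-likelihood under GPT-2) and the threshold value are stand-in assumptions, not SPOT's actual score.

```python
# Generic score-and-threshold text source detector (stand-in, not SPOT's score).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def score(text: str) -> float:
    """Average per-token log-likelihood of the text under the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    return -out.loss.item()   # higher = more "typical" of the LM

def predict_source(text: str, threshold: float = -3.0) -> str:
    # Texts the LM finds highly predictable are flagged as machine-generated.
    return "LLM" if score(text) > threshold else "human"
```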
Read More
Revolutionizing Data Management: Microsoft Fabric Meets WhereScape Automation

Sponsored Content by WhereScape In the rapidly evolving world of data management, the integration of Microsoft Fabric with WhereScape’s automation tools marks a pivotal advancement. This synergy not only redefines the efficiencies of data operations but also empowers organizations to navigate the complexities of digital transformation with unprecedented ease. Migrating to advanced systems like Microsoft Fabric can be daunting. WhereScape data warehouse automation simplifies this transition from legacy systems, ensuring a seamless migration that minimizes downtime and maximizes data integrity. By automating the migration process, WhereScape helps businesses quickly adapt to the robust capabilities of Microsoft Fabric, facilitating a smooth…
Read More
Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting

[Submitted on 30 May 2024, by Georgios Tsoumplekas and 4 other authors] Abstract: The increased availability of medical data has significantly impacted healthcare by enabling the application of machine/deep learning approaches in various instances. However, medical datasets are usually small and scattered across multiple providers, suffer from high class imbalance, and are subject to stringent data privacy constraints. In this paper, the application of a data regularization algorithm, suitable for learning under high class imbalance,…
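The paper's specific data regularization algorithm is not given in the truncated abstract, so the following is only a generic stand-in showing one standard way a federated client can counter local class imbalance: weighting its cross-entropy loss by inverse class frequency.

```python
# Inverse-frequency class weighting on a federated client (generic sketch).
import torch
import torch.nn.functional as F

def inverse_frequency_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class weights proportional to 1 / (local class count), mean-normalised."""
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    weights = 1.0 / counts
    return weights * (num_classes / weights.sum())

def local_loss(model, x, y, num_classes):
    # Rare classes contribute more per example, countering the local imbalance.
    weights = inverse_frequency_weights(y, num_classes)
    return F.cross_entropy(model(x), y, weight=weights)
```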
Read More