Viral News

Model-free learning of probability flows: Elucidating the nonequilibrium dynamics of flocking

arXiv:2411.14317v1 Abstract: Active systems comprise a class of nonequilibrium dynamics in which individual components autonomously dissipate energy. Efforts towards understanding the role played by activity have centered on computation of the entropy production rate (EPR), which quantifies the breakdown of time-reversal symmetry. A fundamental difficulty in this program is that the high dimensionality of the phase space renders traditional computational techniques infeasible for estimating the EPR. Here, we overcome this challenge with a novel deep learning approach that estimates probability currents directly from stochastic system trajectories. We derive a new physical connection between the probability current and…
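A minimal sketch of the general idea of learning probability currents model-free from trajectories (not the paper's variational objective): for overdamped Langevin dynamics, the conditional mean of the central-difference displacement equals the current velocity v(x) = J(x)/p(x), and the steady-state EPR for isotropic diffusion is E[|v|²]/D. The toy SDE, network size, and hyperparameters below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
D, dt, n_steps = 0.5, 1e-3, 100_000
omega = 2.0  # strength of the nonconservative, curl-carrying drift

def drift(x):
    # Linear restoring force plus a rotational part that breaks detailed balance.
    rot = torch.stack([-x[:, 1], x[:, 0]], dim=1)
    return -x + omega * rot

# Simulate one long trajectory of a 2-D overdamped Langevin system.
x, traj = torch.zeros(1, 2), []
for _ in range(n_steps):
    x = x + drift(x) * dt + (2 * D * dt) ** 0.5 * torch.randn_like(x)
    traj.append(x)
traj = torch.cat(traj)  # (n_steps, 2)

# E[(x_{t+dt} - x_{t-dt}) / (2 dt) | x_t] equals the current velocity
# v(x) = J(x)/p(x), so a plain regression onto the state recovers it.
xt, target = traj[1:-1], (traj[2:] - traj[:-2]) / (2 * dt)

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    idx = torch.randint(0, xt.shape[0], (4096,))
    loss = ((net(xt[idx]) - target[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Steady-state entropy production rate for isotropic diffusion: E[|v|^2] / D.
with torch.no_grad():
    print("estimated EPR:", ((net(traj) ** 2).sum(1).mean() / D).item())
```

With omega = 0 the drift is a pure gradient, the learned current vanishes, and the EPR estimate drops to near zero, which is a quick sanity check on the construction.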
Read More
Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

[Submitted on 4 Jun 2023 (v1), last revised 21 Nov 2024 (v3)] By Jintao Rong and 5 other authors. Abstract: The Contrastive Language-Image Pretraining (CLIP) model has been widely used in various downstream vision tasks. The few-shot learning paradigm has been widely adopted to augment its capacity for these tasks. However, current paradigms may struggle with fine-grained classification, such as satellite image recognition, due to widening domain gaps. To address this limitation, we propose retrieval-enhanced visual prompt learning (RePrompt), which introduces…
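A hedged sketch of the general retrieval-enhancement idea: cache the embeddings of the few-shot support images, retrieve the k nearest neighbours of a query embedding, and blend their label votes with the usual CLIP text-prompt similarities. The function name, fusion rule, and hyperparameters are illustrative assumptions; the actual RePrompt architecture (learned visual prompts conditioned on retrieved features) is more involved.

```python
import torch
import torch.nn.functional as F

def retrieval_enhanced_logits(query, bank, bank_labels, text_feats,
                              k=5, alpha=0.5, beta=5.0):
    """query: (d,), bank: (N, d) support embeddings, bank_labels: (N,),
    text_feats: (C, d) class-prompt embeddings."""
    query = F.normalize(query, dim=-1)
    bank = F.normalize(bank, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    clip_logits = query @ text_feats.T            # zero-shot CLIP scores
    sims, idx = (query @ bank.T).topk(k)          # retrieve k nearest neighbours
    votes = F.one_hot(bank_labels[idx], text_feats.shape[0]).float()
    # Similarity-weighted label votes from the retrieved neighbours.
    knn_logits = (torch.exp(beta * (sims - 1)).unsqueeze(1) * votes).sum(0)
    return alpha * knn_logits + clip_logits

# Toy usage with random features standing in for real CLIP embeddings.
d, C, N = 512, 10, 160
logits = retrieval_enhanced_logits(
    torch.randn(d), torch.randn(N, d),
    torch.randint(0, C, (N,)), torch.randn(C, d))
print(logits.argmax().item())
```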
Read More
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance

arXiv:2411.14279v1 Abstract: Large vision-language models (LVLMs) have achieved impressive results in various vision-language tasks. However, despite showing promising performance, LVLMs suffer from hallucinations caused by language bias, leading to diminished focus on images and ineffective visual comprehension. We identify two primary reasons for this bias: (1) the different scales of training data between the LLM pretraining stage and the multimodal alignment stage, and (2) a learned inference bias due to the short-term dependency of text data. We therefore propose LACING, a systemic framework designed to address the language bias of LVLMs with muLtimodal duAl-attention meChanIsm (MDA) aNd soft-image Guidance (IFG)…
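One common decoding-time way to counter language bias is to contrast the model's next-token distribution with and without the image, amplifying what the image contributes, in the spirit of classifier-free guidance. The sketch below illustrates that intuition only; the actual IFG mechanism in LACING may differ in detail, and the function name and gamma value are assumptions.

```python
import torch

def image_guided_logits(logits_with_image, logits_text_only, gamma=1.5):
    """Push the next-token distribution toward tokens the image supports.

    logits_*: (vocab,) logits from the same LVLM, once with the image
    present and once with it removed or replaced by a soft placeholder.
    """
    return logits_text_only + gamma * (logits_with_image - logits_text_only)

# Toy usage: random logits standing in for a real LVLM's outputs.
v = 32_000
guided = image_guided_logits(torch.randn(v), torch.randn(v))
print(guided.argmax().item())
```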
Read More
A Closer Look at Machine Unlearning for Large Language Models

[Submitted on 10 Oct 2024 (v1), last revised 21 Nov 2024 (v2)] By Xiaojian Yuan and 5 other authors. Abstract: Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. Due to the high cost of retraining from scratch, researchers attempt to employ machine unlearning to remove specific content from LLMs while preserving the overall performance. In this paper, we discuss several issues in machine unlearning for LLMs and provide our insights on…
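For context, a hedged sketch of a common unlearning baseline in this literature: gradient ascent on the forget set combined with ordinary gradient descent on a retain set to preserve general capability. This is the kind of objective such papers analyse, not the authors' own proposal; the assumption is that `model` is a Hugging Face-style causal LM whose forward pass returns a `.loss` when `labels` are given.

```python
def unlearning_step(model, optimizer, forget_batch, retain_batch, lam=1.0):
    """One step of gradient-ascent unlearning with a retain-set penalty.

    `forget_batch` / `retain_batch` are tokenized dicts with `input_ids`.
    """
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss
    # Ascend on the forget data (negated term), descend on the retain data.
    loss = -forget_loss + lam * retain_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

A known weakness of this objective, and part of why the topic merits a closer look, is that unbounded ascent on the forget loss can degrade the model far beyond the targeted content.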
Read More
Contrasting local and global modeling with machine learning and satellite data: A case study estimating tree canopy height in African savannas

arXiv:2411.14354v1 Abstract: While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we contrast local and global training paradigms for SatML through a case study…
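A minimal sketch of the two paradigms being contrasted: fit a model only on data from the target region, or start from a globally trained model and fine-tune it locally. The loaders, model constructor, and learning rates are placeholder assumptions, not the paper's actual setup; the regression loss reflects the canopy-height target.

```python
import copy
import torch

def train(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()  # tree canopy height is a regression target
    for _ in range(epochs):
        for imgs, heights in loader:
            opt.zero_grad()
            loss_fn(model(imgs), heights).backward()
            opt.step()
    return model

def local_only(make_model, local_loader):
    # Paradigm 1: train from scratch on the target region only.
    return train(make_model(), local_loader)

def global_then_finetune(global_model, local_loader, lr=1e-4):
    # Paradigm 2: fine-tune a copy of the globally trained model,
    # typically at a lower learning rate.
    return train(copy.deepcopy(global_model), local_loader, lr=lr)
```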
Read More
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

[Submitted on 6 Jun 2024 (v1), last revised 21 Nov 2024 (v4)] By Nadav Borenstein and 7 other authors. Abstract: What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we instead seek…
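A probabilistic regular language is exactly the distribution over strings defined by a probabilistic finite automaton (PFA), which makes PFAs a natural test bed for LM learnability. The toy sampler below draws training strings from a hand-specified PFA; the transition table is an illustrative assumption, and the paper's experimental setup is richer.

```python
import random

# States 0..1; each state maps to a list of (prob, symbol, next_state).
# A `None` symbol is the end-of-string event, so probabilities sum to 1.
PFA = {
    0: [(0.5, "a", 0), (0.4, "b", 1), (0.1, None, None)],
    1: [(0.3, "a", 0), (0.5, "b", 1), (0.2, None, None)],
}

def sample(pfa, start=0, rng=random):
    out, state = [], start
    while True:
        r, acc = rng.random(), 0.0
        for prob, sym, nxt in pfa[state]:
            acc += prob
            if r < acc:
                if sym is None:
                    return "".join(out)   # string ends here
                out.append(sym)
                state = nxt
                break

corpus = [sample(PFA) for _ in range(5)]
print(corpus)  # e.g. ['ab', '', 'babb', ...]
```

Training an LM on such a corpus and comparing its string probabilities to the PFA's exact ones gives a direct, ground-truth measure of learnability.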
Read More
Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow

[Submitted on 22 Nov 2023 (v1), last revised 21 Nov 2024 (v2)] By Yinuo Ren and 6 other authors. Abstract: Multi-objective optimization (MOO), which has widespread applications, aims to optimize multiple, possibly conflicting objectives. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly…
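A hedged sketch of the two ingredients named in the abstract: overdamped Langevin moves on each particle, plus birth-death jumps driven by a dominance potential (here simply the number of particles that Pareto-dominate you). The drift choice, rates, and toy objectives are illustrative stand-ins for the paper's construction, not its actual dynamics.

```python
import torch

def objectives(x):                       # two conflicting 1-D objectives
    return torch.stack([(x - 1) ** 2, (x + 1) ** 2], dim=1)

def dominance_potential(F):
    # j dominates i if F_j <= F_i in every objective and is strictly better somewhere.
    worse = (F[None] <= F[:, None]).all(-1) & (F[None] < F[:, None]).any(-1)
    return worse.float().sum(1)          # count of dominators per particle

x = torch.randn(256) * 3                 # particle positions
eps, dt = 0.05, 1e-2
for _ in range(500):
    # Overdamped Langevin step on the summed objectives (a simple drift choice).
    x = x.detach().requires_grad_(True)
    objectives(x).sum().backward()
    x = (x - dt * x.grad + (2 * eps * dt) ** 0.5 * torch.randn_like(x)).detach()
    # Birth-death: dominated particles die with rate V and are replaced
    # by copies of surviving (non-dominated-heavy) particles.
    V = dominance_potential(objectives(x))
    kill = torch.rand_like(V) < 1 - torch.exp(-dt * V)
    if kill.any() and (~kill).any():
        donors = torch.multinomial((~kill).float(), int(kill.sum()), replacement=True)
        x[kill] = x[donors]

print(x.min().item(), x.max().item())    # should roughly span the Pareto set [-1, 1]
```

The birth-death step is what lets the method relocate dominated particles wholesale rather than waiting for diffusion to move them, which is the contrast with earlier particle methods the abstract draws.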
Read More
VerA: Versatile Anonymization Applicable to Clinical Facial Photographs

[Submitted on 4 Dec 2023 (v1), last revised 21 Nov 2024 (v2)] By Majed El Helou and 4 other authors. Abstract: The demand for privacy in facial image dissemination is gaining ground internationally, echoed by the proliferation of regulations such as GDPR, DPDPA, CCPA, PIPL, and APPI. While recent advances in anonymization surpass pixelation or blur methods, additional constraints on the task pose challenges. Largely unaddressed by current anonymization methods are clinical images and pairs of before-and-after clinical images illustrating…
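For contrast, a sketch of the classical pixelation baseline the abstract says modern methods surpass: it anonymizes by destroying exactly the local detail (skin condition, before-and-after changes) that clinical use needs preserved. The face box is assumed to come from any detector; this is the baseline, not VerA itself.

```python
import cv2

def pixelate_face(img, box, blocks=12):
    """img: BGR image array; box: (x, y, w, h) face rectangle from a detector."""
    x, y, w, h = box
    face = img[y:y + h, x:x + w]
    # Downsample to a coarse grid, then upsample with nearest-neighbour
    # interpolation to produce the familiar mosaic effect.
    small = cv2.resize(face, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    img[y:y + h, x:x + w] = cv2.resize(
        small, (w, h), interpolation=cv2.INTER_NEAREST)
    return img
```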
Read More
Probing Multimodal Large Language Models for Global and Local Semantic Representations

[Submitted on 27 Feb 2024 (v1), last revised 21 Nov 2024 (v3)] By Mingxu Tao and 5 other authors. Abstract: The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications for understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving state-of-the-art performance on image-to-text tasks. However, few studies explore which layers of MLLMs contribute most to global image information, which plays a vital…
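A hedged sketch of the standard layer-wise probing recipe such studies build on: collect hidden states from every layer, fit a linear probe per layer, and compare accuracies to see where semantic information concentrates. The pooling choice and training details are assumptions, not the paper's exact protocol; the model is assumed to follow the Hugging Face `output_hidden_states` convention.

```python
import torch

@torch.no_grad()
def layer_features(model, inputs, pool="last"):
    # hidden_states is a tuple with one (B, T, d) tensor per layer.
    out = model(**inputs, output_hidden_states=True)
    return [h[:, -1] if pool == "last" else h.mean(1) for h in out.hidden_states]

def probe_accuracy(X_train, y_train, X_test, y_test, n_classes, steps=500):
    # One linear probe per layer: train with cross-entropy, report accuracy.
    clf = torch.nn.Linear(X_train.shape[1], n_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(clf(X_train), y_train)
        opt.zero_grad(); loss.backward(); opt.step()
    return (clf(X_test).argmax(1) == y_test).float().mean().item()
```

Plotting `probe_accuracy` against layer index for a global task (e.g. image classification) versus a local one (e.g. region attributes) is the usual way to localize where each kind of information lives.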
Read More
Multimodal Autoregressive Pre-training of Large Vision Encoders

[Submitted on 21 Nov 2024] Authors: Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby. Abstract: We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this…
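A hedged sketch of the core multimodal autoregressive idea: place image patch tokens before text tokens in one sequence and train a causal decoder to predict each element from its prefix, regressing on patch features and using cross-entropy on text. The decoder, heads, and sequence layout are stand-in assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def multimodal_ar_loss(decoder, patch_embeds, text_ids,
                       text_embed, patch_head, text_head):
    """patch_embeds: (B, P, d) patch embeddings from the vision encoder;
    text_ids: (B, T) token ids; `decoder` is causal over the joint sequence."""
    text_embeds = text_embed(text_ids)                     # (B, T, d)
    seq = torch.cat([patch_embeds, text_embeds], dim=1)    # image first, then text
    hidden = decoder(seq)                                  # (B, P+T, d), causal

    P = patch_embeds.shape[1]
    # Predict patch t+1 from the prefix up to t (feature regression).
    patch_loss = F.mse_loss(patch_head(hidden[:, :P - 1]), patch_embeds[:, 1:])
    # Predict each text token from the full multimodal prefix (next-token CE).
    logits = text_head(hidden[:, P - 1:-1])                # aligned with text targets
    text_loss = F.cross_entropy(logits.flatten(0, 1), text_ids.flatten())
    return patch_loss + text_loss
```

Because every text token conditions on all the image patches through the causal prefix, the vision encoder is pushed to produce features that support both reconstruction and captioning, which is the intuition behind pre-training encoders this way.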
Read More