Viral News

MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

arXiv:2411.10557v1 Announce Type: new Abstract: We present a novel instruction tuning recipe to improve the zero-shot task generalization of multimodal large language models. In contrast to existing instruction tuning mechanisms that heavily rely on visual instructions, our approach focuses on language-based instruction tuning, offering a distinct and more training efficient path for multimodal instruction tuning. We evaluate the performance of the proposed approach on 9 unseen datasets across both language and vision modalities. Our results show that our language-only instruction tuning is able to significantly improve the performance of two pretrained multimodal models based on Llama 2 and Vicuna on…
Physics-Informed Neural Networks for Electrical Circuit Analysis: Applications in Dielectric Material Modeling

arXiv:2411.10483v1 Announce Type: new Abstract: Scientific machine learning (SciML) represents a significant advancement in integrating machine learning (ML) with scientific methodologies. At the forefront of this development are Physics-Informed Neural Networks (PINNs), which offer a promising approach by incorporating physical laws directly into the learning process, thereby reducing the need for extensive datasets. However, when data is limited or the system becomes more complex, PINNs can face challenges, such as instability and difficulty in accurately fitting the training data. In this article, we explore the capabilities and limitations of the DeepXDE framework, a tool specifically designed for implementing PINNs, in…
Prompt-Guided Environmentally Consistent Adversarial Patch

arXiv:2411.10498v1 Announce Type: new Abstract: Adversarial attacks in the physical world pose a significant threat to the security of vision-based systems, such as facial recognition and autonomous driving. Existing adversarial patch methods primarily focus on improving attack performance, but they often produce patches that are easily detectable by humans and struggle to achieve environmental consistency, i.e., blending patches into the environment. This paper introduces a novel approach for generating adversarial patches, which addresses both the visual naturalness and environmental consistency of the patches. We propose Prompt-Guided Environmentally Consistent Adversarial Patch (PG-ECAP), a method that aligns the patch with the environment…
On the Shortcut Learning in Multilingual Neural Machine Translation

arXiv:2411.10581v1 Announce Type: new Abstract: In this study, we revisit the commonly-cited off-target issue in multilingual neural machine translation (MNMT). By carefully designing experiments on different MNMT scenarios and models, we attribute the off-target issue to the overfitting of the shortcuts of (non-centric, centric) language mappings. Specifically, the learned shortcuts bias MNMT to mistakenly translate non-centric languages into the centric language instead of the expected non-centric language for zero-shot translation. Analyses on learning dynamics show that the shortcut learning generally occurs in the later stage of model training, and multilingual pretraining accelerates and aggravates the shortcut learning. Based on these…
Guided Learning: Lubricating End-to-End Modeling for Multi-stage Decision-making

arXiv:2411.10496v1 Announce Type: new Abstract: Multi-stage decision-making is crucial in various real-world artificial intelligence applications, including recommendation systems, autonomous driving, and quantitative investment systems. In quantitative investment, for example, the process typically involves several sequential stages such as factor mining, alpha prediction, portfolio optimization, and sometimes order execution. While state-of-the-art end-to-end modeling aims to unify these stages into a single global framework, it faces significant challenges: (1) training such a unified neural network consisting of multiple stages between initial inputs and final outputs often leads to suboptimal solutions, or even collapse, and (2) many decision-making scenarios are not easily reducible…
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models

[Submitted on 15 Nov 2024] By Mathis Koroglu and 3 other authors. Abstract: We consider the problem of text-to-video generation tasks with precise control for various applications such as camera movement control and video-to-video editing. Most methods tackling this problem rely on providing user-defined controls, such as binary masks or camera movement embeddings. We propose OnlyFlow, an approach that leverages optical flow first extracted from an input video to condition the motion of generated videos. Using a…
Leveraging large language models for efficient representation learning for entity resolution

arXiv:2411.10629v1 Announce Type: new Abstract: In this paper, the authors propose TriBERTa, a supervised entity resolution system that utilizes a pre-trained large language model and a triplet loss function to learn representations for entity matching. The system consists of two steps: first, name entity records are fed into a Sentence Bidirectional Encoder Representations from Transformers (SBERT) model to generate vector representations, which are then fine-tuned using contrastive learning based on a triplet loss function. Fine-tuned representations are used as input for entity matching tasks, and the results show that the proposed approach outperforms state-of-the-art representations, including SBERT without fine-tuning and…
Artificial Intelligence for Infectious Disease Prediction and Prevention: A Comprehensive Review

arXiv:2411.10486v1 Announce Type: new Abstract: Artificial Intelligence (AI) and infectious disease prediction have recently advanced together. The emergence of machine learning (ML), along with deep learning (DL), has extended many approaches for countering the emergence and spread of diseases. Despite their outstanding results in predicting infectious diseases, conflicts have appeared regarding the types of data used and how they can be studied, analyzed, and exploited using various emerging methods. This has led to some ongoing discussions in the field. This research aims not only to provide an overview of what has been accomplished, but also to highlight the difficulties related…
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

arXiv:2411.10499v1 Announce Type: new Abstract: Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios. These methods often struggle with issues such as texture-aware maintenance and size-aware fitting, which hinder their overall effectiveness. To address these limitations, we propose a novel garment perception enhancement technique, termed FitDiT, designed for high-fidelity virtual try-on using Diffusion Transformers (DiT) allocating more parameters and attention to high-resolution features. First, to further improve texture-aware maintenance, we introduce a garment texture extractor that incorporates garment priors evolution to fine-tune garment feature, facilitating to…
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

[Submitted on 15 Nov 2024] By Caspar Oesterheld, Emery Cooper, Miles Kodama, Linh Chi Nguyen, and Ethan Perez. Abstract: We introduce a dataset of natural-language questions in the decision theory of so-called Newcomb-like problems. Newcomb-like problems include, for instance, decision problems in which an agent interacts with a similar other agent, and thus has to reason about the fact that the other agent will likely reason in similar ways. Evaluating LLM reasoning about Newcomb-like problems is important because interactions…