Viral News

AI-generated Image Detection: Passive or Watermark?

arXiv:2411.13553v1 Announce Type: cross Abstract: While text-to-image models offer numerous benefits, they also pose significant societal risks. Detecting AI-generated images is crucial for mitigating these risks. Detection methods can be broadly categorized into passive and watermark-based approaches: passive detectors rely on artifacts present in AI-generated images, whereas watermark-based detectors proactively embed watermarks into such images. A key question is which type of detector performs better in terms of effectiveness, robustness, and efficiency. However, the current literature lacks a comprehensive understanding of this issue. In this work, we aim to bridge that gap by developing ImageDetectBench, the first comprehensive benchmark to…
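To make the passive/watermark distinction concrete, here is a minimal Python sketch of the two detector families. The artifact statistic, LSB watermark scheme, and thresholds are illustrative assumptions for exposition, not methods from the benchmark.

```python
import numpy as np

class PassiveDetector:
    """Scores an image from artifacts alone, with no cooperation from the generator."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def score(self, image: np.ndarray) -> float:
        # Toy artifact statistic: share of spectral energy away from the
        # lowest frequencies. Real passive detectors use trained classifiers.
        spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image.mean(axis=-1))))
        h, w = spectrum.shape
        low = spectrum[h // 2 - h // 8 : h // 2 + h // 8,
                       w // 2 - w // 8 : w // 2 + w // 8].sum()
        return float(1.0 - low / (spectrum.sum() + 1e-9))

    def is_ai_generated(self, image: np.ndarray) -> bool:
        return self.score(image) > self.threshold


class WatermarkDetector:
    """Proactive: the generator embeds a known bit pattern that the detector checks."""

    def __init__(self, key: np.ndarray, min_match: float = 0.9):
        self.key = key.astype(np.uint8)  # secret watermark bits (0/1)
        self.min_match = min_match

    def embed(self, image: np.ndarray) -> np.ndarray:
        # Write the key into the least significant bits of the first pixels.
        out = image.reshape(-1).copy()
        out[: self.key.size] = (out[: self.key.size] & 0xFE) | self.key
        return out.reshape(image.shape)

    def is_ai_generated(self, image: np.ndarray) -> bool:
        bits = image.reshape(-1)[: self.key.size] & 1
        return float((bits == self.key).mean()) >= self.min_match


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
wm = WatermarkDetector(key=rng.integers(0, 2, size=128))
print(wm.is_ai_generated(wm.embed(img)))       # True: watermark present
print(wm.is_ai_generated(img))                 # almost certainly False
print(round(PassiveDetector().score(img), 3))  # artifact score, no cooperation needed
```

Even this toy version hints at the trade-off the benchmark studies: the passive score needs no cooperation from the generator, while the watermark check is near-perfect but only applies to images the generator actually marked.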
Read More
Improving Visual Place Recognition Based Robot Navigation By Verifying Localization Estimates

[Submitted on 11 Jul 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Owen Claxton and 6 other authors. Abstract: Visual Place Recognition (VPR) systems often have imperfect performance, affecting the 'integrity' of position estimates and subsequent robot navigation decisions. Previously, SVM classifiers have been used to monitor VPR integrity. This research introduces a novel Multi-Layer Perceptron (MLP) integrity monitor which demonstrates improved performance and generalizability, removing per-environment training and reducing manual tuning requirements. We test our…
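A minimal sketch of the idea as described, using scikit-learn: a small MLP classifies whether a localization estimate should be trusted. The input features, network size, and synthetic labels are assumptions for illustration, not the paper's architecture or training setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical per-query features derived from the VPR matching process,
# e.g. best-match distance, ratio to the second-best match, and a spatial
# consistency score (assumed feature set).
X_train = rng.random((500, 3))
# Synthetic labels: 1 = localization estimate can be trusted.
y_train = (X_train[:, 0] < 0.4).astype(int)

monitor = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
monitor.fit(X_train, y_train)

# At run time, the monitor gates navigation decisions on predicted integrity.
query_features = np.array([[0.2, 0.8, 0.6]])
if monitor.predict(query_features)[0] == 1:
    print("accept VPR estimate")
else:
    print("reject estimate; fall back to another localization source")
```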
Read More
Patience Is The Key to Large Language Model Reasoning

arXiv:2411.13082v1 Announce Type: new Abstract: Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning abilities, limiting their potential in solving complex tasks. To bridge this gap, following the concept of test-time scaling, we propose a simple method that encourages models to adopt a more patient reasoning style without introducing new knowledge or skills. To employ…
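One plausible way to operationalize a "patient" reasoning preference is with DPO-style preference pairs that favor detailed answers over terse ones; whether this matches the paper's exact pipeline is an assumption. A minimal sketch:

```python
def make_preference_pair(question: str, detailed: str, brief: str) -> dict:
    """Return a preference record favoring the more patient, step-by-step answer."""
    return {
        "prompt": question,
        "chosen": detailed,   # thorough reasoning, preferred
        "rejected": brief,    # correct but terse, dispreferred
    }

pair = make_preference_pair(
    "What is 17 * 24?",
    "First, 17 * 20 = 340. Then 17 * 4 = 68. Adding them: 340 + 68 = 408.",
    "408",
)
print(pair["chosen"])
```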
Read More
LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits

[Submitted on 19 Nov 2024] By Dimple Vijay Kochar and 3 other authors. Abstract: Traditional approaches for designing analog circuits are time-consuming and require significant human expertise. Existing automation efforts using methods like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal and costly to generalize across different topologies and technology nodes. In our work, we introduce a novel approach, LEDRO, utilizing Large Language Models (LLMs) in conjunction with optimization techniques to iteratively refine the design space for analog circuit…
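A toy sketch of the iterative design-space refinement loop the abstract describes. The surrogate objective stands in for a circuit simulator, random search stands in for the inner optimizer, and a simple shrink-toward-best heuristic stands in for the LLM's bound suggestions; none of these are LEDRO's actual components.

```python
import random

random.seed(0)

def evaluate(design: dict) -> float:
    # Stand-in for a SPICE simulation returning a figure of merit to maximize.
    return -sum((value - 0.3) ** 2 for value in design.values())

# Assumed search space: two transistor widths with wide initial ranges.
bounds = {"w1": (0.1, 10.0), "w2": (0.1, 10.0)}

for round_idx in range(5):
    # Inner optimizer: random search within the current bounds (a BO or RL
    # optimizer would slot in here).
    candidates = [
        {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(50)
    ]
    best = max(candidates, key=evaluate)
    # "LLM" step: shrink each range toward the best design found so far,
    # mimicking the iterative design-space refinement the abstract describes.
    bounds = {
        name: (max(lo, best[name] - (hi - lo) / 4),
               min(hi, best[name] + (hi - lo) / 4))
        for name, (lo, hi) in bounds.items()
    }
    print(round_idx, {k: (round(lo, 3), round(hi, 3)) for k, (lo, hi) in bounds.items()})
```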
Read More
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

arXiv:2411.11941v1 Announce Type: new Abstract: Dynamic scene reconstruction is a long-term challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints such as motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, making it challenging to reconstruct complex scenes, particularly when dealing with violent movement, extreme-shaped geometries, or reflective surfaces. To address this issue, we design a plug-and-play module called TimeFormer that equips existing deformable 3D Gaussian reconstruction methods with the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer…
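A minimal PyTorch sketch of the plug-and-play idea: a self-attention layer that mixes per-timestamp deformation features along the time axis, so motion at one timestamp can inform others. The dimensions and single-layer design are assumptions for illustration, not TimeFormer's actual architecture.

```python
import torch
import torch.nn as nn

class TimeAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_gaussians, num_timestamps, dim) per-timestep deformation
        # features; attention shares motion context across the time axis.
        mixed, _ = self.attn(feats, feats, feats)
        return self.norm(feats + mixed)

module = TimeAttention()
x = torch.randn(1024, 8, 64)   # 1024 Gaussians observed at 8 timestamps
print(module(x).shape)         # torch.Size([1024, 8, 64])
```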
Read More
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

arXiv:2411.13045v1 Announce Type: cross Abstract: Effective query-item relevance modeling is pivotal for enhancing user experience and safeguarding user satisfaction in e-commerce search systems. Recently, benefiting from their vast inherent knowledge, Large Language Model (LLM) approaches have demonstrated strong performance and long-tail generalization ability compared with previous neural-based specialized relevance learning methods. Though promising, current LLM-based methods encounter the following inadequacies in practice: First, their massive parameters and computational demands make them difficult to deploy online. Second, distilling LLMs into online models is a feasible direction, but LLM relevance modeling is a black box, and its rich intrinsic knowledge…
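Independent of the paper's specific multi-dimensional design, the underlying distillation setup can be sketched with a standard soft-label KL loss between the LLM teacher's relevance logits and a lightweight student's. The temperature and loss form here are common distillation choices, assumed rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 2.0) -> torch.Tensor:
    """Soft-label KL divergence between teacher and student relevance logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Hypothetical logits over three relevance grades for one query-item pair.
teacher = torch.tensor([[2.0, 0.5, -1.0]])  # from the LLM teacher
student = torch.tensor([[1.5, 0.7, -0.5]])  # from the small online model
print(distill_loss(student, teacher))
```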
Read More
SoK: A Systems Perspective on Compound AI Threats and Countermeasures

arXiv:2411.13459v1 Announce Type: cross Abstract: Large language models (LLMs) used across enterprises are often proprietary and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research, targeting various software and hardware components used in training and inference, makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple LLMs, the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often…
Read More
Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification

[Submitted on 22 May 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Hari Iyer and 3 other authors. Abstract: The performance of physical workers is significantly influenced by the extent of their motions. However, monitoring and assessing these motions remains a challenge. Recent advancements have enabled in-situ video analysis for real-time observation of worker behaviors. This paper introduces a novel framework for tracking and quantifying upper and lower limb motions, issuing alerts when critical thresholds are reached. Using joint…
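A simplified sketch of threshold-based motion alerts over tracked joint positions: sum frame-to-frame displacements over a window and flag joints that exceed a limit. The joint names, displacement metric, units, and thresholds are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-joint motion limits for one analysis window (arbitrary units).
LIMITS = {"wrist": 15.0, "knee": 12.0}

def motion_amount(track: np.ndarray) -> float:
    """Sum of frame-to-frame joint displacements over a video window."""
    return float(np.linalg.norm(np.diff(track, axis=0), axis=1).sum())

def check_alerts(tracks: dict) -> list:
    return [
        f"ALERT: {joint} motion {motion_amount(t):.1f} exceeds limit {LIMITS[joint]}"
        for joint, t in tracks.items()
        if motion_amount(t) > LIMITS[joint]
    ]

# 30 frames of (x, y) joint positions from a pose tracker (synthetic here).
tracks = {
    "wrist": np.cumsum(rng.random((30, 2)), axis=0),  # keeps moving -> alert
    "knee": np.zeros((30, 2)),                        # stationary -> no alert
}
print(check_alerts(tracks))
```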
Read More
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers

arXiv:2411.12992v1 Announce Type: new Abstract: In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models, such as linear attention and flash-attention. However, the model size and corresponding computational complexity are constantly scaled up in pursuit of higher performance. In this work, we present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective. We eliminate nearly all the computations of the transformer model except for the necessary computation required by the multi-head attention operation. This is made possible by utilizing…
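A toy sketch of replacing a fully-connected layer's matrix multiplication with a memory read: hash each input vector to a bucket and fetch a stored output vector. The sign-of-random-projection hash and table sizes here are illustrative assumptions, not MemoryFormer's actual construction.

```python
import torch

class HashLookupLayer(torch.nn.Module):
    """Stands in for a dense layer: a table lookup instead of x @ W."""

    def __init__(self, in_dim: int = 64, out_dim: int = 64, bits: int = 8):
        super().__init__()
        self.register_buffer("proj", torch.randn(in_dim, bits))  # fixed random hash
        self.table = torch.nn.Parameter(torch.randn(2 ** bits, out_dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hash each input vector to a bucket index, then read that bucket's
        # stored vector: a memory access in place of a matrix multiplication.
        signs = (x @ self.proj > 0).long()                       # (batch, bits)
        weights = 2 ** torch.arange(signs.shape[-1], device=signs.device)
        idx = (signs * weights).sum(-1)                          # bucket per input
        return self.table[idx]

layer = HashLookupLayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```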
Read More
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

[Submitted on 6 Nov 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Daniel P. Jeong and 3 other authors. Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper,…
Read More