Viral News

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction

arXiv:2411.11941v1 Announce Type: new Abstract: Dynamic scene reconstruction is a long-standing challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints such as motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, making it challenging to reconstruct complex scenes, particularly those with violent movement, extreme-shaped geometries, or reflective surfaces. To address this issue, we design a plug-and-play module called TimeFormer that gives existing deformable 3D Gaussian reconstruction methods the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer…
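The excerpt cuts off before the architectural details, but the stated idea (a module that attends across timestamps so motion patterns are learned jointly rather than per frame) can be sketched. Below is a minimal, hypothetical PyTorch sketch, assuming each timestamp yields a fixed-size deformation feature per Gaussian; the class name, shapes, and hyperparameters are illustrative, not the paper's interface.

```python
# Hypothetical plug-and-play temporal module (illustrative; not the paper's code).
import torch
import torch.nn as nn

class TemporalAttentionModule(nn.Module):
    """Self-attention across timestamps, letting each per-frame deformation
    feature borrow motion cues from neighboring frames."""
    def __init__(self, feat_dim: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_gaussians, num_timestamps, feat_dim); attention runs
        # over the timestamp axis, and the refined features would feed the
        # downstream deformation MLP of the host reconstruction method.
        return self.encoder(feats)

# Usage: refine features for 1024 Gaussians observed over 8 timestamps.
module = TemporalAttentionModule(feat_dim=64)
x = torch.randn(1024, 8, 64)
print(module(x).shape)  # torch.Size([1024, 8, 64])
```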
Read More
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

arXiv:2411.13045v1 Announce Type: cross Abstract: Effective query-item relevance modeling is pivotal for enhancing user experience and safeguarding user satisfaction in e-commerce search systems. Recently, benefiting from their vast inherent knowledge, Large Language Model (LLM) approaches have demonstrated strong performance and long-tail generalization ability compared with previous neural-network-based specialized relevance learning methods. Though promising, current LLM-based methods encounter the following inadequacies in practice: First, their massive parameters and computational demands make them difficult to deploy online. Second, distilling LLMs into online models is a feasible direction, but LLM relevance modeling is a black box, and its rich intrinsic knowledge…
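The excerpt breaks off before the paper's multi-dimensional distillation design, so only the general recipe it alludes to can be illustrated: precompute relevance judgments with the offline LLM, then fit a small online model to them. A generic sketch, in which the model, embedding dimensions, and loss are all illustrative assumptions rather than the paper's method:

```python
# Generic sketch: distilling offline LLM relevance scores into a small
# online scorer (illustrative only; not the paper's multi-dimensional design).
import torch
import torch.nn as nn

class OnlineRelevanceModel(nn.Module):
    """Tiny scorer over precomputed query/item embeddings."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, q, item):
        return self.mlp(torch.cat([q, item], dim=-1)).squeeze(-1)

student = OnlineRelevanceModel()
q, item = torch.randn(16, 32), torch.randn(16, 32)
llm_scores = torch.rand(16)  # teacher relevance in [0, 1], produced offline
loss = nn.functional.mse_loss(torch.sigmoid(student(q, item)), llm_scores)
loss.backward()
print(loss.item())
```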
Read More
SoK: A Systems Perspective on Compound AI Threats and Countermeasures

arXiv:2411.13459v1 Announce Type: cross Abstract: Enterprise deployments of large language models (LLMs) often rely on proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting the various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies. As we advance towards constructing compound AI inference pipelines that integrate multiple LLMs, the attack surface expands significantly. Attackers now target the AI algorithms as well as the software and hardware components associated with these systems. While current research often…
Read More
Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification

[Submitted on 22 May 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Hari Iyer and 3 other authors. Abstract: The performance of physical workers is significantly influenced by the extent of their motions. However, monitoring and assessing these motions remains a challenge. Recent advancements have enabled in-situ video analysis for real-time observation of worker behaviors. This paper introduces a novel framework for tracking and quantifying upper and lower limb motions, issuing alerts when critical thresholds are reached. Using joint…
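The quantify-and-alert loop described here is easy to make concrete. A toy sketch, assuming 2D pose keypoints have already been extracted per frame by an off-the-shelf pose estimator; the motion metric and the threshold value are illustrative choices, not the paper's calibrated ones:

```python
# Toy motion-amount quantifier over pose keypoints (illustrative only).
import numpy as np

def motion_amount(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (frames, joints, 2) pixel coordinates.
    Returns per-frame motion: summed joint displacement vs. the previous frame."""
    deltas = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)  # (frames-1, joints)
    return deltas.sum(axis=1)

def check_alerts(per_frame_motion, threshold: float = 120.0):
    """Return frame indices whose motion exceeds an (arbitrary) threshold."""
    return [i for i, m in enumerate(per_frame_motion, start=1) if m > threshold]

# Usage: 100 frames, 17 COCO-style joints, synthetic jitter plus one spike.
rng = np.random.default_rng(0)
kp = np.cumsum(rng.normal(0, 1, size=(100, 17, 2)), axis=0)
kp[60] += 30.0  # simulate a sudden large movement at frame 60
print(check_alerts(motion_amount(kp)))
```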
Read More
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers

arXiv:2411.12992v1 Announce Type: new Abstract: In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models, for example through linear attention and flash-attention. However, model size and the corresponding computational complexity are constantly scaled up in pursuit of higher performance. In this work, we present MemoryFormer, a novel transformer architecture which significantly reduces computational complexity (FLOPs) from a new perspective. We eliminate nearly all the computations of the transformer model except for the computation required by the multi-head attention operation. This is made possible by utilizing…
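The excerpt truncates before naming the mechanism, but one family of constructions that matches the description replaces a fully-connected layer's matrix multiplication with lookups into learned embedding tables, keyed by a cheap hash of the input. A hedged sketch of that general idea follows; it is not claimed to be MemoryFormer's exact design, and all shapes and names are illustrative:

```python
# Sketch: a hashed-lookup substitute for a fully-connected layer
# (a generic lookup construction; not necessarily the paper's exact design).
import torch
import torch.nn as nn

class HashedLookupLayer(nn.Module):
    """Splits the input into chunks, sign-hashes each chunk with a frozen
    random projection, and sums the retrieved table rows instead of
    computing a dense x @ W."""
    def __init__(self, in_dim=64, out_dim=64, n_chunks=8, table_bits=8):
        super().__init__()
        self.n_chunks, self.chunk = n_chunks, in_dim // n_chunks
        self.register_buffer("proj", torch.randn(n_chunks, self.chunk, table_bits))
        self.tables = nn.Parameter(torch.randn(n_chunks, 2 ** table_bits, out_dim) * 0.02)

    def forward(self, x):
        b = x.shape[0]
        chunks = x.view(b, self.n_chunks, self.chunk)
        # Sign bits of random projections give each chunk a table index.
        bits = (torch.einsum("bnc,nct->bnt", chunks, self.proj) > 0).long()
        weights = 2 ** torch.arange(bits.shape[-1], device=x.device)
        idx = (bits * weights).sum(-1)                             # (batch, n_chunks)
        gathered = self.tables[torch.arange(self.n_chunks), idx]   # (batch, n_chunks, out_dim)
        return gathered.sum(dim=1)

layer = HashedLookupLayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Only the tables are learned here; the hash indices themselves are not differentiable, which is the usual trade-off in lookup-based layers.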
Read More
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

[Submitted on 6 Nov 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Daniel P. Jeong and 3 other authors. Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper,…
Read More
Fair Distillation: Teaching Fairness from Biased Teachers in Medical Imaging

arXiv:2411.11939v1 Announce Type: new Abstract: Deep learning has achieved remarkable success in image classification and segmentation tasks. However, fairness concerns persist, as models often exhibit biases that disproportionately affect demographic groups defined by sensitive attributes such as race, gender, or age. Existing bias-mitigation techniques, including Subgroup Re-balancing, Adversarial Training, and Domain Generalization, aim to balance accuracy across demographic groups, but often fail to simultaneously improve overall accuracy, group-specific accuracy, and fairness due to conflicts among these interdependent objectives. We propose the Fair Distillation (FairDi) method, a novel fairness approach that decomposes these objectives by leveraging biased "teacher" models, each optimized…
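The excerpt stops just as the mechanism is introduced: group-specific biased teachers whose knowledge is combined into a single student. That structure can be illustrated with generic multi-teacher distillation; the loss form, temperature, and weighting below are assumptions for illustration, not FairDi's actual objective:

```python
# Generic multi-teacher distillation sketch (illustrative; not FairDi's losses).
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, teacher_logits_per_group,
                               group_ids, labels, T=2.0, alpha=0.5):
    """Distill each sample from the teacher specialized for its demographic
    group, blended with ordinary cross-entropy on the ground-truth label."""
    # Pick, per sample, the logits of its own group's teacher.
    teacher = teacher_logits_per_group[group_ids, torch.arange(len(labels))]
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher / T, dim=-1),
                    reduction="batchmean") * T ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * hard + (1 - alpha) * soft

# Toy usage: 2 sensitive groups, 6 samples, 3 classes.
s = torch.randn(6, 3, requires_grad=True)
t = torch.randn(2, 6, 3)              # logits from 2 group-specific teachers
g = torch.tensor([0, 1, 0, 1, 1, 0])  # sensitive-group id per sample
y = torch.tensor([0, 1, 2, 0, 1, 2])
print(multi_teacher_distill_loss(s, t, g, y).item())
```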
Read More
Loss-to-Loss Prediction: Scaling Laws for All Datasets

[Submitted on 19 Nov 2024] By David Brandfonbrener and 4 other authors. Abstract: While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream task data. Our predictions extrapolate well even at 20x…
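The core move (fit a mapping from loss on one distribution to loss on another at small scales, then extrapolate) can be shown with a toy curve fit. A sketch assuming a shifted power law as the functional form, which is one plausible choice rather than the paper's confirmed one; the paired loss values below are synthetic:

```python
# Toy loss-to-loss fit (illustrative; functional form and data are assumptions).
import numpy as np
from scipy.optimize import curve_fit

def loss_to_loss(loss_a, k, kappa, e_b):
    # Predict loss on dataset B from loss on dataset A.
    return k * loss_a ** kappa + e_b

# Synthetic paired losses measured at a few small compute scales.
loss_a = np.array([3.2, 2.9, 2.6, 2.4, 2.2])
loss_b = np.array([3.8, 3.4, 3.0, 2.8, 2.6])
params, _ = curve_fit(loss_to_loss, loss_a, loss_b, p0=[1.0, 1.0, 0.0], maxfev=10000)

# Extrapolate: predict B's loss for a larger run whose A-loss reaches 1.8.
print(loss_to_loss(1.8, *params))
```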
Read More
Trojan Cleansing with Neural Collapse

Read More
Rethinking cluster-conditioned diffusion models for label-free image synthesis

[Submitted on 1 Mar 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Nikolas Adaloglou, Tim Kaiser, Felix Michels, and Markus Kollmann. Abstract: Diffusion-based image generation models can enhance image quality when conditioned on ground truth labels. Here, we conduct a comprehensive experimental study on image-level conditioning for diffusion models using cluster assignments. We investigate how individual clustering determinants, such as the number of clusters and the clustering method, impact image synthesis across three different datasets. Given the…
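The label-free recipe studied here can be stated concretely: cluster image features from a pretrained encoder, then use the cluster ids in place of class labels in an otherwise ordinary class-conditional diffusion model. A minimal sketch of the pseudo-labeling step, where the encoder, feature dimensionality, and cluster count are illustrative assumptions (the cluster count is precisely the kind of determinant the paper investigates):

```python
# Sketch: cluster assignments as drop-in conditioning labels (illustrative).
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for features from a pretrained (e.g., self-supervised) encoder.
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 128))

# The number of clusters plays the role of the number of "classes".
n_clusters = 100
cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)

# cluster_ids now substitute for ground-truth labels: train the conditional
# denoiser as eps_theta(x_t, t, c) with c = cluster id instead of a label y.
print(cluster_ids.shape, cluster_ids.min(), cluster_ids.max())
```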
Read More