Viral News - CybAI news

21 Nov

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers

stp2y | 0 Comments | Viral News

arXiv:2411.12992v1. Abstract: In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models, such as linear attention and flash-attention. However, the model size and corresponding computational complexity are constantly scaled up in pursuit of higher performance. In this work, we present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective. We eliminate nearly all the computations of the transformer model except for the necessary computation required by the multi-head attention operation. This is made possible by utilizing…

21 Nov

Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

[Submitted on 6 Nov 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Daniel P. Jeong and 3 other authors. Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper,…

21 Nov

Fair Distillation: Teaching Fairness from Biased Teachers in Medical Imaging

arXiv:2411.11939v1. Abstract: Deep learning has achieved remarkable success in image classification and segmentation tasks. However, fairness concerns persist, as models often exhibit biases that disproportionately affect demographic groups defined by sensitive attributes such as race, gender, or age. Existing bias-mitigation techniques, including Subgroup Re-balancing, Adversarial Training, and Domain Generalization, aim to balance accuracy across demographic groups, but often fail to simultaneously improve overall accuracy, group-specific accuracy, and fairness due to conflicts among these interdependent objectives. We propose the Fair Distillation (FairDi) method, a novel fairness approach that decomposes these objectives by leveraging biased "teacher" models, each optimized…

21 Nov

Loss-to-Loss Prediction: Scaling Laws for All Datasets

[Submitted on 19 Nov 2024] By David Brandfonbrener and 4 other authors. Abstract: While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream task data. Our predictions extrapolate well even at 20x…

21 Nov

Trojan Cleansing with Neural Collapse

21 Nov

Rethinking cluster-conditioned diffusion models for label-free image synthesis

[Submitted on 1 Mar 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Nikolas Adaloglou, Tim Kaiser, Felix Michels, and Markus Kollmann. Abstract: Diffusion-based image generation models can enhance image quality when conditioned on ground truth labels. Here, we conduct a comprehensive experimental study on image-level conditioning for diffusion models using cluster assignments. We investigate how individual clustering determinants, such as the number of clusters and the clustering method, impact image synthesis across three different datasets. Given the…

21 Nov

Training Bilingual LMs with Data Constraints in the Targeted Language

21 Nov

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

[Submitted on 24 Jun 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Sean Welleck and 7 other authors. Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms,…

21 Nov

Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation

arXiv:2411.11935v1. Abstract: Reliable deep learning models require not only accurate predictions but also well-calibrated confidence estimates to ensure dependable uncertainty estimation. This is crucial in safety-critical applications like autonomous driving, which depend on rapid and precise semantic segmentation of LiDAR point clouds for real-time 3D scene understanding. In this work, we introduce a sampling-free approach for estimating well-calibrated confidence values for classification tasks, achieving alignment with true classification accuracy and significantly reducing inference time compared to sampling-based methods. Our evaluation using the Adaptive Calibration Error (ACE) metric for LiDAR semantic segmentation shows that our approach maintains well-calibrated…

21 Nov

Literature Meets Data: A Synergistic Approach to Hypothesis Generation

[Submitted on 22 Oct 2024 (v1), last revised 19 Nov 2024 (this version, v2)] By Haokun Liu and 4 other authors. Abstract: AI holds promise for transforming scientific processes, including hypothesis generation. Prior work on hypothesis generation can be broadly categorized into theory-driven and data-driven approaches. While both have proven effective in generating novel and plausible hypotheses, it remains an open question whether they can complement each other. To address this, we develop the first method that combines literature-based insights with…