Viral News

Selective Attention: Enhancing Transformer through Principled Context Control

arXiv:2411.12892v1 Announce Type: new Abstract: The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top \text{softmax}(Kq)$, where $V,K$ are the value and key embeddings respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. As a solution, we introduce the \textit{Selective Self-Attention} (SSA) layer that augments the softmax nonlinearity with a principled temperature scaling strategy. By controlling temperature, SSA…
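The mapping above makes the temperature idea easy to sketch. Below is a minimal, illustrative PyTorch version of attention for a single query with a query-dependent temperature; the function name and the way `tau` is supplied are assumptions made for this sketch, not the paper's actual SSA parameterization.

```python
import torch
import torch.nn.functional as F

def selective_attention(q, K, V, tau):
    """Attention output for one query q with a positive scalar
    temperature tau. In SSA the temperature is query-dependent;
    how it is learned is assumed away here for illustration.

    q:   (d,)      query embedding
    K:   (n, d)    key embeddings
    V:   (n, d_v)  value embeddings
    tau: ()        positive scalar temperature for this query
    """
    scores = (K @ q) / tau              # tau < 1 sharpens, tau > 1 flattens
    weights = F.softmax(scores, dim=0)  # temperature controls contextual sparsity
    return V.T @ weights                # V^T softmax(Kq / tau)

# Toy usage with illustrative shapes.
q = torch.randn(16)
K, V = torch.randn(8, 16), torch.randn(8, 16)
out = selective_attention(q, K, V, tau=torch.tensor(0.5))  # sharper weights
```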
Read More
$\text{S}^{3}$Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model

arXiv:2411.11906v1 Announce Type: new Abstract: Arbitrary-scale super-resolution (ASSR) aims to super-resolve low-resolution images to high-resolution images at any scale using a single model, addressing the limitations of traditional super-resolution methods that are restricted to fixed scale factors (e.g., $\times 2$, $\times 4$). The advent of Implicit Neural Representations (INR) has brought forth a plethora of novel methodologies for ASSR, which facilitate the reconstruction of original continuous signals by modeling a continuous representation space for coordinates and pixel values, thereby enabling arbitrary-scale super-resolution. Consequently, the primary objective of ASSR is to construct a continuous representation space derived from low-resolution inputs. However, existing methods,…
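To make the INR recipe concrete, here is a minimal coordinate-based decoder sketch (illustrative only, not the paper's $\text{S}^{3}$Mamba model): an MLP maps a continuous coordinate plus a latent feature sampled from the low-resolution encoder to an RGB value, so the same weights can be queried on a pixel grid of any density. All names and dimensions are assumptions for the example.

```python
import torch
import torch.nn as nn

class CoordinateDecoder(nn.Module):
    """Minimal implicit-neural-representation decoder: maps a
    continuous (x, y) coordinate plus a latent feature (e.g.
    bilinearly sampled from the low-resolution encoder output)
    to an RGB value. Querying a denser coordinate grid yields a
    larger image, which is what enables arbitrary-scale SR."""

    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feats, coords):
        # feats:  (N, feat_dim) latent codes sampled at the query points
        # coords: (N, 2) continuous coordinates, e.g. in [-1, 1]
        return self.mlp(torch.cat([feats, coords], dim=-1))
```

Rendering at scale $s$ then amounts to evaluating the same decoder on an $s\times$ denser coordinate grid, which is the arbitrary-scale property the abstract describes.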
Read More
ProSec: Fortifying Code LLMs with Proactive Security Alignment

Read More
On the Way to LLM Personalization: Learning to Remember User Conversations

Read More
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Read More
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

[Submitted on 2 May 2023 (v1), last revised 20 Nov 2024 (this version, v3)] By Gašper Beguš, Thomas Lu, and Zili Wang. Abstract: Computational models of syntax are predominantly text-based. Here we propose that the most basic first step in the evolution of syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary suboperations of syntax: concatenation. We introduce spontaneous concatenation: a phenomenon where…
Read More
Regional Ocean Forecasting with Hierarchical Graph Neural Networks

[Submitted on 15 Oct 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Daniel Holmberg and 2 other authors. Abstract: Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range…
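As a rough illustration of the hierarchical-GNN idea (a generic sketch, not SeaCast's published architecture), one round of fine-to-coarse-to-fine message passing can look like the following; the `assign` mapping, the mean pooling, and the update networks are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class HierarchicalGNNStep(nn.Module):
    """One generic fine -> coarse -> fine message-passing step.
    Pooling fine-mesh nodes into coarse nodes, mixing there, and
    broadcasting back lets information travel long distances over
    the ocean mesh at low cost."""

    def __init__(self, dim):
        super().__init__()
        self.coarse_mix = nn.Linear(dim, dim)
        self.fine_update = nn.Linear(2 * dim, dim)

    def forward(self, h_fine, assign, n_coarse):
        # h_fine: (N, dim) fine-mesh node features
        # assign: (N,) long tensor; coarse node owning each fine node
        # Pool fine features into coarse nodes (mean aggregation).
        h_coarse = torch.zeros(n_coarse, h_fine.size(1))
        h_coarse.index_add_(0, assign, h_fine)
        counts = torch.bincount(assign, minlength=n_coarse).clamp(min=1)
        h_coarse = h_coarse / counts.unsqueeze(-1)
        # Mix on the coarse level, then broadcast back to fine nodes.
        h_coarse = torch.relu(self.coarse_mix(h_coarse))
        h_back = h_coarse[assign]
        return torch.relu(self.fine_update(torch.cat([h_fine, h_back], dim=-1)))
```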
Read More
F$^3$OCUS: Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics

Read More
Executable QR codes with Machine Learning for Industrial Applications

Read More
Generalization on the Unseen, Logic Reasoning and Degree Curriculum

[Submitted on 30 Jan 2023 (v1), last revised 20 Nov 2024 (this version, v3)] By Emmanuel Abbe and 3 other authors. Abstract: This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette…
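A minimal sketch of what a GOTU-style split can look like for a Boolean target on the hypercube, assuming the simple instance where training sees only inputs with one coordinate frozen to $+1$ and testing uses the unseen half; the helper name is hypothetical and the paper studies more general held-out parts of the domain.

```python
import itertools
import numpy as np

def gotu_split(d, target, frozen_coord=-1):
    """Generalization-on-the-unseen split for a Boolean function on
    {-1, +1}^d: train on all inputs whose frozen coordinate is +1,
    test on the unseen half where it is -1 (one simple instance of
    the GOTU setting, chosen for illustration)."""
    X = np.array(list(itertools.product([-1, 1], repeat=d)))
    y = np.array([target(x) for x in X])
    train = X[:, frozen_coord] == 1
    return (X[train], y[train]), (X[~train], y[~train])

# Example target: the degree-2 parity x0 * x1 on {-1, +1}^4.
(train_X, train_y), (test_X, test_y) = gotu_split(4, lambda x: x[0] * x[1])
```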
Read More