Viral News

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Read More
AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software

[Submitted on 19 Nov 2024, by Nigar Alishzade and one co-author] Abstract: Sign language processing technology development relies on extensive and reliable datasets, instructions, and ethical guidelines. We present a comprehensive Azerbaijani Sign Language Dataset (AzSLD) collected from diverse sign language users and linguistic parameters to facilitate advancements in sign recognition and translation systems and support the local sign language community. The dataset was created within the framework of a vision-based AzSL translation project. This…
Read More
Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration

[Submitted on 25 Dec 2023 (v1), last revised 20 Nov 2024 (this version, v2), by Jiahong Fu and two co-authors] Abstract: The deep unfolding approach has attracted significant attention in computer vision tasks because it neatly connects conventional image-processing models with more recent deep learning techniques. Specifically, by establishing a direct correspondence between algorithm operators at each implementation step and network modules within each layer, one can rationally construct an almost "white box" network architecture with high…
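For readers new to deep unfolding, the sketch below illustrates the generic pattern the abstract describes: a fixed number of proximal-gradient iterations for a restoration model y = Ax + n, with the proximal operator at each stage replaced by a small learned network. This is a minimal, assumed illustration of the unfolding idea, not the paper's rotation-equivariant operator; the ProxNet and UnfoldedRestorer names, the residual parameterization, and all hyperparameters are placeholders.

import torch
import torch.nn as nn

class ProxNet(nn.Module):
    # Learned stand-in for the proximal operator prox_{lambda R}(.)
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)  # residual form keeps the map near identity

class UnfoldedRestorer(nn.Module):
    # K unrolled proximal-gradient steps; A / At are the (assumed) forward
    # operator of the degradation and its adjoint, e.g. blur and transposed
    # blur for deblurring.
    def __init__(self, A, At, stages=5, step=0.1):
        super().__init__()
        self.A, self.At = A, At
        self.step = nn.Parameter(torch.full((stages,), step))  # learned step sizes
        self.prox = nn.ModuleList([ProxNet() for _ in range(stages)])
    def forward(self, y):
        x = self.At(y)  # simple initialization from the measurement
        for k, prox in enumerate(self.prox):
            grad = self.At(self.A(x) - y)      # gradient of the data-fidelity term
            x = prox(x - self.step[k] * grad)  # gradient step, then learned prox
        return x

Each stage is a "white box" in the sense the abstract means: the gradient step comes from the physics of the degradation, and only the prior (the proximal map) is learned.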
Read More
SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models

[Submitted on 14 Jun 2024 (v1), last revised 18 Nov 2024 (this version, v2), by Zhaoxu Luo and two co-authors] Abstract: During the acquisition of satellite images, there is generally a trade-off between spatial resolution and temporal resolution (acquisition frequency) due to the onboard sensors of satellite imaging systems. High-resolution satellite images are very important for land crop monitoring, urban planning, wildfire management and a variety of applications. It is a significant yet challenging task…
Read More
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

arXiv:2411.13009v1 Abstract: While large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.
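The abstract does not spell out the steering mechanism, so the following is only a hypothetical sketch of what query-independent attention steering can look like: a constant logit offset is added at a fixed set of context positions, identically for every query. The function name, the boolean-mask scheme, and the offset value are all assumptions, not LLMSteer's actual algorithm.

import torch
import torch.nn.functional as F

def steered_attention(q, k, v, steer_mask, offset=1.0):
    # q: (T, d) queries; k, v: (S, d) keys/values from the reused context;
    # steer_mask: (S,) boolean mask of positions to emphasize (assumed scheme)
    scores = q @ k.T / k.shape[-1] ** 0.5          # standard scaled dot-product logits
    scores = scores + offset * steer_mask.float()  # same boost for every query
    return F.softmax(scores, dim=-1) @ v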
Read More
Selective Attention: Enhancing Transformer through Principled Context Control

arXiv:2411.12892v1 Abstract: The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top \text{softmax}(Kq)$, where $V, K$ are the value and key embeddings respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. As a solution, we introduce the $\textit{Selective Self-Attention}$ (SSA) layer that augments the softmax nonlinearity with a principled temperature scaling strategy. By controlling temperature, SSA…
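As a rough picture of what temperature scaling buys here, the sketch below replaces $\text{softmax}(Kq)$ with $\text{softmax}(Kq/\tau(q))$, where a small learned head predicts a positive, query-dependent temperature: lower $\tau$ sharpens the attention distribution (sparser context), higher $\tau$ flattens it. The scalar-per-query parameterization is an assumption made for illustration and need not match SSA's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.temp_head = nn.Linear(d, 1)  # assumed: one scalar temperature per query
    def forward(self, q, K, V):
        # q: (T, d) queries; K, V: (S, d) key and value embeddings
        tau = F.softplus(self.temp_head(q)) + 1e-4  # keep temperature strictly positive
        scores = (q @ K.T) / tau                    # softmax(Kq / tau(q)) per query
        return F.softmax(scores, dim=-1) @ V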
Read More
$\text{S}^{3}$Mamba: Arbitrary-Scale Super-Resolution via Scaleable State Space Model

arXiv:2411.11906v1 Abstract: Arbitrary scale super-resolution (ASSR) aims to super-resolve low-resolution images to high-resolution images at any scale using a single model, addressing the limitations of traditional super-resolution methods that are restricted to fixed scale factors (e.g., $\times 2$, $\times 4$). The advent of Implicit Neural Representations (INR) has brought forth a plethora of novel methodologies for ASSR, which facilitate the reconstruction of original continuous signals by modeling a continuous representation space for coordinates and pixel values, thereby enabling arbitrary-scale super-resolution. Consequently, the primary objective of ASSR is to construct a continuous representation space derived from low-resolution inputs. However, existing methods,…
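To make the INR idea concrete, here is a minimal LIIF-style decoder sketch; it is an assumed illustration of the general coordinate-based approach, not $\text{S}^{3}$Mamba's state-space design. An MLP maps a sampled low-resolution feature plus a continuous query coordinate to an RGB value, so any output resolution can be rendered simply by querying a denser coordinate grid.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB output per queried coordinate
        )
    def forward(self, feats, coords):
        # feats: (B, C, h, w) low-resolution feature map from any encoder;
        # coords: (B, N, 2) continuous query coordinates in [-1, 1]
        sampled = F.grid_sample(               # nearest LR feature per query point
            feats, coords.unsqueeze(1), mode="nearest", align_corners=False
        ).squeeze(2).transpose(1, 2)           # -> (B, N, C)
        return self.mlp(torch.cat([sampled, coords], dim=-1))  # -> (B, N, 3)

Because the decoder consumes continuous coordinates rather than a fixed pixel grid, the same trained model serves $\times 2$, $\times 4$, or any non-integer scale.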
Read More
ProSec: Fortifying Code LLMs with Proactive Security Alignment

Read More
On the Way to LLM Personalization: Learning to Remember User Conversations

Read More
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Read More