Unveiling and Controlling Anomalous Attention Distribution in Transformers

arXiv:2407.01601v1. Abstract: With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the Attention mechanism: the first element receives very high attention, a pattern that is prevalent across Transformer-based models. Understanding it is crucial for the development of techniques that focus on attention distribution, such as Key-Value (KV) Cache compression and infinite extrapolation; however, its underlying cause remains unknown. In this paper, we analyze this phenomenon from the perspective of the waiver phenomenon, which involves reducing the internal values of certain elements in the Softmax function,…

Optimized Learning for X-Ray Image Classification for Multi-Class Disease Diagnoses with Accelerated Computing Strategies

[Submitted on 1 Jul 2024] By Sebastian A. Cruz Romero and 3 other authors. Abstract: X-ray image-based disease diagnosis depends on precisely identifying afflictions within the sample, a task fraught with challenges stemming from the occurrence of false positives and false negatives. False positives introduce the risk of erroneously identifying non-existent conditions, leading to misdiagnosis and a decline in patient care quality. Conversely, false negatives pose the threat of overlooking genuine abnormalities, potentially causing delays…

Japan’s government says goodbye to floppy disks

Floppy disks may seem like a relic from an ancient era of computing, but there are still places, and even governments, that use them to run their most basic functions. Japan is no longer one of those countries. Japan’s Digital Agency announced on Wednesday that it has eliminated the use of outdated floppy disks in its government computer systems. The only system still in place that requires floppy disks is an environmental system that monitors vehicle recycling, according to Reuters. Digital Minister Taro Kono declared in a statement to the news agency, “We have won…

NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers

Two luxury rivals are joining forces — and Amazon is getting in on the action

Amazon, which has long tried to boost its luxury offerings as part of its "everything store" concept, and Salesforce are getting in on it, too, with both taking minority stakes in the new company, Saks Global. The pair will provide technology and logistics support to the latest luxury giant, the Journal said. As e-commerce and the power of luxury conglomerates like LVMH and Kering have grown, department stores are facing diminishing returns. In 2020, Lord & Taylor filed for bankruptcy. Macy's announced in February that it would be closing 150 stores over the next three years. In a way,…

Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

arXiv:2407.01598v1. Abstract: Long-term stability is a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods struggle to learn small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is related not only to spectral bias but also to distortions introduced by processing spherical data with conventional convolution. These distortions lead to a rapid amplification of errors over successive long-term iterations, resulting in a significant decline in forecast accuracy. To address this issue, a universal neural operator called the Spherical Harmonic…

SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving

arXiv:2407.01702v1. Abstract: Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. This detailed, point-level information can help autonomous vehicles accurately predict and understand dynamic changes in their surroundings. Current state-of-the-art methods require annotated data to train scene flow networks, and the expense of labeling inherently limits their scalability. Self-supervised approaches can overcome these limitations, yet face two principal challenges that hinder optimal performance: point distribution imbalance and disregard for object-level motion constraints. In this paper, we propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene…

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

arXiv:2407.01687v1. Abstract: Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generalization or rely on shallow heuristics when given CoT prompts. To understand the factors influencing CoT reasoning, we provide a detailed case study of the symbolic reasoning task of decoding shift ciphers, where letters are shifted forward some number of steps in the alphabet. GPT-4 achieves zero accuracy on most shift ciphers with standard prompting, but with CoT its accuracy improves to an average of 32%. By focusing on a…
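
For readers unfamiliar with the task, the shift cipher mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration, assuming the 26-letter English alphabet with wrap-around; the function names `shift_encode` and `shift_decode` are hypothetical and do not come from the paper.

```python
import string


def shift_encode(text: str, shift: int) -> str:
    """Shift each ASCII letter forward by `shift` positions, wrapping past 'z'."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            # Pick the base code point so that case is preserved.
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Leave spaces, digits, and punctuation unchanged (an assumption).
            out.append(ch)
    return "".join(out)


def shift_decode(text: str, shift: int) -> str:
    """Undo a forward shift by shifting the same number of steps backward."""
    return shift_encode(text, -shift)


if __name__ == "__main__":
    secret = shift_encode("hello", 3)
    print(secret)                   # khoor
    print(shift_decode(secret, 3))  # hello
```

Decoding is the step the model is asked to perform: given the shifted text and the shift size, step each letter backward by the same amount.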