Viral News

On margin-based generalization prediction in deep neural networks

[Submitted on 20 May 2024] By Coenraad Mouton. Abstract: Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample, or for that sample's representation internal to the network. Margin-based complexity measures have been shown to correlate with the generalization ability of deep neural networks in some circumstances but not others. The reasons behind the success or failure of these metrics…
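For readers wanting a concrete handle on these measurements, below is a minimal PyTorch sketch of a first-order margin approximation: the logit gap between the true and runner-up class, normalized by its input gradient. The model and setup are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a first-order input-space margin approximation;
# the model and input shapes here are hypothetical.
import torch

def approx_margin(model, x, y):
    """Approximate distance to the decision boundary:
    margin ~ (f_y(x) - f_j(x)) / ||grad_x (f_y(x) - f_j(x))||,
    where j is the runner-up class.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)                      # shape: (1, num_classes)
    f_y = logits[0, y]
    # Runner-up class: highest logit other than the true class.
    masked = logits.clone()
    masked[0, y] = float("-inf")
    j = masked.argmax(dim=1).item()
    diff = f_y - logits[0, j]
    (grad,) = torch.autograd.grad(diff, x)
    return (diff / (grad.norm() + 1e-12)).item()
```

The same gap-over-gradient-norm recipe can be applied to a hidden layer's activations to get the internal-representation margins the abstract refers to.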
Read More
How to train your ViT for OOD Detection

arXiv:2405.17447v1 Announce Type: new Abstract: Vision Transformers have been shown to be powerful out-of-distribution detectors for ImageNet-scale settings when finetuned from publicly available checkpoints, often outperforming other model types on popular benchmarks. In this work, we investigate the impact of both the pretraining and finetuning scheme on the performance of ViTs on this task by analyzing a large pool of models. We find that the exact type of pretraining has a strong impact on which method works well and on OOD detection performance in general. We further show that certain training schemes might only be effective for a specific type of…
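As context for what an OOD detector on top of a finetuned ViT looks like, here is a hedged sketch of one common post-hoc score, maximum softmax probability; the timm checkpoint name is a placeholder, not necessarily one of the models analyzed in the paper.

```python
# Sketch of a common post-hoc OOD score (maximum softmax probability)
# on a pretrained ViT; the checkpoint name is illustrative only.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

@torch.no_grad()
def msp_score(images):
    """Higher score = more in-distribution; threshold it to flag OOD."""
    probs = model(images).softmax(dim=-1)
    return probs.max(dim=-1).values
```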
Read More
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

arXiv:2405.16042v1 Announce Type: new Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinterpretations using four large language models (LLMs): GPT-2, LLaMA-2, Flan-T5, and RoBERTa. The overall goal is to evaluate whether humans and LLMs are aligned in their processing of garden-path sentences and in the lingering misinterpretations past the point of disambiguation, especially when extra-syntactic information (e.g.,…
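A standard LLM-side proxy for the online reading-time measures mentioned above is per-token surprisal. The sketch below computes it with GPT-2 via Hugging Face Transformers; it is a generic recipe, not the paper's evaluation code.

```python
# Per-token surprisal with GPT-2: -log2 P(token_t | tokens_<t).
# A generic sketch, not the paper's exact pipeline.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def surprisal(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    logits = model(ids).logits
    logp = logits[:, :-1].log_softmax(-1)      # predictions for tokens 1..T-1
    tgt = ids[:, 1:]                           # the tokens actually observed
    s = -logp.gather(2, tgt.unsqueeze(-1)).squeeze(-1) / math.log(2)
    return list(zip(tok.convert_ids_to_tokens(tgt[0]), s[0].tolist()))

# e.g. surprisal("The horse raced past the barn fell.")
# should spike at the disambiguating word "fell".
```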
Read More
Building a Strong AI Foundation: The Critical Role of High-Quality Data

Whether it’s manufacturing and supply chain management or the healthcare industry, Artificial Intelligence (AI) has the power to revolutionize operations. AI can boost efficiency, personalize customer experiences, and spark innovation. That said, getting reliable, actionable results from any AI process hinges on the quality of the data it is fed. Let’s take a closer look at what’s needed to prepare your data for AI-driven success. How Does Data Quality Impact AI Systems? Using poor-quality data can result in expensive, embarrassing mistakes, like the time Air Canada’s chatbot gave a grieving customer incorrect information. In areas like healthcare, using…
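As a small illustration of what basic data-quality screening can look like in practice, here is a pandas sketch that reports missing values, cardinality, and duplicates; the file name and columns are hypothetical.

```python
# Basic data-quality report with pandas; the dataset is hypothetical.
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missingness, cardinality, and dtype summary."""
    return pd.DataFrame({
        "missing_frac": df.isna().mean(),
        "n_unique": df.nunique(),
        "dtype": df.dtypes.astype(str),
    })

df = pd.read_csv("customers.csv")          # hypothetical input file
print(quality_report(df))
print("duplicate rows:", df.duplicated().sum())
```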
Read More
CataLM: Empowering Catalyst Design Through Large Language Models

arXiv:2405.17440v1 Announce Type: new Abstract: The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a large language model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands…
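For readers curious what domain fine-tuning of an open-source LLM typically involves, the following is a generic Hugging Face sketch; the base model and corpus file are placeholders, not CataLM's actual training setup.

```python
# Generic causal-LM fine-tuning skeleton with Hugging Face Transformers;
# "gpt2" and "catalysis_corpus.txt" are placeholder assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("gpt2")       # placeholder base model
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical plain-text domain corpus, tokenized for language modeling.
ds = load_dataset("text", data_files={"train": "catalysis_corpus.txt"})
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="catalm-sketch", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm=False),
)
trainer.train()
```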
Read More
Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network

arXiv:2405.17444v1 Announce Type: new Abstract: In this paper, we explore the feasibility of using a transformer-based, spatiotemporal attention network (STAN) for gradient-based time-series explanations. First, we trained the STAN model for video classification using global and local views of the data and weakly supervised labels on time-series data (i.e., the type of activity). We then leveraged a gradient-based XAI technique (e.g., saliency maps) to identify salient frames of time-series data. In experiments using datasets of four medically relevant activities, the STAN model demonstrated its potential to identify important frames of videos.
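The saliency-map idea referenced in the abstract can be sketched in a few lines of PyTorch: differentiate the class score with respect to the input clip and aggregate gradient magnitudes per frame. The `model` below stands in for the paper's STAN network and is an assumption, not its actual interface.

```python
# Gradient-based frame saliency for a video/time-series classifier;
# `model` is a stand-in for the paper's STAN network.
import torch

def frame_saliency(model, clip, target_class):
    """clip: (1, T, C, H, W). Returns one saliency value per frame."""
    clip = clip.clone().requires_grad_(True)
    score = model(clip)[0, target_class]
    score.backward()
    # Aggregate |gradient| over channels and pixels -> per-frame salience.
    return clip.grad.abs().flatten(2).sum(-1).squeeze(0)
```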
Read More
Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models

arXiv:2405.15984v1 Announce Type: new Abstract: With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can…
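To make the retrieval-augmented ICL setup concrete, here is a hedged sketch that selects the k most semantically similar training examples as demonstrations; the embedding model and prompt template are illustrative choices, not the paper's.

```python
# Retrieval-augmented in-context learning: pick the k nearest training
# examples as demonstrations. Embedding model and template are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def build_prompt(test_input, pool, k=4):
    """pool: list of (text, label) pairs to draw demonstrations from."""
    emb_pool = encoder.encode([t for t, _ in pool], convert_to_tensor=True)
    emb_q = encoder.encode(test_input, convert_to_tensor=True)
    top = util.semantic_search(emb_q, emb_pool, top_k=k)[0]
    demos = [pool[hit["corpus_id"]] for hit in top]
    shots = "\n".join(f"Input: {t}\nLabel: {y}" for t, y in demos)
    return f"{shots}\nInput: {test_input}\nLabel:"
```

Each of the attack surfaces the abstract lists maps onto a piece of this pipeline: the test input, the demonstration pool, and the retrieved examples themselves.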
Read More
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

arXiv:2405.15877v1 Announce Type: new Abstract: Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying…
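The generic baseline behind such approaches is truncated SVD: replace a weight matrix W with two thin factors of rank r, so W ≈ A @ B with far fewer parameters. The paper's basis selection is application-aware; the sketch below shows only the plain SVD baseline.

```python
# Low-rank compression of a weight matrix via truncated SVD;
# a generic baseline, not the paper's application-aware selection.
import torch

def low_rank_factors(W: torch.Tensor, r: int):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Two thin matrices replace W: (out, r) and (r, in) parameters.
    A = U[:, :r] * S[:r]            # (out_features, r)
    B = Vh[:r]                      # (r, in_features)
    return A, B

W = torch.randn(1024, 4096)
A, B = low_rank_factors(W, r=64)
print((W - A @ B).norm() / W.norm())  # relative reconstruction error
```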
Read More