Viral News

Self-Supervised Learning for Multi-Channel Neural Transducer

Self-Supervised Learning for Multi-Channel Neural Transducer

arXiv:2408.02945v1 Announce Type: new Abstract: Self-supervised learning, such as with the wav2vec 2.0 framework significantly improves the accuracy of end-to-end automatic speech recognition (ASR). Wav2vec 2.0 has been applied to single-channel end-to-end ASR models. In this work, we explored a self-supervised learning method for a multi-channel end-to-end ASR model based on the wav2vec 2.0 framework. As the multi-channel end-to-end ASR model, we focused on a multi-channel neural transducer. In pre-training, we compared three different methods for feature quantization to train a multi-channel conformer audio encoder: joint quantization, feature-wise quantization and channel-wise quantization. In fine-tuning, we trained the multi-channel conformer-transducer. All…
Read More
Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation

Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation

[Submitted on 2 Aug 2024] View a PDF of the paper titled Artificial Neural Networks for Photonic Applications: From Algorithms to Implementation, by Pedro Freire and 2 other authors View PDF HTML (experimental) Abstract:This tutorial-review on applications of artificial neural networks in photonics targets a broad audience, ranging from optical research and engineering communities to computer science and applied mathematics. We focus here on the research areas at the interface between these disciplines, attempting to find the right balance between technical details specific to each domain and overall clarity. First, we briefly recall key properties and peculiarities of some core…
Read More
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation

From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation

arXiv:2408.02769v1 Announce Type: new Abstract: The action anticipation task refers to predicting what action will happen based on observed videos, which requires the model to have a strong ability to summarize the present and then reason about the future. Experience and common sense suggest that there is a significant correlation between different actions, which provides valuable prior knowledge for the action anticipation task. However, previous methods have not effectively modeled this underlying statistical relationship. To address this issue, we propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR). ARR decomposes the…
Read More
Intermediate direct preference optimization

Intermediate direct preference optimization

arXiv:2408.02923v1 Announce Type: new Abstract: We propose the intermediate direct preference optimization (DPO) method to calculate the DPO loss at selected intermediate layers as an auxiliary loss for finetuning large language models (LLMs). The conventional DPO method fine-tunes a supervised fine-tuning (SFT) model by calculating the DPO loss using logits from the final layer. In our intermediate DPO approach, DPO losses are calculated using the logits from K-selected intermediate layers and averaged to obtain the intermediate DPO loss. For training the intermediate DPO model, the final loss is obtained by calculating the weighted sum of the DPO and intermediate DPO…
Read More
Open Set Recognition for Random Forest

Open Set Recognition for Random Forest

arXiv:2408.02684v1 Announce Type: new Abstract: In many real-world classification or recognition tasks, it is often difficult to collect training examples that exhaust all possible classes due to, for example, incomplete knowledge during training or ever changing regimes. Therefore, samples from unknown/novel classes may be encountered in testing/deployment. In such scenarios, the classifiers should be able to i) perform classification on known classes, and at the same time, ii) identify samples from unknown classes. This is known as open-set recognition. Although random forest has been an extremely successful framework as a general-purpose classification (and regression) method, in practice, it usually operates…
Read More
ConDL: Detector-Free Dense Image Matching

ConDL: Detector-Free Dense Image Matching

[Submitted on 5 Aug 2024] View a PDF of the paper titled ConDL: Detector-Free Dense Image Matching, by Monika Kwiatkowski and 2 other authors View PDF HTML (experimental) Abstract:In this work, we introduce a deep-learning framework designed for estimating dense image correspondences. Our fully convolutional model generates dense feature maps for images, where each pixel is associated with a descriptor that can be matched across multiple images. Unlike previous methods, our model is trained on synthetic data that includes significant distortions, such as perspective changes, illumination variations, shadows, and specular highlights. Utilizing contrastive learning, our feature maps achieve greater invariance…
Read More
Data Checklist: On Unit-Testing Datasets with Usable Information

Data Checklist: On Unit-Testing Datasets with Usable Information

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs. Source link lol
Read More
Improving Machine Learning Based Sepsis Diagnosis Using Heart Rate Variability

Improving Machine Learning Based Sepsis Diagnosis Using Heart Rate Variability

arXiv:2408.02683v1 Announce Type: new Abstract: The early and accurate diagnosis of sepsis is critical for enhancing patient outcomes. This study aims to use heart rate variability (HRV) features to develop an effective predictive model for sepsis detection. Critical HRV features are identified through feature engineering methods, including statistical bootstrapping and the Boruta algorithm, after which XGBoost and Random Forest classifiers are trained with differential hyperparameter settings. In addition, ensemble models are constructed to pool the prediction probabilities of high-recall and high-precision classifiers and improve model performance. Finally, a neural network model is trained on the HRV features, achieving an F1…
Read More
Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation

Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation

arXiv:2408.02761v1 Announce Type: new Abstract: Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work applied the Mahalanobis distance (MD) post hoc to the bottleneck features of four Swin UNETR and nnU-net models that segmented the liver on T1-weighted magnetic resonance imaging and computed tomography. By reducing the dimensions of the bottleneck features with either…
Read More
Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

arXiv:2408.02907v1 Announce Type: new Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we propose a new retrieval framework IIER, that leverages Inter-chunk Interactions to Enhance Retrieval. This framework captures the internal connections between document chunks by considering three types of interactions: structural, keyword, and semantic. We then construct a unified Chunk-Interaction Graph to represent all external…
Read More
No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.