Viral News

Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning

[Submitted on 6 Oct 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Quyen Tran and 8 other authors. Abstract: Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigating catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class data. We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. Specifically, by building a hierarchical tree structure based on…
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements

[Submitted on 18 Nov 2024] By M. Arda Aydın and 4 other authors. Abstract: Recent advances in foundational Vision Language Models (VLMs) have reshaped the evaluation paradigm in computer vision tasks. These foundational models, especially CLIP, have accelerated research in open-vocabulary computer vision tasks, including Open-Vocabulary Semantic Segmentation (OVSS). Although the initial results are promising, the dense prediction capabilities of VLMs still require further improvement. In this study, we enhance the semantic segmentation performance of CLIP by introducing new…
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM

arXiv:2411.13159v1. Abstract: Text-to-speech (TTS) models have been widely adopted to enhance automatic speech recognition (ASR) systems using text-only corpora, thereby reducing the cost of labeling real speech data. Existing research primarily utilizes additional text data and predefined speech styles supported by TTS models. In this paper, we propose Hard-Synth, a novel ASR data augmentation method that leverages large language models (LLMs) and advanced zero-shot TTS. Our approach employs LLMs to generate diverse in-domain text through rewriting, without relying on additional text data. Rather than using predefined speech styles, we introduce a hard prompt selection method with zero-shot…
Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations

arXiv:2411.12973v1. Abstract: This paper introduces a Process-Guided Learning (Pril) framework that integrates physical models with recurrent neural networks (RNNs) to enhance the prediction of dissolved oxygen (DO) concentrations in lakes, which is crucial for sustaining water quality and ecosystem health. Unlike traditional RNNs, which may deliver high accuracy but often lack physical consistency and broad applicability, the Pril method incorporates differential DO equations for each lake layer, modeling each as a first-order linear equation solved with a forward Euler scheme at a daily timestep. However, this method is sensitive to numerical instabilities. When drastic fluctuations occur, the numerical…
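The forward Euler step the abstract mentions can be sketched for a generic first-order linear DO balance; the rate constant, saturation value, initial concentration, and horizon below are illustrative assumptions, not values from the paper:

```python
# Minimal forward Euler integration of dC/dt = k * (C_sat - C), a generic
# first-order linear stand-in for a per-layer dissolved-oxygen balance.
# k, C_sat, c0, and the horizon are hypothetical, not the paper's values.

def forward_euler_do(c0, k, c_sat, days, dt=1.0):
    """Integrate dC/dt = k * (C_sat - C) with an explicit daily step."""
    c, trajectory = c0, [c0]
    for _ in range(days):
        c = c + dt * k * (c_sat - c)  # explicit update; diverges if dt * k > 2
        trajectory.append(c)
    return trajectory

traj = forward_euler_do(c0=4.0, k=0.3, c_sat=9.0, days=10)
```

The divergence condition in the comment (the explicit update overshoots when dt * k exceeds 2) is the kind of numerical instability the abstract alludes to for drastic fluctuations.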
Learning Video Representations without Natural Videos

[Submitted on 31 Oct 2024 (v1), last revised 16 Nov 2024 (this version, v2)] By Xueyang Yu and 2 other authors. Abstract: We show that useful video representations can be learned from synthetic videos and natural images, without incorporating natural videos in the training. We propose a progression of video datasets synthesized by simple generative processes that model a growing set of natural video properties (e.g., motion, acceleration, and shape transformations). The downstream performance of video models pre-trained on these generated datasets gradually increases…
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption

[Submitted on 25 Jul 2024 (v1), last revised 20 Nov 2024 (this version, v4)] By Luohe Shi and 4 other authors. Abstract: Large Language Models (LLMs), epitomized by ChatGPT's release in late 2022, have revolutionized various industries with their advanced language comprehension. However, their efficiency is challenged by the Transformer architecture's struggle with handling long texts. KV Cache has emerged as a pivotal solution to this issue, converting the time complexity of token generation from quadratic…
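The KV-cache idea the abstract describes can be shown with a toy attention step; the scalar "projections" and inputs below are illustrative stand-ins, not a real model:

```python
import math

# Toy single-head attention with a KV cache (all values illustrative).
# Without a cache, generating token t recomputes keys and values for the
# whole t-token prefix, giving quadratic total work over a sequence; with
# the cache, each step appends one (key, value) pair and attends over the
# stored ones, so per-token work grows only linearly with the prefix.

def attend(query, keys, values):
    """Scaled dot-product attention over scalar toy features (d_k = 1)."""
    scores = [query * k / math.sqrt(1) for k in keys]
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return sum((w / total) * v for w, v in zip(weights, values))

k_cache, v_cache = [], []
outputs = []
for h in [0.1, 0.5, -0.2]:      # pretend per-token hidden states
    k_cache.append(h * 0.9)      # toy "key projection"
    v_cache.append(h * 1.1)      # toy "value projection"
    outputs.append(attend(h * 0.8, k_cache, v_cache))
```

The memory cost of storing one key/value pair per token per layer is exactly what the surveyed optimization methods try to reduce.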
Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks

[Submitted on 20 May 2024 (v1), last revised 19 Nov 2024 (this version, v4)] By Marcin Podhajski and 5 other authors. Abstract: Graph Neural Networks (GNNs) are recognized as potent tools for processing real-world data organized in graph structures. Inductive GNNs in particular, which can process graph-structured data without relying on predefined graph structures, are becoming increasingly important in a wide range of applications. As such, these networks become attractive targets for model-stealing attacks, where an adversary seeks to replicate…
In-Situ Melt Pool Characterization via Thermal Imaging for Defect Detection in Directed Energy Deposition Using Vision Transformers

arXiv:2411.12028v1. Abstract: Directed Energy Deposition (DED) offers significant potential for manufacturing complex and multi-material parts. However, internal defects such as porosity and cracks can compromise mechanical properties and overall performance. This study focuses on in-situ monitoring and characterization of melt pools associated with porosity, aiming to improve defect detection and quality control in DED-printed parts. Traditional machine learning approaches for defect identification rely on extensive labeled datasets, which are often scarce and expensive to generate in real-world manufacturing. To address this, our framework employs self-supervised learning on unlabeled melt pool data using a Vision Transformer-based Masked Autoencoder (MAE) to…
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding

arXiv:2411.13157v1. Abstract: Efficient inference in large language models (LLMs) has become a critical focus as their scale and complexity grow. Traditional autoregressive decoding, while effective, suffers from computational inefficiencies due to its sequential token generation process. Speculative decoding addresses this bottleneck by introducing a two-stage framework: drafting and verification. A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model. This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches. We discuss key ideas associated with each method, highlighting their potential for scaling…
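The draft-then-verify loop the abstract outlines can be sketched in its simplest greedy form; the two toy "models" below are hypothetical stand-ins for a small draft model and a large target model:

```python
# Minimal greedy speculative-decoding loop (illustrative; production
# systems use probabilistic acceptance over sampled tokens). `draft_next`
# and `target_next` are toy stand-ins for the argmax step of a small
# draft model and a large target model.

def draft_next(ctx):
    return (sum(ctx) + 1) % 5                       # toy "small model"

def target_next(ctx):
    # Toy "large model": agrees with the draft except every third position.
    return (sum(ctx) + 1) % 5 if len(ctx) % 3 else (sum(ctx) + 2) % 5

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # 1) Drafting: the small model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) Verification: the target checks proposals in order, keeping
        #    the agreeing prefix; its own token is appended at a mismatch.
        for tok in draft:
            expected = target_next(seq)
            seq.append(expected)  # the target's token is always the one kept
            if expected != tok or len(seq) - len(prompt) >= n_tokens:
                break
    return seq[len(prompt):]

out = speculative_decode([1], 5)
```

Because the verifier's token is always the one appended, this greedy variant reproduces the target model's own greedy output exactly; the speedup comes from accepting multiple drafted tokens per target pass.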
A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction

arXiv:2411.12972v1. Abstract: Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundation model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format, and then introduce a spatio-temporal transformer architecture…
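The general idea of turning gridded flow data into a consistent token sequence can be illustrated with a simple patching routine; the patch sizes, layout, and toy data are assumptions for illustration, not UniFlow's actual design:

```python
# Illustrative spatio-temporal patching: flatten a (time, H, W) grid of
# flow values into fixed-size patch tokens, the kind of sequential format
# a unified transformer input requires. Patch sizes and data are toy
# assumptions, not the paper's mechanism.

def patchify(grid, t_patch=2, s_patch=2):
    """grid: T frames, each an HxW nested list -> list of flat patch tokens."""
    T, H, W = len(grid), len(grid[0]), len(grid[0][0])
    patches = []
    for t0 in range(0, T, t_patch):
        for i0 in range(0, H, s_patch):
            for j0 in range(0, W, s_patch):
                patch = [grid[t][i][j]
                         for t in range(t0, min(t0 + t_patch, T))
                         for i in range(i0, min(i0 + s_patch, H))
                         for j in range(j0, min(j0 + s_patch, W))]
                patches.append(patch)
    return patches

# Two 4x4 frames with values encoding (time, row, col) for readability.
frames = [[[t * 100 + i * 10 + j for j in range(4)] for i in range(4)]
          for t in range(2)]
tokens = patchify(frames)
```

Graph-based inputs would need an analogous flattening over node neighborhoods rather than spatial windows, which is where a multi-view mechanism comes in.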