Viral News

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

arXiv:2411.14257v1 Announce Type: new Abstract: Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects whether an entity is one it can recall facts about. Sparse autoencoders uncover meaningful directions in the representation space; these directions detect whether the model recognizes an entity, e.g. detecting that it doesn't know about an athlete or a movie. This suggests that models can have self-knowledge:…
Read More
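The abstract above describes reading off an "entity recognition" direction found by a sparse autoencoder (SAE). As a rough illustration of the idea, not the paper's actual code, the sketch below encodes a residual-stream activation with a toy SAE and thresholds one hypothetical feature; all weights, dimensions, and the feature index are made up.

```python
# Toy sketch: probing a sparse-autoencoder latent as a "known entity"
# detector. Assumes a trained SAE encoder z = ReLU(W_enc @ h + b_enc);
# here the weights are random stand-ins, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

W_enc = rng.standard_normal((d_sae, d_model))  # stand-in encoder weights
b_enc = rng.standard_normal(d_sae)

def sae_features(h):
    """Encode a residual-stream activation into sparse, non-negative features."""
    return np.maximum(0.0, W_enc @ h + b_enc)

# Suppose feature index 7 had been found to fire on recognized entities
# (hypothetical index, not from the paper).
KNOWN_ENTITY_FEATURE = 7

def model_recognizes(h, threshold=1.0):
    """Flag whether the 'known entity' feature is active on this activation."""
    return sae_features(h)[KNOWN_ENTITY_FEATURE] > threshold

h = rng.standard_normal(d_model)  # activation at the entity's final token
print(model_recognizes(h))
```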
Schemato — An LLM for Netlist-to-Schematic Conversion

arXiv:2411.13899v1 Announce Type: new Abstract: Machine learning models are advancing circuit design, particularly in analog circuits. However, they typically generate netlists that lack human interpretability. This is a problem as human designers heavily rely on the interpretability of circuit diagrams or schematics to intuitively understand, troubleshoot, and develop designs. Hence, to integrate domain knowledge effectively, it is crucial to translate ML-generated netlists into interpretable schematics quickly and accurately. We propose Schemato, a large language model (LLM) for netlist-to-schematic conversion. In particular, we consider our approach in the two settings of converting netlists to .asc files for LTSpice and LaTeX files for…
Read More
Extending Video Masked Autoencoders to 128 frames

[Submitted on 20 Nov 2024] Authors: Nitesh Bharadwaj Gundavarapu, Luke Friedman, Raghav Goyal, Chaitra Hegde, Eirikur Agustsson, Sagar M. Waghmare, Mikhail Sirotenko, Ming-Hsuan Yang, Tobias Weyand, Boqing Gong, Leonid Sigal. Abstract: Video understanding has witnessed significant progress, with recent video foundation models demonstrating strong performance owing to self-supervised pre-training objectives, with Masked Autoencoders (MAE) being the design of choice. Nevertheless, the majority of prior works that leverage MAE pre-training have focused on relatively short video representations (16…
Read More
BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection

arXiv:2411.14100v1 Announce Type: cross Abstract: Spoken term detection (STD) is often hindered by reliance on frame-level features and the computationally intensive DTW-based template matching, limiting its practicality. To address these challenges, we propose a novel approach that encodes speech into discrete, speaker-agnostic semantic tokens. This facilitates fast retrieval using text-based search algorithms and effectively handles out-of-vocabulary terms. Our approach focuses on generating consistent token sequences across varying utterances of the same term. We also propose bidirectional state-space modeling within the Mamba encoder, trained in a self-supervised learning framework, to learn contextual frame-level features that are further encoded into…
Read More
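The payoff of discrete, speaker-agnostic tokens is that spoken term detection reduces to a text-style sequence search rather than DTW over frame-level features. A toy sketch of that retrieval step, assuming an utterance and a query term have already been tokenized (token IDs here are illustrative, not from the paper):

```python
# Toy retrieval over discrete token sequences: once speech is tokenized,
# finding a spoken term is just locating its token subsequence.
def find_term(utterance_tokens, term_tokens):
    """Return start indices where the term's token sequence occurs."""
    n, m = len(utterance_tokens), len(term_tokens)
    return [i for i in range(n - m + 1)
            if utterance_tokens[i:i + m] == term_tokens]

utterance = [5, 12, 7, 7, 3, 12, 7, 9]  # made-up token IDs for one utterance
term = [12, 7]                          # made-up token IDs for the query term
print(find_term(utterance, term))       # [1, 5]
```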
Rethinking Weight-Averaged Model-merging

[Submitted on 14 Nov 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Hu Wang and 5 other authors. Abstract: Weight-averaged model-merging has emerged as a powerful approach in deep learning, capable of enhancing model performance without fine-tuning or retraining. However, the underlying mechanisms that explain its effectiveness remain largely unexplored. In this paper, we investigate this technique from three novel perspectives to provide deeper insights into how and why weight-averaged model-merging works: (1) we examine the intrinsic patterns captured by the learning of the…
Read More
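The technique being analyzed is simple to state: given several models with identical architecture, average their parameters element-wise. A minimal NumPy sketch of uniform weight averaging (the state dicts and values are illustrative):

```python
# Minimal sketch of weight-averaged model-merging: key-by-key uniform
# averaging of parameter tensors across models with the same architecture.
import numpy as np

def merge_weights(state_dicts):
    """Average parameter tensors key-by-key across a list of state dicts."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
m2 = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
merged = merge_weights([m1, m2])
print(merged["w"])  # [2. 3.]
```

Weighted or task-vector variants change only the averaging rule; the key-by-key structure stays the same.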
OmniGen: Unified Image Generation

[Submitted on 17 Sep 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Shitao Xiao and 9 other authors. Abstract: The emergence of Large Language Models (LLMs) has unified language generation tasks and revolutionized human-machine interaction. However, in the realm of image generation, a unified model capable of handling various tasks within a single framework remains largely unexplored. In this work, we introduce OmniGen, a new diffusion model for unified image generation. OmniGen is characterized by the following features: 1) Unification: OmniGen not only…
Read More
Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification

arXiv:2411.14252v1 Announce Type: new Abstract: Generating large-scale, domain-specific, multilingual multi-turn dialogue datasets remains a significant hurdle for training effective Multi-Turn Intent Classification models in chatbot systems. In this paper, we introduce Chain-of-Intent, a novel mechanism that combines Hidden Markov Models with Large Language Models (LLMs) to generate contextually aware, intent-driven conversations through self-play. By extracting domain-specific knowledge from e-commerce chat logs, we estimate conversation turns and intent transitions, which guide the generation of coherent dialogues. Leveraging LLMs to enhance emission probabilities, our approach produces natural and contextually consistent questions and answers. We also propose MINT-CL, a framework for multi-turn intent…
Read More
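The Chain-of-Intent mechanism can be pictured as sampling an intent sequence from a Markov chain whose transition probabilities are estimated from chat logs, with an LLM then realizing each intent as a turn. A toy sketch of the sampling step, with made-up intents and probabilities (the LLM realization step is omitted):

```python
# Toy Markov-chain intent sampler. Intent labels and the transition
# matrix below are illustrative stand-ins, not values from the paper.
import numpy as np

intents = ["greeting", "order_status", "refund", "goodbye"]
# Row i gives P(next intent | current intent); in the paper these would
# be estimated from e-commerce chat logs.
P = np.array([
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.2, 0.3, 0.5],
    [0.0, 0.3, 0.2, 0.5],
    [0.0, 0.0, 0.0, 1.0],  # "goodbye" is absorbing
])

def sample_intent_chain(n_turns, rng):
    """Sample a sequence of intents, starting from 'greeting'."""
    state = 0
    chain = [intents[state]]
    for _ in range(n_turns - 1):
        state = rng.choice(len(intents), p=P[state])
        chain.append(intents[state])
    return chain

rng = np.random.default_rng(0)
print(sample_intent_chain(4, rng))
```

Each sampled intent would then be passed to an LLM to generate the corresponding question or answer turn.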
GraCo — A Graph Composer for Integrated Circuits

arXiv:2411.13890v1 Announce Type: new Abstract: Designing integrated circuits involves substantial complexity, posing challenges in revealing their potential applications - from custom digital cells to analog circuits. Despite extensive research over the past decades in building versatile and automated frameworks, there remains room to explore more computationally efficient AI-based solutions. This paper introduces the graph composer GraCo, a novel method for synthesizing integrated circuits using reinforcement learning (RL). GraCo learns to construct a graph step-by-step, which is then converted into a netlist and simulated with SPICE. We demonstrate that GraCo is highly configurable, enabling the incorporation of prior design knowledge…
Read More
FabuLight-ASD: Unveiling Speech Activity via Body Language

arXiv:2411.13674v1 Announce Type: new Abstract: Active speaker detection (ASD) in multimodal environments is crucial for various applications, from video conferencing to human-robot interaction. This paper introduces FabuLight-ASD, an advanced ASD model that integrates facial, audio, and body pose information to enhance detection accuracy and robustness. Our model builds upon the existing Light-ASD framework by incorporating human pose data, represented through skeleton graphs, which minimises computational overhead. Using the Wilder Active Speaker Detection (WASD) dataset, renowned for reliable face and body bounding box annotations, we demonstrate FabuLight-ASD's effectiveness in real-world scenarios. Achieving an overall mean average precision (mAP) of 94.3%, FabuLight-ASD…
Read More
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

[Submitted on 1 Nov 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Claudia Biancotti and 4 other authors. Abstract: Advancements in large language models (LLMs) have renewed concerns about AI alignment - the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in…
Read More