Viral News - CybAI news

22 Nov

Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset

stp2y | 0 Comments | Viral News

arXiv:2411.14137v1 (announce type: cross). Abstract: The ability to perform complex reasoning across multimodal inputs is essential for models to effectively interact with humans in real-world scenarios. Advancements in vision-language models have significantly improved performance on tasks that require processing explicit and direct textual inputs, such as Visual Question Answering (VQA) and Visual Grounding (VG). However, less attention has been given to improving models' ability to comprehend nuanced and ambiguous forms of communication. This presents a critical challenge, as human language in real-world interactions often conveys hidden intentions that rely on context for accurate interpretation. To address this gap, we…

22 Nov

Brain-Inspired Efficient Pruning: Exploiting Criticality in Spiking Neural Networks

[Submitted on 5 Nov 2023 (v1), last revised 21 Nov 2024 (this version, v3)] By Shuo Chen and 3 other authors. Abstract: Spiking Neural Networks (SNNs) have gained significant attention due to their energy-efficient and multiplication-free characteristics. Despite these advantages, deploying large-scale SNNs on edge hardware is challenging due to limited resource availability. Network pruning offers a viable approach to compress the network scale and reduce hardware resource requirements for model deployment. However, existing SNN pruning methods cause high pruning costs…

22 Nov

t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving

[Submitted on 13 Oct 2024 (v1), last revised 21 Nov 2024 (this version, v3)] By Pengfei Hu and 7 other authors. Abstract: Given the wide adoption of multimodal sensors (e.g., camera, lidar, radar) by autonomous vehicles (AVs), deep analytics to fuse their outputs for robust perception become imperative. However, existing fusion methods often make two assumptions that rarely hold in practice: i) similar data distributions for all inputs and ii) constant availability of all sensors. Because, for example, lidars have…

22 Nov

Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning

arXiv:2411.13904v1 (announce type: new). Abstract: How will LLM-based agents be used in the future? While much of the existing work on agents has focused on improving performance on a specific family of objective and challenging tasks, in this work, we take a different perspective by thinking about full delegation: agents take over humans' routine decision-making processes and are trusted by humans to find solutions that fit people's personalized needs and adapt to ever-changing contexts. In order to achieve such a goal, the behavior of the agents, i.e., agentic behaviors, should be evaluated not only on their achievements (i.e., outcome…

22 Nov

A Framework for Evaluating LLMs Under Task Indeterminacy

22 Nov

What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality

arXiv:2411.13609v1 (announce type: new). Abstract: As video generation models advance rapidly, assessing the quality of generated videos has become increasingly critical. Existing metrics, such as Fréchet Video Distance (FVD), Inception Score (IS), and ClipSim, measure quality primarily in latent space rather than from a human visual perspective, often overlooking key aspects like appearance and motion consistency with physical laws. In this paper, we propose a novel metric, VAMP (Visual Appearance and Motion Plausibility), that evaluates both the visual appearance and physical plausibility of generated videos. VAMP is composed of two main components: an appearance score, which assesses color, shape, and…

22 Nov

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

[Submitted on 1 Nov 2024 (v1), last revised 21 Nov 2024 (this version, v3)] By Xiong Wang and 7 other authors. Abstract: Rapidly developing large language models (LLMs) have enabled a wide range of intelligent applications. In particular, GPT-4o's excellent duplex speech interaction ability has given users an impressive experience. Researchers have recently proposed several multi-modal LLMs in this direction that can achieve user-agent speech-to-speech conversations. This paper proposes a novel speech-text multimodal LLM architecture called Freeze-Omni. Our main contribution is…

22 Nov

Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

[Submitted on 31 Oct 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Jinlin Lai and 2 other authors. Abstract: Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less…

22 Nov

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

[Submitted on 11 Jun 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, and Hsin-Ying Lee. Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations,…

22 Nov

PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation
