Viral News

Fairness in Autonomous Driving: Towards Understanding Confounding Factors in Object Detection under Challenging Weather

Fairness in Autonomous Driving: Towards Understanding Confounding Factors in Object Detection under Challenging Weather

[Submitted on 31 May 2024] View a PDF of the paper titled Fairness in Autonomous Driving: Towards Understanding Confounding Factors in Object Detection under Challenging Weather, by Bimsara Pathiraja and 2 other authors View PDF HTML (experimental) Abstract:The deployment of autonomous vehicles (AVs) is rapidly expanding to numerous cities. At the heart of AVs, the object detection module assumes a paramount role, directly influencing all downstream decision-making tasks by considering the presence of nearby pedestrians, vehicles, and more. Despite high accuracy of pedestrians detected on held-out datasets, the potential presence of algorithmic bias in such object detectors, particularly in challenging…
Read More
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

arXiv:2406.00021v1 Announce Type: new Abstract: This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation on benchmark datasets CVSS-T and IndicTTS. With an average mean opinion score of 3.75 out of 4, speech synthesized by CrossVoice closely rivals human speech on the benchmark, highlighting the efficacy of cascade-based systems and transfer learning in multilingual S2ST with prosody transfer. Source link lol
Read More
From Structured to Unstructured:A Comparative Analysis of Computer Vision and Graph Models in solving Mesh-based PDEs

From Structured to Unstructured:A Comparative Analysis of Computer Vision and Graph Models in solving Mesh-based PDEs

arXiv:2406.00081v1 Announce Type: new Abstract: This article investigates the application of computer vision and graph-based models in solving mesh-based partial differential equations within high-performance computing environments. Focusing on structured, graded structured, and unstructured meshes, the study compares the performance and computational efficiency of three computer vision-based models against three graph-based models across three data-sets. The research aims to identify the most suitable models for different mesh topographies, particularly highlighting the exploration of graded meshes, a less studied area. Results demonstrate that computer vision-based models, notably U-Net, outperform the graph models in prediction performance and efficiency in two (structured and graded)…
Read More
A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

arXiv:2406.00210v1 Announce Type: new Abstract: The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these approaches typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusting the model architecture. 1) For the tuning method, we design a model assembly strategy to reconstruct a lightweight model while preserving performance through distillation. Second, to mitigate performance loss due to pruning, we incorporate multi-expert conditional convolution (ME-CondConv) into compressed UNets to enhance network performance…
Read More
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias

Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias

arXiv:2406.00020v1 Announce Type: new Abstract: Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly for aggressively flagging posts from transgender and non-binary individuals as toxic. In this study, we investigate the presence of bias in harmful speech classification of gender-queer dialect online, focusing specifically on the treatment of reclaimed slurs. We introduce a novel dataset, QueerReclaimLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs. Dataset instances are scored by gender-queer annotators for…
Read More
An Efficient Multi Quantile Regression Network with Ad Hoc Prevention of Quantile Crossing

An Efficient Multi Quantile Regression Network with Ad Hoc Prevention of Quantile Crossing

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs. Source link lol
Read More
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model

arXiv:2406.00195v1 Announce Type: new Abstract: While AI-generated content has garnered significant attention, achieving photo-realistic video synthesis remains a formidable challenge. Despite the promising advances in diffusion models for video generation quality, the complex model architecture and substantial computational demands for both training and inference create a significant gap between these models and real-world applications. This paper presents SNED, a superposition network architecture search method for efficient video diffusion model. Our method employs a supernet training paradigm that targets various model cost and resolution options using a weight-sharing method. Moreover, we propose the supernet training sampling warm-up for fast training optimization.…
Read More
EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

arXiv:2406.00019v1 Announce Type: new Abstract: In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects in text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL dataset benchmark to include sequential and contextual questions. We provide a data split and the new test set designed to assess compositional generalization ability. Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in learning compositionality. Additionally, our dataset integrates…
Read More
Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

arXiv:2406.00079v1 Announce Type: new Abstract: Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficient ability to process long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require…
Read More
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding

Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding

arXiv:2406.00143v1 Announce Type: new Abstract: Temporal sentence grounding is a challenging task that aims to localize the moment spans relevant to a language description. Although recent DETR-based models have achieved notable progress by leveraging multiple learnable moment queries, they suffer from overlapped and redundant proposals, leading to inaccurate predictions. We attribute this limitation to the lack of task-related guidance for the learnable queries to serve a specific mode. Furthermore, the complex solution space generated by variable and open-vocabulary language descriptions exacerbates the optimization difficulty, making it harder for learnable queries to distinguish each other adaptively. To tackle this limitation, we…
Read More
No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.