Viral News

LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS


[Submitted on 20 Aug 2024 (v1), last revised 21 Aug 2024 (v2)] By Xinyu Liu and 4 other authors. Abstract: Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and re-appearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on video instance segmentation performance. Our…
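As a rough illustration of combining two VOS trackers, the sketch below averages per-frame foreground probability maps from two models and thresholds the result. The weighting and the fusion rule are assumptions for illustration, not the report's exact method.

```python
import numpy as np

def fuse_masks(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Fuse foreground probabilities from two VOS models (e.g. SAM2 and Cutie)
    by weighted averaging, then binarize into a segmentation mask."""
    fused = weight_a * prob_a + (1.0 - weight_a) * prob_b
    return (fused > threshold).astype(np.uint8)

# Toy 2x2 probability maps standing in for per-pixel model outputs.
a = np.array([[0.9, 0.2], [0.6, 0.4]])
b = np.array([[0.8, 0.1], [0.3, 0.7]])
mask = fuse_masks(a, b)  # pixels where the averaged probability exceeds 0.5
```

In practice the two models disagree most around occlusions and re-appearances, which is where a tunable `weight_a` (one of the hyperparameters such a study might sweep) matters.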
Read More
StructuredRAG: JSON Response Formatting with Large Language Models


Read More
Top 6 Benefits Of Using Microsoft .NET For Ecommerce Websites – Blogs’s Blog


Thanks to the global epidemic, e-commerce has surged to the top of the global commerce ladder. According to data, consumers turned to eCommerce by almost 30% during the epidemic, which accelerated the sector's growth. This evident domination shows that eCommerce companies need to adopt an effective strategy to remain competitive. And the platform you use to create your eCommerce business is where it all begins. You want to invest in a platform that increases your operational efficiency. Moreover, it provides customers with exceptional experiences and opens up amazing growth prospects. Let us introduce you to Microsoft .NET! Companies have widely used .NET to create scalable, reliable, and…
Read More
BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction


arXiv:2408.10285v1 Announce Type: new Abstract: Retrosynthesis analysis is pivotal yet challenging in drug discovery and organic chemistry. Despite the proliferation of computational tools over the past decade, AI-based systems often fall short in generalizing across diverse reaction types and exploring alternative synthetic pathways. This paper presents BatGPT-Chem, a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Integrating chemical tasks via a unified framework of natural language and SMILES notation, this approach synthesizes extensive instructional data from an expansive chemical database. Employing both autoregressive and bidirectional training techniques across over one hundred million instances, BatGPT-Chem captures a…
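A unified natural-language-plus-SMILES framework might format retrosynthesis queries along the lines below; the template is a hypothetical illustration, not the paper's actual instruction format.

```python
def retro_prompt(product_smiles: str) -> str:
    """Build a hypothetical instruction-style retrosynthesis prompt that mixes
    natural-language task description with SMILES notation."""
    return (
        "Task: single-step retrosynthesis.\n"
        f"Product (SMILES): {product_smiles}\n"
        "Predict the reactant SMILES, separated by '.':"
    )

prompt = retro_prompt("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as the target product
```

Casting chemical tasks as text in this way is what lets a single autoregressive model handle diverse reaction types from one instruction corpus.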
Read More
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation


arXiv:2408.10453v1 Announce Type: new Abstract: Text-to-video generation has been dominated by end-to-end diffusion-based or autoregressive models. On one hand, these novel models offer plausible versatility, but they are criticized for physical correctness, shading and illumination, camera motion, and temporal consistency. On the other hand, the film industry relies on manually edited Computer-Generated Imagery (CGI) using 3D modeling software. Human-directed 3D synthetic videos and animations address the aforementioned shortcomings, but producing them is extremely tedious and requires tight collaboration between movie makers and 3D rendering experts. In this paper, we introduce an automatic synthetic video generation pipeline based on Vision Large Language Model (VLM)…
Read More
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs


Read More
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference


arXiv:2408.10284v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to high on-demand loading overheads from managing sparsely activated experts. This paper introduces AdapMoE, an algorithm-system co-design framework for efficient MoE inference. AdapMoE features adaptive expert gating and management to reduce the on-demand loading overheads. We observe heterogeneity in expert loading across layers and tokens, and based on this we propose a sensitivity-based strategy to dynamically adjust the number of activated experts. Meanwhile, we…
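One way to picture adaptive gating is to drop the fixed top-k and instead activate the smallest expert set whose router probability mass reaches a threshold, so confident tokens load fewer experts. The coverage rule and names below are illustrative assumptions, not AdapMoE's exact algorithm.

```python
import numpy as np

def adaptive_gate(router_logits, coverage=0.9, k_max=4):
    """Pick the smallest set of experts whose softmax mass >= coverage,
    capped at k_max experts (a stand-in for a sensitivity threshold)."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]      # experts by descending probability
    chosen, mass = [], 0.0
    for idx in order[:k_max]:
        chosen.append(int(idx))
        mass += probs[idx]
        if mass >= coverage:
            break
    return chosen

# A fairly confident router over 8 experts: most mass sits on experts 0 and 1.
logits = np.array([3.0, 2.5, 0.1, -1.0, 0.0, -0.5, 0.2, -2.0])
experts = adaptive_gate(logits)
```

Because each activated expert must be fetched into edge-device memory, shrinking the activated set for easy tokens directly reduces the on-demand loading overhead the abstract describes.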
Read More
The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks


arXiv:2408.10446v1 Announce Type: new Abstract: The rapid advancement of text-to-image generation systems, exemplified by models like Stable Diffusion, Midjourney, Imagen, and DALL-E, has heightened concerns about their potential misuse. In response, companies like Meta and Google have intensified their efforts to implement watermarking techniques on AI-generated images to curb the circulation of potentially misleading visuals. However, in this paper, we argue that current image watermarking methods are fragile and susceptible to being circumvented through visual paraphrase attacks. The proposed visual paraphraser operates in two steps. First, it generates a caption for the given image using KOSMOS-2, one of the latest…
Read More
Putting People in LLMs’ Shoes: Generating Better Answers via Question Rewriter


arXiv:2408.10573v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated significant capabilities, particularly in the domain of question answering (QA). However, their effectiveness in QA is often undermined by the vagueness of user questions. To address this issue, we introduce single-round instance-level prompt optimization, referred to as question rewriter. By enhancing the intelligibility of human questions for black-box LLMs, our question rewriter improves the quality of generated answers. The rewriter is optimized using direct preference optimization based on feedback collected from automatic criteria for evaluating generated answers; therefore, its training does not require costly human annotations. The experiments across…
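Direct preference optimization over pairs of rewritten questions reduces to a simple loss: push the policy's log-probability margin for the preferred rewrite above the frozen reference model's. The sketch below shows the standard DPO loss for one preference pair; the log-probability values are made up for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid of the
    beta-scaled policy-vs-reference log-probability margin."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A rewrite whose answer scored better under the automatic criteria (chosen)
# vs. one that scored worse (rejected), with hypothetical log-probs.
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.5, beta=0.1)
```

Since the preference labels come from automatic answer-quality criteria, this training loop needs no human annotation, which is the abstract's central cost claim.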
Read More
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models


arXiv:2408.10280v1 Announce Type: new Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), effectively leveraging original matrix knowledge while reducing tunable parameters. Specifically, NoRA freezes the outer LoRA weights and utilizes an inner LoRA design, providing enhanced control over model optimization. This approach allows the model to more precisely adapt to specific tasks while maintaining a…
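A minimal sketch of the dual-layer nested structure, under the assumption that the frozen outer factors come from a truncated SVD of the pre-trained weight and only a tiny inner LoRA pair is trained; the exact factorization and initialization in NoRA may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_outer, r_inner = 64, 8, 2
W = rng.standard_normal((d, d))        # pre-trained weight (frozen)

# Outer LoRA: frozen factors from a truncated SVD of W, inheriting its knowledge.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A_out = U[:, :r_outer] * S[:r_outer]   # (d, r_outer), frozen
B_out = Vt[:r_outer, :]                # (r_outer, d), frozen

# Inner LoRA: the only trainable parameters, nested between the outer factors.
A_in = rng.standard_normal((r_outer, r_inner)) * 0.01
B_in = np.zeros((r_inner, r_outer))    # zero-init so the update starts at zero

def forward(x):
    # W x + A_out (A_in (B_in (B_out x))): frozen outer path, trainable inner core
    delta = A_out @ (A_in @ (B_in @ (B_out @ x)))
    return W @ x + delta

x = rng.standard_normal(d)
y = forward(x)  # shape (64,); equals W @ x before any training step
```

For this toy layer, vanilla LoRA at rank 8 would train 2 * 64 * 8 = 1024 parameters, while the nested inner pair trains only 2 * 8 * 2 = 32, which is the parameter-reduction effect the abstract claims.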
Read More