Viral News

SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model

arXiv:2411.13802v1 — Abstract: Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry of semiconductor devices and processes. SemiKong, the first industry-specific LLM for the semiconductor domain, provides a foundation that can be used to develop tailored proprietary models. With SemiKong 1.0, we aim to develop a foundational model capable of understanding etching problems at an expert level. Our key contributions include (a) curating a…
On Generalization Bounds for Neural Networks with Low Rank Layers

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

[Submitted on 1 Aug 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Benlin Liu and 8 other authors. Abstract: Multimodal language models (MLLMs) are increasingly being applied in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Current methods often rely on specialized architectural designs or task-specific fine-tuning to achieve this. We introduce Coarse Correspondences, a simple, lightweight method that enhances MLLMs' spatial-temporal reasoning with 2D images as input, without modifying the architecture or…
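The core idea of coarse correspondences, carrying object identities across frames so a model can reason about "the same object" over time, can be illustrated with a toy ID-propagation step. Everything here (the box format, greedy IoU matching, the 0.3 threshold) is an illustrative assumption; the paper's actual pipeline uses a lightweight tracker and overlays visual marks on the input images.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def propagate_ids(prev_boxes, prev_ids, cur_boxes, thresh=0.3):
    """Greedily carry object IDs from the previous frame to the current
    one by best box overlap; unmatched boxes get fresh IDs."""
    next_id = max(prev_ids, default=-1) + 1
    cur_ids, used = [], set()
    for box in cur_boxes:
        scored = [(iou(box, pb), i) for i, pb in enumerate(prev_boxes) if i not in used]
        best = max(scored, default=(0.0, None))
        if best[0] >= thresh:
            used.add(best[1])
            cur_ids.append(prev_ids[best[1]])
        else:
            cur_ids.append(next_id)
            next_id += 1
    return cur_ids

prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
cur = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(propagate_ids(prev, [0, 1], cur))  # [0, 2]
```

The first current box overlaps the first previous box heavily and inherits ID 0; the second box overlaps nothing and gets a new ID.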
Hymba: A Hybrid-head Architecture for Small Language Models

arXiv:2411.13676v1 — Abstract: We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency. Attention heads provide high-resolution recall, while SSM heads enable efficient context summarization. Additionally, we introduce learnable meta tokens that are prepended to prompts, storing critical information and alleviating the "forced-to-attend" burden associated with attention mechanisms. This model is further optimized by incorporating cross-layer key-value (KV) sharing and partial sliding window attention, resulting in a compact cache size. During development, we conducted a controlled study comparing various architectures under…
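The "hybrid-head parallel" idea, attention and SSM heads running side by side in the same layer and their outputs fused, can be sketched in a few lines. The dimensions, the depthwise causal convolution standing in for a real SSM head, and the concat-then-project fusion are all illustrative assumptions, not Hymba's actual design.

```python
import torch
import torch.nn as nn

class HybridHeadBlock(nn.Module):
    """Toy hybrid-head layer: attention heads (high-resolution recall)
    run in parallel with an SSM-style head (context summarization),
    and the two outputs are fused by a linear projection."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stand-in for an SSM head: a depthwise causal conv that
        # summarizes a short window of context per channel.
        self.ssm = nn.Conv1d(d_model, d_model, kernel_size=4,
                             padding=3, groups=d_model)
        self.mix = nn.Linear(2 * d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        a, _ = self.attn(x, x, x, need_weights=False)
        # Conv over the time axis, then trim the causal padding.
        s = self.ssm(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.mix(torch.cat([a, s], dim=-1))

x = torch.randn(2, 16, 64)
print(HybridHeadBlock()(x).shape)  # torch.Size([2, 16, 64])
```

Running both head types on the same input in parallel, rather than stacking them in alternating layers, is the design point the abstract emphasizes.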
In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

[Submitted on 2 May 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Yunbum Kook and 2 other authors. Abstract: We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $\chi^2$). The proof departs from known approaches for polytime algorithms for the problem -- we utilize a stochastic diffusion perspective to show contraction to the target…
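A proximal-sampler-style step of this flavor is easy to sketch: step "out" of the body with Gaussian noise, then step back "in" by rejection-sampling a Gaussian point that lands inside. The step size, iteration count, and unit-ball membership oracle below are illustrative assumptions, and the toy makes no attempt at the paper's mixing guarantees.

```python
import numpy as np

def in_and_out_step(x, sigma, in_body, rng):
    """One out-then-in step: perturb x with Gaussian noise (out),
    then rejection-sample a Gaussian point around the perturbed
    position until it falls inside the convex body (in)."""
    y = x + sigma * rng.standard_normal(x.shape)          # step out
    while True:
        x_new = y + sigma * rng.standard_normal(x.shape)  # step back in
        if in_body(x_new):
            return x_new

rng = np.random.default_rng(0)
unit_ball = lambda p: np.linalg.norm(p) <= 1.0  # membership oracle
x = np.zeros(3)
for _ in range(100):
    x = in_and_out_step(x, 0.2, unit_ball, rng)
print(np.linalg.norm(x) <= 1.0)  # True
```

By construction every accepted point lies inside the body; the analysis in the paper is about how quickly the walk's distribution contracts to uniform.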
Deep Learning Approach for Enhancing Oral Squamous Cell Carcinoma with LIME Explainable AI Technique

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

[Submitted on 3 Oct 2024 (v1), last revised 21 Nov 2024 (this version, v2)] Authors: Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou. Abstract: This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry, for enhancing the mathematical reasoning ability of Large Language Models (LLMs). The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path…
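The search-plus-refine loop the abstract describes can be caricatured with a toy: candidates are repeatedly selected, "refined" into children, and compared pairwise rather than scored with an absolute reward. Every piece here is a stand-in assumption: solutions are numbers instead of reasoning chains, uniform selection replaces UCB, and a distance oracle replaces the paper's learned pairwise preference model.

```python
import random

def toy_refine(solution, rng):
    """Stand-in for Self-Refine: perturb a candidate. Here 'solutions'
    are just numbers approximating a hidden target answer."""
    return solution + rng.uniform(-1.0, 1.0)

def pairwise_better(a, b, target=10.0):
    """Stand-in for a pairwise preference model: prefers the candidate
    closer to the (hidden) correct answer, without exposing a score."""
    return abs(a - target) < abs(b - target)

def mcts_like_search(seed_solution, iters=200, seed=0):
    """Grow a pool of refined candidates and track the pairwise winner."""
    rng = random.Random(seed)
    best = seed_solution
    pool = [seed_solution]
    for _ in range(iters):
        parent = rng.choice(pool)        # selection (uniform stand-in for UCB)
        child = toy_refine(parent, rng)  # expansion via "self-refine"
        pool.append(child)
        if pairwise_better(child, best): # pairwise comparison, not a scalar reward
            best = child
    return best

print(mcts_like_search(0.0))
```

The point of the sketch is the interface: search only ever needs a "which of these two is better?" judgment, which is what makes pairwise optimization a drop-in alternative to absolute reward models.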
Predicting Wall Thickness Changes in Cold Forging Processes: An Integrated FEM and Neural Network approach

[Submitted on 20 Nov 2024 (v1), last revised 21 Nov 2024 (this version, v2)] By Sasa Ilic and 5 other authors. Abstract: This study presents a novel approach for predicting wall thickness changes in tubes during the nosing process. Specifically, we first provide a thorough analysis of nosing processes and the influencing parameters. We further set up a Finite Element Method (FEM) simulation to better analyse the effects of varying process parameters. As however traditional FEM…
Self-Supervised Adversarial Diffusion Models for Fast MRI Reconstruction

[Submitted on 21 Jun 2024 (v1), last revised 20 Nov 2024 (this version, v2)] By Mojtaba Safari and 4 other authors. Abstract: Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named "Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)" to accelerate data acquisition without requiring fully sampled datasets. Materials and Methods: We used the fastMRI multi-coil brain axial T2-weighted (T2-w) dataset from 1,376 cases and single-coil brain quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) T1 maps…
A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

[Submitted on 13 Oct 2023 (v1), last revised 21 Nov 2024 (this version, v2)] By Carel van Niekerk and 6 other authors. Abstract: Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning…
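The confidence-based acquisition idea, trust the model where it is confident and spend the annotation budget where it is not, reduces to a simple split over predicted probabilities. The threshold value and the max-probability confidence measure below are illustrative assumptions; this is inspired by, not identical to, CAMEL's acquisition model.

```python
import numpy as np

def confidence_split(probs, threshold=0.9):
    """Split a pool by predictive confidence: indices of examples whose
    top-class probability clears the threshold keep the model's own
    label (self-supervision); the rest are routed to human annotators."""
    conf = probs.max(axis=1)
    auto = np.where(conf >= threshold)[0]  # self-labelled
    ask = np.where(conf < threshold)[0]    # acquire human labels
    return auto, ask

# Hypothetical 2-class predictions for a pool of three examples.
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.20, 0.80]])
auto, ask = confidence_split(probs)
print(auto.tolist(), ask.tolist())  # [0] [1, 2]
```

Only the first example is confident enough to self-label; the other two would be sent for annotation, which is where the labelling budget does the most good.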