Viral News

Exploring the Potential of Polynomial Basis Functions in Kolmogorov-Arnold Networks: A Comparative Study of Different Groups of Polynomials

arXiv:2406.02583v1 Announce Type: new Abstract: This paper presents a comprehensive survey of 18 distinct polynomials and their potential applications in Kolmogorov-Arnold Network (KAN) models as an alternative to traditional spline-based methods. The polynomials are classified into various groups based on their mathematical properties, such as orthogonal polynomials, hypergeometric polynomials, q-polynomials, Fibonacci-related polynomials, combinatorial polynomials, and number-theoretic polynomials. The study aims to investigate the suitability of these polynomials as basis functions in KAN models for complex tasks like handwritten digit classification on the MNIST dataset. The performance metrics of the KAN models, including overall accuracy, Kappa, and F1 score, are evaluated…
Read More
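The surveyed polynomial families would serve as the learnable one-dimensional edge functions of a KAN. As an illustrative sketch (not the paper's implementation), a Chebyshev-based edge function can be written with the standard three-term recurrence; the `tanh` squashing and the coefficient layout are assumptions:

```python
import numpy as np

def chebyshev_basis(x, degree):
    """Evaluate Chebyshev polynomials T_0..T_degree at x via the recurrence
    T_0 = 1, T_1 = x, T_{n+1} = 2x*T_n - T_{n-1}. Assumes x in [-1, 1]."""
    T = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[: degree + 1], axis=-1)  # shape (..., degree+1)

def kan_edge(x, coeffs):
    """One KAN edge function phi(x) = sum_k c_k * T_k(tanh(x)).
    tanh squashes inputs into Chebyshev's natural domain [-1, 1]."""
    basis = chebyshev_basis(np.tanh(x), len(coeffs) - 1)
    return basis @ coeffs

# Toy usage: a degree-3 edge applied to a batch of scalars.
coeffs = np.array([0.5, -1.0, 0.25, 0.1])
print(kan_edge(np.array([0.0, 1.0, -2.0]), coeffs))
```

Swapping `chebyshev_basis` for any of the other surveyed families only changes the recurrence; the rest of the edge stays identical.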
3D-HGS: 3D Half-Gaussian Splatting

arXiv:2406.02720v1 Announce Type: new Abstract: Photo-realistic 3D Reconstruction is a fundamental problem in 3D computer vision. This domain has seen considerable advancements owing to the advent of recent neural rendering techniques. These techniques predominantly aim to focus on learning volumetric representations of 3D scenes and refining these representations via loss functions derived from rendering. Among these, 3D Gaussian Splatting (3D-GS) has emerged as a significant method, surpassing Neural Radiance Fields (NeRFs). 3D-GS uses parameterized 3D Gaussians for modeling both spatial locations and color information, combined with a tile-based fast rendering technique. Despite its superior rendering performance and speed, the use…
Read More
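The excerpt is cut off before the half-Gaussian mechanism is described, so the following is only a hypothetical sketch of the idea the title suggests: a Gaussian falloff whose opacity differs on the two sides of a plane through its center. All parameter names are illustrative, not the paper's API:

```python
import numpy as np

def half_gaussian_weight(p, mu, cov, normal, opacity_pos, opacity_neg):
    """Density of a 'half-Gaussian' at point p: standard Gaussian falloff
    around mu, but with a different opacity on each side of the plane
    through mu with the given normal (the per-half split is an assumed
    mechanism for illustration only)."""
    d = p - mu
    falloff = np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
    opacity = opacity_pos if d @ normal >= 0 else opacity_neg
    return opacity * falloff

# Toy usage: an isotropic 2D splat split along the x-axis.
mu, cov, n = np.zeros(2), np.eye(2), np.array([1.0, 0.0])
print(half_gaussian_weight(np.array([0.5, 0.0]), mu, cov, n, 0.8, 0.2))
```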
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

arXiv:2406.02721v1 Announce Type: new Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation,…
Read More
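The core mechanism, stripped of the LLM, is gradient ascent on a differentiable self-judgment score with respect to a hidden state. The toy below substitutes a quadratic score toward a hypothetical target direction for the model's self-assessment; a real implementation would backpropagate through the model instead:

```python
import numpy as np

def steer(h, target, lr=0.1, steps=50):
    """Toy gradient-based steering of a hidden state: ascend the gradient
    of the score f(h) = -||h - target||^2, whose gradient is 2*(target - h).
    'target' stands in for the direction a self-judgment would favor."""
    for _ in range(steps):
        grad = 2 * (target - h)
        h = h + lr * grad
    return h

h0 = np.zeros(4)
target = np.array([1.0, -1.0, 0.5, 0.0])
print(np.round(steer(h0, target), 3))  # converges toward target
```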
Spatiotemporal Predictions of Toxic Urban Plumes Using Deep Learning

arXiv:2406.02582v1 Announce Type: new Abstract: Industrial accidents, chemical spills, and structural fires can release large amounts of harmful materials that disperse into urban atmospheres and impact populated areas. Computer models are typically used to predict the transport of toxic plumes by solving fluid dynamical equations. However, these models can be computationally expensive due to the need for many grid cells to simulate turbulent flow and resolve individual buildings and streets. In emergency response situations, alternative methods are needed that can run quickly and adequately capture important spatiotemporal features. Here, we present a novel deep learning model called ST-GasNet that was…
Read More
Window to Wall Ratio Detection using SegFormer

arXiv:2406.02706v1 Announce Type: new Abstract: Window-to-Wall Ratios (WWR) are key to assessing the energy, daylight, and ventilation performance of buildings. Studies have shown that window area has a large impact on building performance and simulation. However, the data needed to set up these environmental models and simulations is typically not available; instead, a standard 40% WWR is assumed for all buildings. This paper leverages existing computer vision window detection methods to predict the WWR of buildings from external street-view images using semantic segmentation, demonstrating the potential for adapting established computer vision techniques to architectural applications.
Read More
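Once a facade image is segmented, the WWR computation itself reduces to a pixel count. A minimal sketch, assuming hypothetical class ids for wall and window pixels:

```python
import numpy as np

# Hypothetical class ids for a facade segmentation map (e.g. from SegFormer).
WALL, WINDOW = 1, 2

def window_to_wall_ratio(seg):
    """WWR = window area / total facade (wall + window) area, computed from
    per-pixel class labels of a rectified street-view facade image."""
    window_px = np.count_nonzero(seg == WINDOW)
    wall_px = np.count_nonzero(seg == WALL)
    facade_px = window_px + wall_px
    return window_px / facade_px if facade_px else 0.0

# Toy 4x4 facade: 4 window pixels out of 16 facade pixels -> WWR = 0.25.
seg = np.full((4, 4), WALL)
seg[1:3, 1:3] = WINDOW
print(window_to_wall_ratio(seg))  # 0.25
```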
Block Transformer: Global-to-Local Language Modeling for Fast Inference

arXiv:2406.02657v1 Announce Type: new Abstract: This paper presents the Block Transformer architecture, which applies hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step, so this KV cache IO becomes a significant bottleneck in batch inference. We observe that these costs stem from applying self-attention over the global context; we therefore isolate the expensive global modeling in the lower layers and apply fast local modeling in the upper layers. To mitigate the remaining costs in the lower…
Read More
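The claimed IO saving can be sanity-checked with simple cache accounting. The sketch below assumes a hypothetical split into fully-global lower layers and windowed local upper layers, counting KV entries read per decoding step:

```python
def kv_cache_reads(seq_len, n_layers_global, n_layers_local, window):
    """KV entries read per decoding step under an assumed global-to-local
    split: lower (global) layers attend to the full prefix, upper (local)
    layers only to a fixed window, which caps their KV IO."""
    global_reads = n_layers_global * seq_len
    local_reads = n_layers_local * min(seq_len, window)
    return global_reads + local_reads

# At 8k context, 4 global + 20 local layers with a 256-token window read
# far fewer KV entries than 24 fully-global layers:
print(kv_cache_reads(8192, 4, 20, 256))   # 4*8192 + 20*256 = 37888
print(kv_cache_reads(8192, 24, 0, 256))   # 24*8192 = 196608
```

The layer counts and window size here are made up for illustration; the point is only that local layers' IO stops growing with sequence length.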
Constrained or Unconstrained? Neural-Network-Based Equation Discovery from Data

Read More
Contrastive Language Video Time Pre-training

arXiv:2406.02631v1 Announce Type: new Abstract: We introduce LAVITI, a novel approach to learning language, video, and temporal representations in long-form videos via contrastive learning. Different from pre-training on video-text pairs like EgoVLP, LAVITI aims to align language, video, and temporal features by extracting meaningful moments in untrimmed videos. Our model employs a set of learnable moment queries to decode clip-level visual, language, and temporal features. In addition to vision and language alignment, we introduce relative temporal embeddings (TE) to represent timestamps in videos, which enables contrastive learning of time. Significantly different from traditional approaches, the prediction of a particular timestamp…
Read More
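One plausible form of a relative temporal embedding (the exact parameterization in LAVITI is not specified in this excerpt, so the following is an assumption) is a sinusoidal encoding of the normalized timestamp t/duration:

```python
import numpy as np

def relative_time_embedding(t, duration, dim):
    """Sinusoidal embedding of a relative timestamp t/duration in [0, 1].
    The geometric frequency ladder and sin/cos layout are assumptions,
    chosen for illustration in the style of positional encodings."""
    rel = t / duration
    freqs = 2.0 ** np.arange(dim // 2)        # 1, 2, 4, 8, ...
    angles = 2 * np.pi * rel * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Embed second 30 of a 120-second video into an 8-dim vector.
print(relative_time_embedding(30.0, 120.0, 8))
```

Encoding the relative rather than absolute timestamp is what makes the representation comparable across untrimmed videos of different lengths.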
Are PPO-ed Language Models Hackable?

arXiv:2406.02577v1 Announce Type: new Abstract: Numerous algorithms have been proposed to align language models to remove undesirable behaviors. However, the challenges associated with a very large state space and creating a proper reward function often result in various jailbreaks. Our paper aims to examine the effect of reward in the controlled setting of positive-sentiment language generation. Instead of online training of a reward model based on human feedback, we employ a statically learned sentiment classifier. We also consider a setting where our model's weights and activations are exposed to an end-user after training. We examine a pretrained GPT-2 through…
Read More
Pretrained Mobility Transformer: A Foundation Model for Human Mobility

arXiv:2406.02578v1 Announce Type: new Abstract: Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the Pretrained Mobility Transformer (PMT), which leverages the transformer architecture to process user trajectories in an autoregressive manner, converting geographical areas into tokens and embedding spatial and temporal information within these representations. Experiments conducted in three U.S. metropolitan areas over a two-month period demonstrate PMT's ability to…
Read More
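The tokenization step, converting geographical areas into tokens, can be sketched as a regular latitude/longitude grid; the cell size and token layout below are assumptions for illustration, not the paper's scheme:

```python
def grid_token(lat, lon, lat0, lon0, cell_deg=0.01):
    """Map a (lat, lon) point to an integer token id on a regular grid
    anchored at (lat0, lon0); cell_deg sets the cell size in degrees."""
    row = int((lat - lat0) / cell_deg)
    col = int((lon - lon0) / cell_deg)
    return row * 10_000 + col  # 10,000 columns per row is an assumption

def tokenize_trajectory(points, lat0, lon0):
    """Turn a sequence of GPS fixes into the token sequence an
    autoregressive mobility model would consume."""
    return [grid_token(lat, lon, lat0, lon0) for lat, lon in points]

# Toy usage: two fixes in the New York area on a grid anchored at (40, -75).
traj = [(40.7128, -74.0060), (40.7306, -73.9866)]
print(tokenize_trajectory(traj, lat0=40.0, lon0=-75.0))
```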