Equidistribution-based training of Free Knot Splines and ReLU Neural Networks

AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning


View a PDF of the paper titled Equidistribution-based training of Free Knot Splines and ReLU Neural Networks, by Simone Appella and 4 other authors

View PDF
HTML (experimental)

Abstract:We consider the problem of univariate nonlinear function approximation using shallow neural networks (NN) with a rectified linear unit (ReLU) activation function. We show that the $L_2$ based approximation problem is ill-conditioned and the behaviour of optimisation algorithms used in training these networks degrades rapidly as the width of the network increases. This can lead to significantly poorer approximation in practice than expected from the theoretical expressivity of the ReLU architecture and traditional methods such as univariate Free Knot Splines (FKS). Univariate shallow ReLU NNs and FKS span the same function space, and thus have the same theoretical expressivity. However, the FKS representation remains well-conditioned as the number of knots increases. We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for ReLU NNs. Using the equidistribution principle, we propose a two-level procedure for training the FKS by first solving the nonlinear problem of finding the optimal knot locations of the interpolating FKS, and then determine the optimal weights and knots of the FKS by solving a nearly linear, well-conditioned problem. The training of the FKS gives insights into how we can train a ReLU NN effectively, with an equally accurate approximation. We combine the training of the ReLU NN with an equidistribution-based loss to find the breakpoints of the ReLU functions. This is then combined with preconditioning the ReLU NN approximation to find the scalings of the ReLU functions. This fast, well-conditioned and reliable method finds an accurate shallow ReLU NN approximation to a univariate target function. We test this method on a series of regular, singular, and rapidly varying target functions and obtain good results, realising the expressivity of the shallow ReLU network in all cases. We then extend our results to deeper networks.

Submission history

From: Lisa Maria Kreusser [view email]
[v1]
Tue, 2 Jul 2024 10:51:36 UTC (7,049 KB)
[v2]
Thu, 23 Jan 2025 14:15:31 UTC (3,952 KB)



Source link
lol

By stp2y

Leave a Reply

Your email address will not be published. Required fields are marked *

No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.