Hybrid Approach to Parallel Stochastic Gradient Descent

[Submitted on 27 Jun 2024]

Authors: Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel


Abstract: Stochastic Gradient Descent (SGD) is widely used to train models on large datasets because it reduces training time. In addition, data parallelism is a common method for efficiently training neural networks across multiple worker nodes. Most systems implement data parallelism either synchronously or asynchronously, and both approaches have drawbacks. We propose a third, hybrid approach to data parallelism that combines the synchronous and asynchronous approaches during training. When the threshold function is chosen appropriately, so that parameter aggregation shifts gradually from asynchronous to synchronous, we show that within a given time budget our hybrid approach outperforms both the purely asynchronous and the purely synchronous approaches.
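To make the idea concrete, the following is a minimal single-process sketch of gradually shifting parameter aggregation from asynchronous to synchronous under a threshold function. The toy least-squares problem, the sync_probability threshold function, and all other names here are illustrative assumptions, not the paper's actual system or threshold function.

# Hybrid async-to-sync data-parallel SGD, simulated in one process.
# All names and the specific threshold schedule are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize ||Xw - y||^2 over 4 data shards.
X = rng.normal(size=(400, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=400)
shards = np.array_split(np.arange(400), 4)  # one shard per "worker"

def grad(w, idx):
    # Gradient of the mean squared loss on one worker's shard.
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def sync_probability(t, T):
    # Assumed threshold function: the fraction of aggregation done
    # synchronously rises from 0 to 1 over the first half of training.
    return min(1.0, t / (0.5 * T))

w = np.zeros(5)
lr, T = 0.01, 200
for t in range(T):
    if rng.random() < sync_probability(t, T):
        # Synchronous step: average all workers' gradients, then update once.
        g = np.mean([grad(w, idx) for idx in shards], axis=0)
        w -= lr * g
    else:
        # Asynchronous step: each worker applies its gradient to the shared
        # parameters as soon as it is ready (approximated here by applying
        # the updates sequentially in random order).
        for k in rng.permutation(len(shards)):
            w -= lr * grad(w, shards[k])

print("parameter error:", np.linalg.norm(w - w_true))

Early iterations behave like asynchronous SGD (fast, possibly stale updates); late iterations behave like synchronous SGD (slower but lower-variance aggregation), which is the trade-off the abstract's hybrid schedule exploits.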

Submission history

From: Dhrumil Chetankumar Joshi
[v1]
Thu, 27 Jun 2024 06:28:30 UTC (3,754 KB)


