MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

stp2y, January 10, 2025

[Submitted on 21 Sep 2024 (v1), last revised 9 Jan 2025 (this version, v2)]

Authors: Khai Le-Duc and 5 other authors

Abstract: Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To the best of our knowledge, MultiMed stands as the world’s largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs. Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR. All code, data, and models are available online: this https URL

Submission history

From: Khai Le-Duc
[v1] Sat, 21 Sep 2024 09:05:48 UTC (7,773 KB)
[v2] Thu, 9 Jan 2025 10:50:12 UTC (7,824 KB)
