MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder, by Khai Le-Duc and 5 other authors
Abstract: Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To the best of our knowledge, MultiMed is the world’s largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs. Hybrid comparative study, a layer-wise ablation study for the AED, and a linguistic analysis for multilingual medical ASR. All code, data, and models are available online: this https URL
Submission history
From: Khai Le-Duc
[v1] Sat, 21 Sep 2024 09:05:48 UTC (7,773 KB)
[v2] Thu, 9 Jan 2025 10:50:12 UTC (7,824 KB)