Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense

Authors: Samuel Cahyawijaya, Ruochen Zhang, Holy Lovenia, Jan Christian Blaise Cruz, Elisa Gilbert, Hiroki Nomoto, Alham Fikri Aji

Abstract: Multilingual large language models (LLMs) have gained prominence, but concerns remain about their reliability beyond English. This study addresses the gap in cross-lingual semantic evaluation by introducing StingrayBench, a novel benchmark for cross-lingual sense disambiguation. We demonstrate the use of false friends (words that are orthographically similar but have completely different meanings in two languages) as a possible approach to pinpointing the limitations of cross-lingual sense disambiguation in LLMs. We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German, and challenge LLMs to distinguish their use in context. In our analysis of various models, we observe that they tend to be biased toward higher-resource languages. We also propose new metrics for quantifying cross-lingual sense bias and comprehension based on our benchmark. Our work contributes to developing more diverse and inclusive language modeling, promoting fairer access for the wider multilingual community.
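
As a rough illustration of the task described in the abstract, the sketch below represents false-friend test items and computes a simple per-language accuracy gap. Everything in it is an assumption made for illustration: the names FalseFriendItem, accuracy_by_language, and accuracy_gap are hypothetical, the predictions are made-up toy values rather than results, and the gap is a plain accuracy difference, not the metrics proposed in the paper (which the abstract does not define).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalseFriendItem:
    """One hypothetical test item: a false friend used in a sentence."""
    word: str
    sentence: str
    gold_lang: str       # language whose sense the sentence actually uses
    predicted_lang: str  # language the model attributed the usage to


def accuracy_by_language(items):
    """Return {gold language: fraction of its items answered correctly}."""
    totals, correct = {}, {}
    for item in items:
        totals[item.gold_lang] = totals.get(item.gold_lang, 0) + 1
        if item.predicted_lang == item.gold_lang:
            correct[item.gold_lang] = correct.get(item.gold_lang, 0) + 1
    return {lang: correct.get(lang, 0) / n for lang, n in totals.items()}


def accuracy_gap(items, lang_a, lang_b):
    """Toy bias indicator (an assumption, not the paper's metric):
    positive values mean the model does better on lang_a than lang_b."""
    acc = accuracy_by_language(items)
    return acc[lang_a] - acc[lang_b]


# "Gift" means a present in English but poison in German.
# The predicted_lang values below are invented for the example.
items = [
    FalseFriendItem("Gift", "She wrapped the gift in blue paper.", "en", "en"),
    FalseFriendItem("Gift", "Das Gift der Schlange ist tödlich.", "de", "en"),
    FalseFriendItem("Gift", "He bought a gift for his sister.", "en", "en"),
    FalseFriendItem("Gift", "Die Pflanze enthält ein starkes Gift.", "de", "de"),
]

print(accuracy_by_language(items))
print(accuracy_gap(items, "en", "de"))
```

A model that defaults to the higher-resource language's sense would score well on that language's items and poorly on the other's, which is the kind of asymmetry such a gap is meant to surface.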

Submission history

From: Samuel Cahyawijaya
[v1] Mon, 28 Oct 2024 22:09:43 UTC (9,057 KB)
[v2] Wed, 30 Oct 2024 11:56:17 UTC (9,057 KB)




By stp2y
