Ruri: Japanese General Text Embeddings

[Submitted on 12 Sep 2024]

Authors: Hayato Tsukagoshi and 1 other author

Abstract: We report the development of Ruri, a series of Japanese general text embedding models. While the development of general-purpose text embedding models in English and multilingual contexts has been active in recent years, model development in Japanese remains insufficient. The primary reasons for this are the lack of datasets and the absence of necessary expertise. In this report, we provide a detailed account of the development process of Ruri. Specifically, we discuss the training of embedding models using synthesized datasets generated by LLMs, the construction of the reranker for dataset filtering and knowledge distillation, and the performance evaluation of the resulting general-purpose text embedding models.

Submission history

From: Hayato Tsukagoshi
[v1] Thu, 12 Sep 2024 04:06:31 UTC (7,670 KB)


