A Framework for Leveraging Partially-Labeled Data for Product Attribute-Value Identification

AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning


View a PDF of the paper titled A Framework for Leveraging Partially-Labeled Data for Product Attribute-Value Identification, by D. Subhalingam and 3 other authors

View PDF
HTML (experimental)

Abstract:In the e-commerce domain, the accurate extraction of attribute-value pairs (e.g., Brand: Apple) from product titles and user search queries is crucial for enhancing search and recommendation systems. A major challenge with neural models for this task is the lack of high-quality training data, as the annotations for attribute-value pairs in the available datasets are often incomplete. To address this, we introduce GenToC, a model designed for training directly with partially-labeled data, eliminating the necessity for a fully annotated dataset. GenToC employs a marker-augmented generative model to identify potential attributes, followed by a token classification model that determines the associated values for each attribute. GenToC outperforms existing state-of-the-art models, exhibiting upto 56.3% increase in the number of accurate extractions. Furthermore, we utilize GenToC to regenerate the training dataset to expand attribute-value annotations. This bootstrapping substantially improves the data quality for training other standard NER models, which are typically faster but less capable in handling partially-labeled data, enabling them to achieve comparable performance to GenToC. Our results demonstrate GenToC’s unique ability to learn from a limited set of partially-labeled data and improve the training of more efficient models, advancing the automated extraction of attribute-value pairs. Finally, our model has been successfully integrated into IndiaMART, India’s largest B2B e-commerce platform, achieving a significant increase of 20.2% in the number of correctly identified attribute-value pairs over the existing deployed system while achieving a high precision of 89.5%.

Submission history

From: D. Subhalingam [view email]
[v1]
Fri, 17 May 2024 17:09:45 UTC (656 KB)
[v2]
Mon, 18 Nov 2024 06:50:30 UTC (837 KB)



Source link
lol

By stp2y

Leave a Reply

Your email address will not be published. Required fields are marked *

No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.