Multi-modal Retrieval Augmented Generation for Product Query


Quang-Vinh Dang

Abstract

  Product query, in which an e-commerce website returns products that match a user’s query, is essential to any e-commerce system. Traditional search systems consider only the text of the user’s query and try to match it against the product descriptions in the database. Some recent image search systems use deep learning methods to match an image query against the product images in the database. However, none of these systems combines text and images in a single query. Such searches are common in daily life: a user can easily take a photo with a smartphone, add a short text description, and search for a product. In this paper, we consider multi-modal retrieval-augmented generation (RAG) to build a product query system that allows users to search by image and text simultaneously. Our system aims to provide a better user experience and improve the performance of e-commerce websites.
