A Comparative Live Evaluation of Multileaving Methods on a Commercial cQA Search
Tomohiro Manabe, Akiomi Nishida, Makoto P. Kato (Kyoto University), Takehiro Yamamoto (Kyoto University), Sumio Fujita
SIGIR 2017, 2017/8
Information Retrieval, Data Science
- We present one of the world’s first attempts to examine the feasibility of multileaving evaluation of document rankings on a large scale commercial community Question Answering (cQA) service. As a natural enhancement of interleaving evaluation, multileaving merges more than two input rankings into one, and measures the search user satisfaction of each input ranking based on user clicks on the multileaved ranking. We evaluated the adequateness of two major multileaving methods, team draft multileaving (TDM) and optimized multileaving (OM), proposing their practical implementation for live services. Our experimental results demonstrated that multileaving methods could precisely evaluate the effectiveness of five rankings with different quality by using clicks from real users. Moreover, we concluded that OM is better than TDM in terms of efficiency, with an observation that most of the evaluation results with OM converged after showing multileaved rankings around 40,000 times, and an in-depth analysis of their characteristics.
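
The merging and click-crediting idea described in the abstract can be illustrated with a minimal sketch of team draft multileaving as it is commonly described: in each round the input rankings take turns in random order, each contributing its highest-ranked document not yet in the merged list, and a click is credited to the ranking that contributed the clicked document. This is not the authors' production implementation; the function names `team_draft_multileave` and `credit_clicks`, the parameters, and the toy document IDs are all hypothetical and chosen only for illustration.

```python
import random


def team_draft_multileave(rankings, length, rng=None):
    """Merge several rankings into one list (illustrative sketch).

    Returns the merged list and a dict mapping each merged document
    to the index of the input ranking ("team") that contributed it.
    """
    if rng is None:
        rng = random.Random()
    merged = []
    team_of = {}
    while len(merged) < length:
        # Each round, the rankings pick in a fresh random order.
        order = list(range(len(rankings)))
        rng.shuffle(order)
        added = False
        for r in order:
            if len(merged) >= length:
                break
            # Highest-ranked document of ranking r not yet in the merged list.
            doc = next((d for d in rankings[r] if d not in team_of), None)
            if doc is None:
                continue  # ranking r has nothing new to contribute
            merged.append(doc)
            team_of[doc] = r
            added = True
        if not added:
            break  # every ranking is exhausted
    return merged, team_of


def credit_clicks(clicked_docs, team_of, num_rankings):
    """Count, per input ranking, the clicks on documents it contributed."""
    credits = [0] * num_rankings
    for doc in clicked_docs:
        if doc in team_of:
            credits[team_of[doc]] += 1
    return credits


# Toy example with three hypothetical rankings over four documents.
rankings = [["a", "b", "c"], ["b", "a", "d"], ["c", "d", "a"]]
merged, team_of = team_draft_multileave(rankings, 4, random.Random(0))
credits = credit_clicks([merged[0]], team_of, len(rankings))
```

Aggregating such per-ranking credits over many displayed multileaved rankings gives a click-based preference among the input rankings. Optimized multileaving selects the displayed ranking differently and is not covered by this sketch.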
A Comparative Live Evaluation of Multileaving Methods on a Commercial cQA Search (External Site Link)