Unsupervised Answer Retrieval with Data Fusion for Community Question Answering
Sosuke Kato (Waseda Univ.), Toru Shimizu, Sumio Fujita, Tetsuya Sakai (Waseda Univ.)
The 15th Asia Information Retrieval Societies Conference (AIRS 2019), 2019/11
Natural Language Processing, Information Retrieval, Machine Learning
- Community question answering (cQA) systems have benefited from advances in neural information retrieval, but some neural models require annotated documents as supervised data. In contrast to the amount of supervised data available for cQA systems, user-generated data on cQA sites have been growing rapidly over time. Focusing therefore on unsupervised models, we tackle the task of retrieving relevant answers for new questions from existing cQA data, and we propose two frameworks that exploit a Question Retrieval (QR) model for Answer Retrieval (AR). The first framework ranks answers by the combined scores of QR and AR models; the second ranks answers using the scores of a QR model together with best answer flags. In our experiments, we applied the combination of our proposed frameworks and a classical fusion technique to AR models on a Japanese cQA data set containing approximately 9.4M question-answer pairs. When best answer flags in the cQA data cannot be used, our combination of AR and QR scores with data fusion outperforms a base AR model on average. When best answer flags can be used, retrieval performance improves further. Although our results lack statistical significance, we discuss effect sizes as well as the future sample sizes needed to attain sufficient statistical power.
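The two frameworks described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the abstract does not name the fusion technique, so the sketch uses min-max normalisation followed by CombSUM (a classical score-fusion method); the function names `min_max`, `fuse_ar_qr`, and `rank_by_qr_best_answer`, the score dictionaries, and the rule of keeping only flagged best answers in the second framework are all hypothetical.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse_ar_qr(
    ar_scores: Dict[str, float],
    qr_scores: Dict[str, float],
    answer_to_question: Dict[str, str],
) -> List[str]:
    """Framework 1 (sketch): rank answers by AR score plus the QR score
    of the question each answer belongs to (CombSUM is an assumption)."""
    ar = min_max(ar_scores)
    qr = min_max(qr_scores)
    fused = {
        answer: ar[answer] + qr.get(answer_to_question[answer], 0.0)
        for answer in ar
    }
    # Sort by descending fused score; break ties by answer id for determinism.
    return sorted(fused, key=lambda answer: (-fused[answer], answer))


def rank_by_qr_best_answer(
    qr_scores: Dict[str, float],
    answer_to_question: Dict[str, str],
    best_answers: Set[str],
) -> List[str]:
    """Framework 2 (sketch): keep only answers flagged as best answers and
    rank them by the QR score of their parent question (assumed rule)."""
    scored = {
        answer: qr_scores[question]
        for answer, question in answer_to_question.items()
        if answer in best_answers and question in qr_scores
    }
    return sorted(scored, key=lambda answer: (-scored[answer], answer))


# Toy data: three archived answers attached to two archived questions.
ar_scores = {"a1": 2.0, "a2": 1.0, "a3": 0.0}   # answer vs. new question
qr_scores = {"q1": 0.0, "q2": 10.0}             # archived vs. new question
answer_to_question = {"a1": "q1", "a2": "q2", "a3": "q2"}

fused_ranking = fuse_ar_qr(ar_scores, qr_scores, answer_to_question)
best_answer_ranking = rank_by_qr_best_answer(
    qr_scores, answer_to_question, best_answers={"a1", "a3"}
)
```

In this toy example, answer `a2` overtakes `a1` in the fused ranking because its parent question matches the new question well, which is the kind of effect the first framework is meant to capture.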