Papers

CONFERENCE (INTERNATIONAL)

Article De-duplication Using Distributed Representations

Shumpei Okura, Yukihiro Tagami and Akira Tajima

WWW 2016 (The 25th International Conference on World Wide Web) Posters <Best Poster Runner-up>, 2016/5

Category:

Natural Language Processing, Machine Learning, Data Science

Abstract:
In news recommendation systems, eliminating redundant information is important, as is providing interesting articles for users. We propose a method that quantifies the similarity of articles based on their distributed representation, learned with the category information as weak supervision. This method is useful for evaluation under tight time constraints, since it only requires low-dimensional inner product calculation for estimating similarities. The experimental results from human evaluation and online performance in A/B testing suggest the effectiveness of our proposed method, especially for quantifying middle-level similarities. Currently, this method is used on Yahoo! JAPAN’s front page, which has millions of users per day and billions of page views per month.
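Illustration (not from the paper): the abstract says similarity is estimated with a low-dimensional inner product over article vectors. The sketch below shows one way such scores could drive de-duplication. The toy 3-dimensional vectors, the `threshold` value, the greedy keep-first filtering, and the function names `inner_product` and `deduplicate` are all assumptions made for this example; they are not the authors' implementation, and the learned representations themselves are not reproduced here.

```python
from typing import List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity score: plain inner product of two article vectors."""
    return sum(a * b for a, b in zip(u, v))


def deduplicate(vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Greedy filter (an assumption, not the paper's procedure).

    Walk the articles in their given order and keep an article only if its
    inner product with every already-kept article is below `threshold`.
    Returns the indices of the kept articles.
    """
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        if all(inner_product(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical toy vectors standing in for learned article representations.
articles = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # close to article 0, so it should be filtered out
    [0.0, 1.0, 0.0],
]

kept = deduplicate(articles, threshold=0.8)
print(kept)
```

Because each comparison is a single inner product over a short vector, the cost per pair stays small, which fits the tight time constraints the abstract mentions. In practice the threshold would have to be tuned against human judgments of redundancy.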
Download:

Article De-duplication Using Distributed Representations (PDF 403KB)