Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling
Masamichi Shimosaka (Tokyo Institute of Technology), Takeshi Tsukiji (The University of Tokyo), Shoji Tominaga (The University of Tokyo), and Kota Tsubouchi
ECML PKDD 2016 (The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases), September 2016
Natural Language Processing, Machine Learning, Data Science
- We propose a nonparametric Bayesian mixture model for grouped data that simultaneously optimizes topic extraction and group clustering while allowing all topics to be shared by all clusters. In addition, to keep computation efficient enough for today's large-scale data, we formulate the model so that its posterior distribution can be approximated with a closed-form variational Bayesian method. Experimental results on corpus data show that our model outperforms existing models, achieving a 22% improvement over the state-of-the-art model. Moreover, an experiment with location data from mobile phones shows that our model performs well in big data analysis.
Coupled Hierarchical Dirichlet Process Mixtures for Simultaneous Clustering and Topic Modeling (External Site Link)