Scalable Twitter user clustering approach boosted by Personalized PageRank
Anup Naik, Hideyuki Maeda, Vibhor Kanojia, Sumio Fujita
International Journal of Data Science and Analytics, 2017/12
Information Retrieval Data Science
Twitter has been the focus of analysis for a variety of interesting and challenging problems, one of which is clustering its users by their interests. Many graph clustering approaches look at either the structure or the contents of the graph. However, on real-world complex data such as Twitter data, structural approaches may produce many separate user clusters that share similar interests. Content-based clustering approaches also produce inferior results on Twitter data, because tweets have a limited number of characters and contain a lot of garbled data. Hence, for practical applications, these clustering approaches cannot be used directly on Twitter data. In the study reported in this paper, we clustered Twitter users on the basis of their interests, looking at both the structure of the graph generated from Twitter data and the contents of the tweets. In short, we clustered Twitter users with an unsupervised structural approach, merged similar clusters with a content-based approach, expanded the graph and ranked users with Personalized PageRank, and determined the topic to which each cluster belongs according to hashtag frequency. The results of combining these approaches were better than those of the existing techniques and are suited to practical applications.
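The ranking step named in the abstract can be illustrated with a minimal Python sketch of Personalized PageRank computed by power iteration. This is a generic textbook formulation, not the authors' implementation: the function name `personalized_pagerank`, the toy `follows` graph, the damping value 0.85, the iteration count, and the choice to return the mass of dangling nodes to the seed set are all assumptions made for illustration.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Rank nodes by proximity to a seed set (hypothetical helper).

    graph: dict mapping each node to a list of nodes it links to.
    seeds: non-empty set of nodes that the random walk restarts from.
    alpha: probability of following an edge instead of restarting (assumed value).
    """
    # Collect every node, including those that appear only as link targets.
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)

    # The restart distribution is uniform over the seed set.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Every step begins with the restart share of the probability mass.
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                # Spread the walking share evenly over outgoing edges.
                share = alpha * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Assumption: a node with no outgoing edges returns its mass to the seeds.
                for s in seeds:
                    nxt[s] += alpha * rank[n] / len(seeds)
        rank = nxt
    return rank


# Toy follow graph with made-up user names; "alice" stands in for a cluster's seed user.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["erin"],
    "erin": ["dave"],
}
scores = personalized_pagerank(follows, seeds={"alice"})
ranked_users = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, users who cannot be reached from the seed receive a score of zero, which is the property that lets a personalized ranking favor accounts close to a given cluster over accounts that are merely popular elsewhere in the graph.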