Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization

Phiradet Bangcharoensap (Tokyo Tech), Tsuyoshi Murata (Tokyo Tech), Hayato Kobayashi, and Nobuyuki Shimizu

WSDM 2016, 2016/2


Data Science

This paper proposes a novel method for transductive classification on heterogeneous information networks composed of multiple types of vertices. Such networks naturally represent many kinds of real-world Web data, such as DBLP data (authors, papers, and conferences). Given a network in which some vertices are labeled, the classifier aims to predict labels for the remaining vertices by propagating the known labels across the entire network. In the label propagation process, many studies reduce the importance of edges connected to a high-degree vertex. This assumption is unsatisfactory when the reliability of a vertex's label cannot be inferred from its degree. On the basis of our intuition that edges bridging communities are less trustworthy, we adapt edge betweenness to indicate the importance of edges. Since directly applying conventional edge betweenness is inefficient on heterogeneous networks, we propose two additional refinements. First, the centrality exploits the fact that networks contain multiple types of vertices. Second, the centrality ignores flows originating from the endpoints of the edge under consideration. Experimental results on real-world datasets show that our proposed method is more effective than a state-of-the-art method, GNetMine. On average, our method yields 92.79 ± 1.25% accuracy on a DBLP network even when only 1.92% of vertices are labeled. Our simple weighting scheme yields an increase of more than 5 percentage points in accuracy compared with GNetMine.
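
The intuition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain Brandes-style edge betweenness without the paper's two refinements, treats all vertices as a single type, and assumes a simple weighting of 1 / betweenness, which the abstract does not specify. The toy graph (two triangles joined by one bridge), the seed labels, and all function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Plain edge betweenness for an unweighted, undirected graph (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to each vertex.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered vertex pair was counted once from each of its endpoints.
    return {e: b / 2.0 for e, b in score.items()}


def propagate(adj, weight, seeds, iters=200):
    """Binary label propagation: each unlabeled vertex takes the weighted mean of its neighbors."""
    score = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(iters):
        for v in adj:
            if v in seeds:
                continue  # labeled vertices stay clamped to their seed value
            total = sum(weight[frozenset((v, u))] for u in adj[v])
            score[v] = sum(weight[frozenset((v, u))] * score[u] for u in adj[v]) / total
    return score


# Hypothetical toy graph: triangles {a, b, c} and {d, e, f} joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

bet = edge_betweenness(adj)
# Assumed weighting: edges with high betweenness (bridges) are trusted less.
weight = {e: 1.0 / b for e, b in bet.items()}
# Seed value 1.0 means class A and 0.0 means class B.
score = propagate(adj, weight, seeds={"a": 1.0, "f": 0.0})
pred = {v: "A" if s > 0.5 else "B" for v, s in score.items()}
print(sorted(pred.items()))
```

In this toy graph the bridge c-d receives the highest betweenness, so it gets the smallest weight and each triangle is pulled toward its own seed label rather than across the bridge.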
