Summarization Based on Embedding Distributions
EMNLP 2015 (the 2015 Conference on Empirical Methods in Natural Language Processing), 2015/9
Natural Language Processing, Machine Learning
- In this study, we consider a summarization method that uses document-level similarity based on embeddings, or distributed representations of words, where we assume that the embedding of each word can represent its “meaning.” We formalize the task as the problem of maximizing a submodular function defined as the negative sum of nearest-neighbor distances on embedding distributions, each of which represents the set of word embeddings in a document. We prove that our objective function is submodular and that our problem is asymptotically related to the KL-divergence between the probability density functions that correspond to a document and its summary in a continuous space. An experiment using a real dataset demonstrated that our method performed better than the existing method based on sentence-level similarity.
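The objective described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: Euclidean distance between embeddings, a fixed penalty distance `empty_dist` for the empty summary, and a plain greedy selection under a sentence-count budget. The names `nn_objective`, `greedy_summary`, and `empty_dist` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def nn_objective(doc_vecs, summary_vecs, empty_dist=10.0):
    """Negative sum, over document word vectors, of the distance to the
    nearest summary word vector.

    Assumption: an empty summary is scored as if every document word were
    `empty_dist` away. For the objective to stay monotone, this value
    should exceed every real distance in the data.
    """
    if len(summary_vecs) == 0:
        return -empty_dist * len(doc_vecs)
    # Pairwise Euclidean distances, shape (n_doc_words, n_summary_words).
    diffs = doc_vecs[:, None, :] - summary_vecs[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return -dists.min(axis=1).sum()


def greedy_summary(sentences, max_sentences, empty_dist=10.0):
    """Greedily pick sentence indices that most increase nn_objective.

    `sentences` is a list of arrays, each of shape (n_words_i, dim),
    holding the word embeddings of one sentence.
    """
    doc_vecs = np.vstack(sentences)
    dim = doc_vecs.shape[1]
    selected = []
    current = nn_objective(doc_vecs, np.empty((0, dim)), empty_dist)
    while len(selected) < max_sentences:
        best_gain, best_idx = 0.0, None
        for i, sent in enumerate(sentences):
            if i in selected:
                continue
            # Word vectors of the candidate summary: chosen sentences plus this one.
            cand = np.vstack([sentences[j] for j in selected] + [sent])
            gain = nn_objective(doc_vecs, cand, empty_dist) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence improves the objective
            break
        selected.append(best_idx)
        current += best_gain
    return selected
```

Calling `greedy_summary(sentences, max_sentences=2)` on a list of per-sentence embedding arrays returns the indices of the chosen sentences in selection order. Greedy selection is a common choice for monotone submodular objectives because it comes with an approximation guarantee under a cardinality constraint; a length-based budget would need a cost-aware variant, which this sketch does not cover.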
Summarization Based on Embedding Distributions (PDF)