Papers

JOURNAL (INTERNATIONAL)

SPEAKER SELECTIVE BEAMFORMER WITH KEYWORD MASK ESTIMATION

Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, Yuya Fujita

arXiv.org, 2018/10

Category:

Speech Processing, Machine Learning

Abstract:
This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems like smart speakers. The proposed method firstly utilizes a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. Then the separated signals are used for calculating a beamforming filter to enhance the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the character error rates are significantly improved by the proposed method for both simulated and real recorded test sets.
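
The second stage described in the abstract, turning separated keyword and background signals into a beamforming filter, can be illustrated with a minimal sketch. The abstract does not say which beamformer the authors use, so the MVDR formulation below is an assumption chosen for illustration only. The function names, array layouts, and the `diag_load` parameter are hypothetical, and the masks are assumed to come from an already trained estimator that is not shown.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array, shape (channels, frames, bins)
    mask: real array in [0, 1], shape (frames, bins)
    Returns an array of shape (bins, channels, channels).
    """
    weighted = stft * mask[None, :, :]
    # cov[f, c, d] = sum_t mask[t, f] * x[c, t, f] * conj(x[d, t, f])
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_target, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR filter per bin (an assumed choice, not taken from the paper).

    Computes w = (cov_noise^-1 cov_target) u / trace(cov_noise^-1 cov_target),
    where u selects the reference channel. Returns shape (bins, channels).
    """
    n_ch = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(cov_noise, axis1=-2, axis2=-1).real / n_ch
    loaded = cov_noise + diag_load * scale[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Filter a multichannel STFT: y[t, f] = w[f]^H x[:, t, f]."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch the keyword mask and the background mask would each be passed to `masked_covariance` together with the STFT of the keyword segment. The resulting filter from `mvdr_weights` would then be applied to the utterances that follow the keyword.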
Download:

SPEAKER SELECTIVE BEAMFORMER WITH KEYWORD MASK ESTIMATION (External Site Link)