Speaker Selective Beamformer with Keyword Mask Estimation - Yahoo! JAPAN Research and Development

Publications

Workshop (International): Speaker Selective Beamformer with Keyword Mask Estimation

Yusuke Kida, Dung Tran, Motoi Omachi, Toru Taniguchi, and Yuya Fujita

2018 IEEE Workshop on Spoken Language Technology (SLT 2018)

2018.12.18

This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems like smart speakers. The proposed method firstly utilizes a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. Then the separated signals are used for calculating a beamforming filter to enhance the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the character error rates are significantly improved by the proposed method for both simulated and real recorded test sets.
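The two-stage pipeline in the abstract (estimate a keyword mask, then derive a beamforming filter from the separated signals) can be illustrated with a minimal NumPy sketch. The abstract does not state which beamformer the authors use, so the code below assumes a common mask-based MVDR formulation with a reference channel; the function names, array layouts, and the `diag_load` parameter are hypothetical choices made for illustration, and the mask itself is taken as given rather than produced by a DNN.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask(t, f) * x(t, f) x(t, f)^H
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def mvdr_filter(cov_target, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR weights per frequency bin (assumed formulation, not the paper's).

    w(f) = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S),
    where u selects the reference channel.
    returns: complex array, shape (freqs, channels)
    """
    n_ch = cov_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(cov_noise, axis1=-2, axis2=-1).real / n_ch
    loaded = cov_noise + diag_load * scale[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[:, :, ref_channel] / (trace[:, None] + 1e-10)


def apply_filter(weights, stft):
    """Beamformer output y(t, f) = w(f)^H x(t, f), shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

Under these assumptions, the keyword segment would supply both covariance matrices: `spatial_covariance(keyword_stft, keyword_mask)` for the target speaker and `spatial_covariance(keyword_stft, 1.0 - keyword_mask)` for the background speech. The resulting weights would then be passed to `apply_filter` together with the STFT of the utterance that follows the keyword.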

Paper: Speaker Selective Beamformer with Keyword Mask Estimation (external site)