Papers

CONFERENCE (INTERNATIONAL)

Robust DNN-based VAD augmented with phone entropy based rejection of background speech

Yuya Fujita, Ken-ichi Iso

InterSpeech 2016, 2016/9

Category:

Speech Processing

Abstract:
We propose a DNN-based voice activity detector (VAD) augmented by entropy-based frame rejection. DNN-based VAD classifies each frame as speech or non-speech and achieves significantly higher VAD performance than conventional statistical model-based VAD. We observed that many of the remaining errors are false alarms caused by background human speech, such as TV or radio audio or the conversations of surrounding people. To reject such background speech frames, we introduce an entropy-based confidence measure using the phone posterior probabilities output by a DNN-based acoustic model. Compared to the target speaker's voice, background speech tends to have relatively unclear pronunciation or to be contaminated by other types of noise, so its entropy becomes larger than that of audio signals containing only the target speaker's voice. Combining DNN-based VAD with the entropy criterion, we reject frames that the DNN-based VAD classifies as speech when their entropy exceeds a threshold value. We have evaluated the proposed approach and confirmed a reduction of more than 10% in Sentence Error Rate.
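The rejection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy four-phone posteriors, and the threshold of 1.0 nat are all assumptions made for illustration, and the abstract does not say whether the entropy is computed per frame or smoothed over time. The sketch assumes per-frame Shannon entropy of the phone posterior distribution.

```python
import math

def frame_entropy(posterior):
    """Shannon entropy (in nats) of one frame's phone posterior distribution."""
    total = sum(posterior)  # renormalize in case the values do not sum to 1
    return -sum((p / total) * math.log(p / total) for p in posterior if p > 0)

def reject_background_speech(vad_is_speech, phone_posteriors, entropy_threshold):
    """Keep a frame as speech only if the VAD says speech AND entropy is low.

    vad_is_speech     : per-frame booleans from a (hypothetical) DNN-based VAD
    phone_posteriors  : per-frame phone posteriors from an acoustic model
    entropy_threshold : illustrative value, not taken from the paper
    """
    return [
        bool(is_speech) and frame_entropy(post) <= entropy_threshold
        for is_speech, post in zip(vad_is_speech, phone_posteriors)
    ]

# Toy input: three frames over four hypothetical phone classes.
vad = [True, True, False]
posteriors = [
    [0.97, 0.01, 0.01, 0.01],  # sharp distribution, low entropy
    [0.25, 0.25, 0.25, 0.25],  # flat distribution, high entropy
    [0.25, 0.25, 0.25, 0.25],  # already non-speech according to the VAD
]
kept = reject_background_speech(vad, posteriors, entropy_threshold=1.0)
```

In this toy input, the first frame has a sharply peaked posterior and is kept, while the second frame has a uniform posterior (entropy ln 4, about 1.39 nats) and is rejected even though the VAD labelled it as speech.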
Download:

Robust DNN-based VAD augmented with phone entropy based rejection of background speech (External Site Link)