Papers

CONFERENCE (INTERNATIONAL)

Simultaneous Detection and Localization of a Wake-Up Word using Multi-Task Learning of the Duration and Endpoint

Takashi Maekaku, Yusuke Kida, Akihiko Sugiyama

The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), 2019/9

Category:

Speech Processing

Abstract:
This paper proposes a novel method for simultaneous detection and localization of a wake-up word using multi-task learning of the duration and endpoint. The onset of the wake-up word is estimated by going back in time from the estimated endpoint by the estimated duration of the wake-up word. Accurate endpoint estimation is achieved by training the network to fire only at the endpoint rather than over the entire wake-up word. The accurate endpoint naturally leads to an accurate onset, because the onset is calculated from the endpoint and an estimated duration that reflects the acoustic information over the entire wake-up word. Experimental results with real-environment data show relative improvements in accuracy of 41% for onset estimation and 38% for endpoint estimation compared to a baseline method.
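
The onset calculation described in the abstract (endpoint minus duration) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `localize_wake_word`, the per-frame score and duration lists, the peak-picking rule, and the 0.5 threshold are all assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def localize_wake_word(
    endpoint_scores: Sequence[float],
    duration_frames: Sequence[float],
    threshold: float = 0.5,  # assumed detection threshold, not from the paper
) -> Optional[Tuple[int, int]]:
    """Return (onset_frame, endpoint_frame), or None if nothing is detected.

    endpoint_scores: hypothetical per-frame endpoint scores from a network
        trained to fire only at the end of the wake-up word.
    duration_frames: hypothetical per-frame duration estimates, in frames.
    """
    if len(endpoint_scores) != len(duration_frames):
        raise ValueError("both sequences must have the same length")
    if len(endpoint_scores) == 0:
        return None

    # Take the frame with the highest endpoint score as the endpoint candidate.
    end = max(range(len(endpoint_scores)), key=lambda i: endpoint_scores[i])
    if endpoint_scores[end] < threshold:
        return None  # no wake-up word detected

    # Go back in time from the endpoint by the duration estimated at that frame.
    onset = max(0, end - int(round(duration_frames[end])))
    return onset, end


if __name__ == "__main__":
    scores = [0.0, 0.1, 0.05, 0.2, 0.9, 0.1]
    durations = [0.0, 0.0, 0.0, 0.0, 3.2, 0.0]
    print(localize_wake_word(scores, durations))  # (1, 4)
```

In this sketch, detection and localization share one decision: the endpoint peak both triggers the detection and anchors the onset estimate.
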
Download:

Simultaneous Detection and Localization of a Wake-Up Word using Multi-Task Learning of the Duration and Endpoint (External Site Link)