3-step Parallel Corpus Cleaning using Monolingual Crowd
PACLING 2015 (The 2015 Conference of the Pacific Association for Computational Linguistics), 2015/5
Natural Language Processing, Crowdsourcing
- A high-quality parallel corpus must be created manually to achieve good machine translation in domains that lack sufficient existing resources. Although asking professional translators to produce the translations can improve corpus quality to some extent, mistakes cannot be avoided completely. In this paper, we propose a framework for cleaning an existing professionally translated parallel corpus quickly and cheaply. The proposed method uses a 3-step crowdsourcing procedure to efficiently detect and edit translation flaws, and it also guarantees the reliability of the edits. Experiments on a fashion-domain e-commerce site (EC-site) parallel corpus show the effectiveness of the proposed method for parallel corpus cleaning.