Advising roles of a computer consultant
Jean McKendree, John M. Carroll
CHI 1986
As the target of Automatic Speech Recognition (ASR) has moved from clean read speech to spontaneous conversational speech, we need orthographic transcripts of spontaneous conversational speech to train acoustic models (AMs). However, transcribing such speech manually, word by word, is expensive and slow. We propose a framework for training an AM from rough transcripts that are easy to produce: fillers and short word fragments are not transcribed precisely, and some transcription errors remain. By examining phone durations in the forced alignment between the rough transcripts and the utterances, we can automatically detect the erroneous parts of the rough transcripts. A preliminary experiment showed that the erroneous parts can be detected with moderately high recall and precision. ASR experiments on conversational telephone speech confirmed that automatic detection improved the performance of AMs trained with both the conventional maximum likelihood (ML) criterion and the state-of-the-art boosted maximum mutual information (MMI) criterion. Copyright © 2011 ISCA.
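The abstract does not state the exact duration criterion used to detect erroneous parts. The sketch below shows one plausible heuristic under stated assumptions. Forced alignment is assumed to yield, for each word, a list of (phone, duration in frames) pairs. A word is flagged when any of its phones has a duration far from that phone's typical duration, measured as a z-score against per-phone statistics. The names `AlignedWord`, `phone_duration_stats`, `flag_suspect_words`, and the threshold of 2.5 are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List, Tuple


@dataclass
class AlignedWord:
    """Hypothetical forced-alignment record for one transcript word."""
    word: str
    phones: List[Tuple[str, int]]  # (phone label, duration in frames)


def phone_duration_stats(words: List[AlignedWord]) -> Dict[str, Tuple[float, float]]:
    """Per-phone (mean, sample std. dev.) of aligned durations.

    Assumption: these statistics come from alignments that are trusted,
    e.g. a small carefully transcribed subset. Phones seen fewer than
    twice are skipped because a standard deviation needs two samples.
    """
    durations: Dict[str, List[int]] = {}
    for w in words:
        for phone, dur in w.phones:
            durations.setdefault(phone, []).append(dur)
    return {p: (mean(ds), stdev(ds)) for p, ds in durations.items() if len(ds) >= 2}


def flag_suspect_words(
    words: List[AlignedWord],
    stats: Dict[str, Tuple[float, float]],
    z_threshold: float = 2.5,  # hypothetical cut-off, not from the source
) -> List[int]:
    """Return indices of words containing an abnormally short or long phone.

    Intuition: where the rough transcript is wrong, the aligner tends to
    squeeze phones to near-minimum length or stretch them over
    untranscribed audio, so their durations deviate from the norm.
    """
    flagged = []
    for i, w in enumerate(words):
        for phone, dur in w.phones:
            if phone not in stats:
                continue  # no reference statistics for this phone
            mu, sigma = stats[phone]
            if sigma > 0 and abs(dur - mu) / sigma > z_threshold:
                flagged.append(i)
                break  # one anomalous phone is enough to flag the word
    return flagged


if __name__ == "__main__":
    stats = {"a": (10.0, 2.0), "b": (8.0, 1.0)}
    words = [
        AlignedWord("ab", [("a", 10), ("b", 8)]),  # typical durations
        AlignedWord("ba", [("b", 3), ("a", 11)]),  # "b" squeezed
        AlignedWord("aa", [("a", 20), ("a", 9)]),  # "a" stretched
    ]
    print(flag_suspect_words(words, stats))  # [1, 2]
```

Flagged words could then be excluded from, or down-weighted in, AM training. That downstream use is also an assumption; the abstract says only that automatic detection improved ML- and boosted-MMI-trained models.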
Om D. Deshmukh, Shajith Ikbal, et al.
INTERSPEECH 2011
Christine Robson, Sean Kandel, et al.
CHI 2011
Vikram Gupta, Jitendra Ajmera, et al.
INTERSPEECH 2011