EPAComp: An Architectural Model for EPA Composition
Luís Henrique Neves Villaça, Sean Wolfgand Matsui Siqueira, et al.
SBSI 2023
We describe American English concatenative text-to-speech synthesis experiments in which "expressions" (here, questioning and contrastive emphasis) are each associated with a ToBI prosodic template. ToBI labels, along with text features, are in turn incorporated into decision-tree models of F0 and segment duration to be used during synthesis, sparing the need for expression-specific large corpora and decision trees. Synthesizing with this approach enables listeners to perform the difficult task of distinguishing yes-no questions from identically worded declarative sentences 78% of the time, compared to the baseline system's 50%. For contrastive emphasis, a sentence is synthesized with emphasis on a word that is chosen appropriately or inappropriately based on a preceding sentence. Listeners' mean opinion scores for appropriate emphases exceed those for inappropriate emphases by 0.40 on a 1-to-5 scale for the experimental system, compared to a difference of 0.11 for the baseline, a significant system difference (p < 0.01).
M. Abe, M. Hori
SAINT 2003
Xiaodan Song, Ching-Yung Lin, et al.
CVPRW 2004
Kun Wang, Juwei Shi, et al.
PACT 2011