Pitch accent is a component of prosody that is often used to convey information beyond the intrinsic linguistic meaning of a spoken utterance, such as highlighting words that correspond to important information or signaling a contrast with information that was previously conveyed. This information-bearing aspect of pitch accent is therefore important for effective communication in spoken applications.
Recent work has looked into statistical modeling techniques for automatic pitch accent prediction as a component of speech technologies like Text-to-Speech (TTS). Many of these systems, however, have largely overlooked the role of discourse context in driving pitch accent placement; others have simply introduced more complex models of discourse-level phenomena into the accent prediction component.
We investigate a model for discourse-driven statistical pitch accent prediction that makes use of a dynamically updated semantic space as a means of introducing context-sensitive features into the prediction model. This approach has the advantage of being trainable on a large corpus of unannotated data, making it less prone to the corpus domain bias (i.e., distributions estimated from a given corpus that reflect the genre of that corpus only) inherent in purely probabilistic variables for accent prediction. Moreover, this approach does not require additional modeling of complex discourse processes, but relies solely on shallow analysis of the input text.