Emotional Speech Recognition and Synthesis in Multiple Languages toward Affective Speech-to-Speech Translation System

19-09-2015 12:23

Speech-to-speech translation (S2ST) is the process by which a spoken utterance in one language is used to produce a spoken output in another language. The conventional approach to S2ST processes linguistic information only: it translates the spoken utterance directly from the source language to the target language without taking into account paralinguistic and non-linguistic information, such as the emotional states expressed in the source speech. In this work, we explore how to handle para- and non-linguistic information across multiple languages, with a particular focus on speakers' emotional states, in S2ST scenarios that we call "affective S2ST." In our efforts to construct an effective system, we discuss (1) how to describe emotions in speech and how to model the perception and production of emotions, and (2) the commonalities and differences among multiple languages in the proposed model. We then use these discussions as context for (3) an examination of our "affective S2ST" system in operation.