Toward affective speech-to-speech translation: Strategy for emotional speech recognition and synthesis in multiple languages

19-09-2015 12:16

Speech-to-speech translation (S2ST) is the process by which a spoken utterance in one language is used to produce a spoken output in another language. The conventional approach to S2ST has focused on linguistic information only, directly translating the spoken utterance from the source language to the target language without taking into account paralinguistic and non-linguistic information such as the emotional states at play in the source language. This paper introduces activities of the JAIST AIS lab that explore how to deal with para- and non-linguistic information among multiple languages, with a particular focus on speakers' emotional states, in S2ST applications called "affective S2ST". In our efforts to construct an effective system, we discuss (1) how to describe emotions in speech and how to model the perception and production of emotions, and (2) the commonalities and differences among multiple languages in the proposed model. We then use these discussions as context for (3) an examination of our "affective S2ST" system in operation.