Comparison of Methods for Emotion Dimensions Estimation in Speech Using a Three-Layered Model

20-09-2015 07:20

This paper proposes a three-layered model for estimating the emotions expressed in a speech signal based on a dimensional approach. Several estimators are adopted to estimate the three emotion dimensions (valence, activation, and dominance) of a speech signal. These estimators were designed to predict the emotion dimensions directly from acoustic features. However, the acoustic features that correlate with the valence dimension are fewer and weaker than those for the other dimensions, and valence has therefore been particularly difficult to predict. The ultimate goal of this study is to improve the dimensional approach so that the valence dimension can be predicted precisely. The proposed model consists of three layers: acoustic features, semantic primitives, and emotion dimensions. We first compare several popular estimation methods and evaluate their performance by applying them to both the traditional two-layered model and the proposed three-layered model. The experimental results show that the proposed three-layered model, using a fuzzy inference system and k-nearest neighbors (KNN) as estimators, outperforms the traditional two-layered model using the same estimators.
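The contrast between the two-layered model (acoustic features mapped directly to emotion dimensions) and the three-layered model (acoustic features mapped to semantic primitives, which are then mapped to emotion dimensions) can be sketched with a minimal KNN cascade. This is only an illustration of the layered structure, not the paper's actual method: the feature values, primitive ratings, and dimension labels below are invented toy data, and a simple averaging KNN stands in for the estimators evaluated in the paper.

```python
import math

def knn_predict(train_X, train_Y, x, k=3):
    """Predict a target vector for x by averaging the targets of the
    k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_Y),
                    key=lambda pair: math.dist(pair[0], x))
    neighbors = [y for _, y in ranked[:k]]
    dim = len(neighbors[0])
    return [sum(n[j] for n in neighbors) / k for j in range(dim)]

# Toy training data (invented for illustration):
#   features   -- acoustic features, e.g. pitch and energy statistics
#   primitives -- hypothetical semantic-primitive ratings
#   dimensions -- valence, activation, dominance labels
features   = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5], [0.7, 0.3]]
primitives = [[0.2, 0.1], [0.8, 0.9], [0.5, 0.4], [0.6, 0.5]]
dimensions = [[-0.5, -0.3, -0.2], [0.6, 0.7, 0.5],
              [0.0, 0.1, 0.0], [0.3, 0.2, 0.4]]

x_new = [0.5, 0.4]  # acoustic features of an unseen utterance

# Two-layered model: acoustic features -> emotion dimensions directly.
direct = knn_predict(features, dimensions, x_new)

# Three-layered model: acoustic features -> semantic primitives,
# then semantic primitives -> emotion dimensions.
prim_hat = knn_predict(features, primitives, x_new)
cascaded = knn_predict(primitives, dimensions, prim_hat)

print(direct)   # (valence, activation, dominance), two-layered
print(cascaded) # (valence, activation, dominance), three-layered
```

The point of the middle layer is that semantic primitives (perceptual descriptors of the voice) are easier to predict from acoustics than valence itself, so the second-stage estimator works from a more informative representation.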