JCSE, vol. 13, no. 4, pp.131-140, 2019
DOI: http://dx.doi.org/10.5626/JCSE.2019.13.4.131
A Study on the Emotional Feature Composed of the Mel-frequency Cepstral Coefficient and the Speech Speed
Insuk Hong, Youjung Ko, Yoonjoong Kim, and Hyunsoon Shin
Department of Computer Engineering, Hanbat National University, Daejeon, Korea
Brain-Emotion Research Section, Electronics and Telecommunications Research Institute, Daejeon, Korea
Abstract: Through experiment, this research introduces and verifies the usefulness of prosody attributes such as loudness, pitch and
sound length as an emotional feature to express the characteristics of the emotion being felt. Sound length is proportional
to pronunciation duration and is inversely proportional to the number of phonemic changes per unit of time. Based on
these facts, speech speed and the emotional feature were calculated as follows. First, a codebook was generated using
mel-frequency cepstral coefficient (MFCC) vectors from a preexisting emotional speech database. Second, the MFCC
vector of the speech signal was vector-quantized by this codebook to generate a quantized sequence. Third, this sequence
was treated as a phoneme sequence, and the speech speed was computed by normalizing the number of phoneme
changes for each window of time. Fourth, the emotional feature was generated based on this speech speed as follows.
The speech speed was added to the MFCC vector with delta and acceleration computation to generate an emotional feature
that is related to prosody elements such as loudness, pitch and sound length. In order to analyze the utility of these
emotional features, a recognition system was developed with the emotional features and a hidden Markov model
(HMM). For maximum performance, the degree of MFCC, size of the codebook, method of speech speed computation,
window size of speech speed computation, number of HMM states and number of Gaussian mixture model (GMM)
components per state were carefully selected. To test the recognition system, a text-independent, speaker-independent experiment
and a text-independent, speaker-dependent experiment were conducted. It was verified that the recognition system
using emotional features showed better performance than the recognition system using only speech features, with
improvements of 2.5% and 3.5%, respectively, in the two experiments.
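
The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebook is taken as already trained (the abstract does not say how it was generated), quantization uses nearest-neighbor Euclidean distance, the window is centered on each frame and truncated at the edges, and delta and acceleration are approximated with `np.gradient`. The function names `quantize`, `speech_speed` and `emotional_feature` and the default window size are hypothetical.

```python
import numpy as np


def quantize(mfcc, codebook):
    """Map each MFCC frame (T, D) to the index of its nearest codeword (K, D)."""
    # Assumption: Euclidean nearest neighbor; the paper's distance is not stated.
    dists = np.linalg.norm(mfcc[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def speech_speed(codes, window):
    """Fraction of code changes inside a centered window around each frame."""
    codes = np.asarray(codes)
    changes = np.zeros(len(codes))
    # A change is counted at frame t when its code differs from frame t - 1.
    changes[1:] = codes[1:] != codes[:-1]
    half = window // 2
    speed = np.empty(len(codes))
    for t in range(len(codes)):
        lo, hi = max(0, t - half), min(len(codes), t + half + 1)
        # Normalize by the number of frames actually covered by the window.
        speed[t] = changes[lo:hi].mean()
    return speed


def emotional_feature(mfcc, codebook, window=5):
    """Append speech speed to each MFCC frame, then add delta and acceleration."""
    codes = quantize(mfcc, codebook)
    base = np.hstack([mfcc, speech_speed(codes, window)[:, None]])
    # Assumption: simple finite differences stand in for the regression-based
    # delta and acceleration coefficients typically used with HMM toolkits.
    delta = np.gradient(base, axis=0)
    accel = np.gradient(delta, axis=0)
    return np.hstack([base, delta, accel])
```

For an utterance with T frames of D-dimensional MFCCs, `emotional_feature` returns an array of shape (T, 3 * (D + 1)), which could then be passed to an HMM-based recognizer.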
Keywords:
Emotion recognition; Speech speed; Emotional feature; MFCC; HMM