Detecting Emotions in Human Voice

Nineli Lashkarashvili (San Diego State University Georgia); Lela Mirtskhulava (Iv. Javakhishvili Tbilisi State University & San Diego State University Georgia)

Expressing emotions is one of the most important factors in human communication. Words are not the only cues for conveying emotional information: vocal features such as timbre, loudness, tone, and pitch, along with facial expressions, play a large role. A reliable tool for recognizing human emotions would be a step toward endowing machine intelligence with emotional intelligence. In this paper, we conduct an extensive analysis of how the number of MFCCs (Mel-frequency cepstral coefficients) affects the performance of several models. We build three models for this task: a 2D convolutional neural network with four convolution blocks, and LSTM and GRU models consisting of 256 neurons each. We train the models on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Compared with previous results, our best model on the speech dataset improves accuracy by 19%, and two of our models achieve 78.4% accuracy on the song dataset.
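Since the abstract centers on varying the number of MFCCs fed to each model, a brief sketch of MFCC extraction may help. The following is a minimal NumPy implementation with assumed parameter values (sample rate, FFT size, hop length, filter-bank size); it is not the authors' exact pipeline, which is not specified here, and in practice a library such as librosa is typically used.

```python
import numpy as np

def mfcc(signal, sr=16000, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Sketch of MFCC extraction; parameter defaults are illustrative assumptions."""
    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(sig) - n_fft) // hop
    frames = np.stack([sig[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * n[None, :])
    return (log_mel @ dct)[:, :n_mfcc]

# Example: one second of a 440 Hz tone yields a (frames, n_mfcc) feature matrix
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # → (61, 13)
```

The `n_mfcc` argument is the knob the paper's analysis varies: larger values retain finer spectral detail at the cost of higher-dimensional inputs to the CNN, LSTM, and GRU models.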

Journal: International Journal of Simulation: Systems, Science and Technology (IJSSST), Vol. 22

Published: no date/time given

DOI: 10.5013/IJSSST.a.22.01.03