Deep Multimodal Fusion Convolutional Neural Network for Emotion Recognition
Abstract
Emotion recognition plays an important role in identifying a person’s feelings. Relying on a single feature modality often fails to produce accurate recognition when that modality is ambiguous. This research develops a new model, a deep convolutional neural network with trial-and-error-based fusion (TE-DCNN), for emotion recognition. The proposed TE-DCNN model extracts features from the audio, visual, and text modalities to enhance the emotion recognition process. In this approach, three DCNN models are trained, one per modality, which reduces time dependencies and makes recognition faster than other methods. The model adopts a trial-and-error-based (TE) fusion method to combine the three modalities, which helps avoid over-fitting. The TE-DCNN model delivers better results while minimizing computational complexity, and it is flexible and scalable for recognizing human emotions. Its performance was evaluated using five metrics: accuracy, specificity, precision, recall, and F1 score, achieving 94.33%, 94.58%, 93.80%, 94.08%, and 93.94%, respectively, surpassing other state-of-the-art methods.
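The abstract does not specify how the TE fusion is realized, so the following is a minimal sketch of one plausible reading: three small per-modality CNNs produce class probabilities, and the fusion weights are searched by trial and error (a grid search) on held-out data. All layer sizes, input shapes, the seven-class label set, and every name below are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a TE-DCNN-style pipeline: per-modality CNNs plus a
# trial-and-error (grid) search over decision-level fusion weights.
# Everything here is an illustrative assumption, not the paper's setup.
import itertools

import torch
import torch.nn as nn

NUM_CLASSES = 7  # assumed number of emotion categories


class ModalityCNN(nn.Module):
    """Toy 1-D CNN over a per-frame feature sequence for one modality."""

    def __init__(self, in_channels: int, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> (batch, 32, 1)
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).squeeze(-1)                  # (batch, 32)
        return torch.softmax(self.classifier(h), dim=-1)  # class probs


def fuse(probs, weights):
    """Weighted sum of the per-modality probability tensors."""
    return sum(w * p for w, p in zip(weights, probs))


def trial_and_error_fusion(probs, labels, steps=10):
    """Grid-search fusion weights (summing to 1) that maximize accuracy."""
    best_w, best_acc = None, -1.0
    for ia, iv in itertools.product(range(steps + 1), repeat=2):
        it = steps - ia - iv  # remaining weight goes to the text branch
        if it < 0:
            continue
        w = (ia / steps, iv / steps, it / steps)
        acc = (fuse(probs, w).argmax(dim=-1) == labels).float().mean().item()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy usage with random tensors standing in for validation-set features.
audio_net, video_net, text_net = ModalityCNN(40), ModalityCNN(3), ModalityCNN(300)
xa = torch.randn(8, 40, 50)    # e.g. 40 MFCC coefficients x 50 frames (assumed)
xv = torch.randn(8, 3, 50)     # e.g. 3 visual feature channels (assumed)
xt = torch.randn(8, 300, 50)   # e.g. 300-d word embeddings (assumed)
labels = torch.randint(0, NUM_CLASSES, (8,))
with torch.no_grad():
    probs = [audio_net(xa), video_net(xv), text_net(xt)]
weights, acc = trial_and_error_fusion(probs, labels)
print(f"best fusion weights {weights}, held-out accuracy {acc:.2f}")
```

Under this reading, the three networks are trained independently and the fusion step only searches a small grid of weight combinations, which is consistent with the abstract's claims of reduced time dependencies and resistance to over-fitting.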