Emotion recognition is a crucial application in human–computer interaction. It is usually performed using facial expressions as the main modality, which may not be reliable. In this study, we proposed a multimodal approach that uses 2-channel electroencephalography (EEG) signals and the eye modality in addition to the face modality to enhance recognition performance. We also studied the use of facial images versus facial depth as the face modality, and adopted the common arousal–valence model of emotions together with a convolutional neural network that can model the spatiotemporal information in the modality data. Extensive experiments conducted on the modality and emotion data showed that our system achieves accuracies of 67.8% and 77.0% in valence recognition and arousal recognition, respectively. The proposed method outperformed most state-of-the-art systems that use similar but fewer modalities, and facial depth outperformed facial images as the face modality. The proposed method has great potential to be integrated into various educational applications.
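To make the architecture concrete, below is a minimal sketch (not the authors' released code) of a 3D convolutional network of this kind in PyTorch: it takes a short clip of facial depth frames, applies spatiotemporal 3D convolutions, and produces separate binary predictions for valence and arousal. The clip length, input resolution, and all layer sizes are illustrative assumptions; the abstract does not specify them.

```python
# Minimal sketch of a 3D CNN for valence/arousal recognition from a
# stack of facial depth frames. Hyperparameters are assumptions, not
# taken from the paper.
import torch
import torch.nn as nn

class Emotion3DCNN(nn.Module):
    def __init__(self, in_channels: int = 1, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolution over (time, height, width) captures
            # spatiotemporal patterns in the frame sequence.
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),  # halve temporal and spatial dims
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),      # global pooling to 32 features
        )
        # Two independent binary heads, one per emotion dimension.
        self.valence_head = nn.Linear(32, num_classes)
        self.arousal_head = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor):
        # clip: (batch, channels, frames, height, width)
        feats = self.features(clip).flatten(1)
        return self.valence_head(feats), self.arousal_head(feats)

if __name__ == "__main__":
    model = Emotion3DCNN()
    depth_clip = torch.randn(4, 1, 16, 64, 64)  # 4 clips of 16 depth frames
    valence_logits, arousal_logits = model(depth_clip)
    print(valence_logits.shape, arousal_logits.shape)  # (4, 2) each
```

In a multimodal setting such as the one described above, features extracted from the EEG and eye modalities would typically be fused with the pooled face features before the classification heads; the sketch shows only the face branch for brevity.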
Bibliographical note
Funding Information:
The research described in this study was supported by the Dean’s Research Fund 2019/20 (IDS-9) of the Education University of Hong Kong and the Lam Woo Research Fund (LWI20011) of Lingnan University, Hong Kong. This study was reviewed and approved by the Human Research Ethics Committee of the Education University of Hong Kong.
© 2021 Elsevier B.V.
Keywords:
- 3D convolutional neural network
- Arousal–valence model of emotions
- Emotion recognition