Abstract
Emotion recognition is a crucial application in human–computer interaction. It is usually performed with facial expressions as the main modality, which may not be reliable. In this study, we propose a multimodal approach that uses two-channel electroencephalography (EEG) signals and an eye modality in addition to the face modality to enhance recognition performance. We also compare facial images with facial depth as the face modality, and we adopt the common arousal–valence model of emotions together with a 3D convolutional neural network, which models the spatiotemporal information in the modality data. Extensive experiments on the modality and emotion data show that our system achieves high accuracies of 67.8% in valence recognition and 77.0% in arousal recognition, outperforming most state-of-the-art systems that use similar but fewer modalities. Moreover, facial depth outperformed facial images as the face modality. The proposed emotion recognition method has significant potential for integration into various educational applications.
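The abstract does not detail the network architecture, but the 3D convolutional neural network it mentions can be illustrated with a minimal sketch. The PyTorch model below shows, in a purely hypothetical form, how a 3D CNN can consume a short clip of face frames (for example, depth maps) and produce a binary valence or arousal prediction; the layer sizes, input shape, and class head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a 3D CNN for clip-level emotion recognition.
# All hyperparameters are hypothetical and not taken from the paper.
import torch
import torch.nn as nn

class Emotion3DCNN(nn.Module):
    """Toy 3D CNN mapping a clip of face frames to a binary
    valence (or arousal) prediction."""
    def __init__(self, in_channels: int = 1, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions slide over (time, height, width), capturing
            # spatiotemporal patterns such as an expression evolving over time.
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling -> fixed-size vector
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width);
        # a depth-map clip would use channels=1.
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Example: a batch of 4 clips, each 16 depth frames of 64x64 pixels.
clip = torch.randn(4, 1, 16, 64, 64)
logits = Emotion3DCNN()(clip)
print(logits.shape)  # torch.Size([4, 2])
```

In a multimodal setting like the one described, one such stream per modality (face, eye, EEG) would typically be fused before the final classifier, though the fusion strategy used by the authors is not specified in this abstract.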
| Original language | English |
| --- | --- |
| Pages (from-to) | 107–117 |
| Number of pages | 11 |
| Journal | Information Fusion |
| Volume | 77 |
| Early online date | 31 Jul 2021 |
| DOIs | |
| Publication status | Published - Jan 2022 |
Bibliographical note
Publisher Copyright: © 2021 Elsevier B.V.
Funding
The research described in this study was supported by the Dean's Research Fund 2019/20 (IDS-9) of the Education University of Hong Kong and the Lam Woo Research Fund (LWI20011) of Lingnan University, Hong Kong. This study was reviewed and granted ethical approval by the Human Research Ethics Committee of the Education University of Hong Kong.
Keywords
- 3D convolutional neural network
- Arousal–valence model of emotions
- Electroencephalogram
- Emotion recognition