Abstract
The performance of existing sign language recognition approaches is typically limited by the scale of training data. To address this issue, we propose a mutual enhancement network (MEN) for joint sign language recognition and education. First, a sign language recognition system built upon a spatial-temporal network is proposed to recognize the semantic category of a given sign language video. In addition, a sign language education system is developed to detect the failure modes of learners and guide them to sign correctly. Our theoretical contribution lies in formulating these two systems as an estimation-maximization (EM) framework in which they progressively boost each other. The recognition system becomes more robust and accurate as the education system collects more training data, while the education system guides learners to sign more precisely by drawing on the hand shape analysis module of the recognition system. Experimental results on three large-scale sign language recognition datasets validate the superiority of the proposed framework.
Original language | English |
---|---|
Article number | 3174031 |
Pages (from-to) | 311-325 |
Number of pages | 15 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
Volume | 35 |
Issue number | 1 |
Early online date | 25 May 2022 |
Publication status | Published - Jan 2024 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2012 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61906108, in part by The University of Hong Kong (HKU) Startup Fund, and in part by the HKU Seed Fund for Basic Research and SmartMore Donation Fund.
Keywords
- Estimation-maximization (EM), mutual enhancement, sign language education, sign language recognition