A spatiotemporal and motion information extraction network for action recognition

Wei WANG, Xianmin WANG*, Mingliang ZHOU, Xuekai WEI, Jing LI, Xiaojun REN, Xuemei ZONG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

2 Citations (Scopus)


With continued advances in the Internet of Things and deep learning, video action recognition is increasingly used in daily and industrial applications. Spatiotemporal and motion patterns are two crucial and complementary types of information for action recognition. However, effectively modelling both types of information in videos remains challenging. In this paper, we propose a spatiotemporal and motion information extraction (STME) network that extracts comprehensive spatiotemporal and motion information from videos for action recognition. The STME network comprises three efficient modules: a spatiotemporal extraction (STE) module, a short-term motion extraction (SME) module and a long-term motion extraction (LME) module. The SME and LME modules model short-term and long-term motion representations, respectively. The STE module captures comprehensive spatiotemporal information, which supplements the video representation for action recognition. According to our experimental results, the STME network achieves significantly better performance than existing methods on several benchmark datasets. Our code is available at https://github.com/STME-Net/STME.
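The abstract names three complementary cues (spatiotemporal, short-term motion and long-term motion) but does not describe how each module computes them. The sketch below is only a toy illustration of why the cues differ, and it is not the STME architecture. It assumes, purely for illustration, that a clip is a list of flattened per-frame feature vectors, that short-term motion can be approximated by differences between adjacent frames, that long-term motion can be approximated by differences across a larger stride, and that a temporal mean stands in for appearance information. All function names and the `long_stride` parameter are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: one flattened feature vector per frame.
Frame = List[float]


def temporal_difference(frames: List[Frame], stride: int) -> List[Frame]:
    """Element-wise difference between frames that are `stride` steps apart."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [
        [b - a for a, b in zip(frames[t], frames[t + stride])]
        for t in range(len(frames) - stride)
    ]


def temporal_mean(frames: List[Frame]) -> Frame:
    """Average each feature over time (a crude stand-in for appearance cues)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def describe_clip(frames: List[Frame], long_stride: int = 4) -> Dict[str, object]:
    """Collect three complementary cues for one clip (illustrative only)."""
    return {
        # Assumed stand-in for spatiotemporal (STE-like) information.
        "appearance": temporal_mean(frames),
        # Assumed stand-in for short-term (SME-like) motion: adjacent frames.
        "short_term_motion": temporal_difference(frames, 1),
        # Assumed stand-in for long-term (LME-like) motion: distant frames.
        "long_term_motion": temporal_difference(frames, long_stride),
    }


if __name__ == "__main__":
    clip = [[0.0, 1.0], [1.0, 3.0], [3.0, 6.0]]
    cues = describe_clip(clip, long_stride=2)
    print(cues["short_term_motion"])
    print(cues["long_term_motion"])
```

In this toy setting, adjacent-frame differences respond to fast local changes, while the larger stride responds to slower displacement across the clip. The actual modules are presumably learned network components; the linked repository is the authoritative reference for their design.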

Original language: English
Number of pages: 17
Journal: Wireless Networks
Publication status: E-pub ahead of print - 28 Feb 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.


  • Action recognition
  • Deep learning
  • Motion
  • Spatiotemporal information

