Data Analytics on Online Student Engagement Data for Academic Performance Modeling.

Xiaohui TAO*, Aaron SHANNON-HONSON, Patrick DELANEY, Lin LI, Christopher DANN, Yan LI, Haoran XIE

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

8 Citations (Scopus)


In large MOOC cohorts, the sheer variance and volume of discussion forum posts can make it difficult for instructors to distinguish nuanced emotion in students, such as engagement levels or stress, purely from textual data. Sentiment analysis has been used to build student behavioral models to understand emotion; however, more recent research suggests that separating sentiment and stress into different measures could improve these approaches. Detecting stress in a MOOC corpus is challenging because students may use language that does not conform to standard definitions, but new techniques like TensiStrength provide more nuanced measures of stress by treating it as a spectrum. In this work, we introduce an ensemble method that extracts feature categories of engagement, semantics and sentiment from an AdelaideX student dataset. Stacked and voting methods are compared on how accurately these features can predict student grades. The stacked method performed best across all measures, with our Random Forest baseline further demonstrating that negative sentiment and stress had little impact on academic results. As a secondary analysis, we explored whether stress among student posts increased in 2020 compared to 2019 due to COVID-19, but found no significant change. Importantly, our model indicates that there may be a relationship between features, which warrants future research.
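To illustrate the difference between the voting and stacked ensembles mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes three hypothetical base models (one per feature category: engagement, semantics, sentiment) that each output a pass/fail prediction. The data, the function names and the lookup-table meta-learner are all invented for illustration; a real stacked ensemble would normally train a proper classifier on out-of-fold base-model predictions.

```python
from collections import Counter, defaultdict

def hard_vote(preds):
    """Voting ensemble: return the label predicted by most base models."""
    return Counter(preds).most_common(1)[0][0]

def fit_stacker(base_preds, labels):
    """Toy meta-learner (assumption): for each combination of base-model
    predictions, remember the true label seen most often with it."""
    table = defaultdict(Counter)
    for preds, label in zip(base_preds, labels):
        table[tuple(preds)][label] += 1
    return {combo: counts.most_common(1)[0][0] for combo, counts in table.items()}

def stack_predict(table, preds):
    """Stacked ensemble: use the learned table, falling back to a
    majority vote for combinations never seen during fitting."""
    combo = tuple(preds)
    return table[combo] if combo in table else hard_vote(preds)

# Hypothetical held-out predictions, ordered as
# (engagement model, semantics model, sentiment model).
TRAIN_PREDS = [
    ("pass", "pass", "fail"),
    ("pass", "fail", "fail"),
    ("fail", "pass", "pass"),
    ("fail", "fail", "fail"),
    ("pass", "fail", "fail"),
]
# Hypothetical true outcomes; here the engagement model happens to be
# right even when the other two models outvote it.
TRAIN_LABELS = ["pass", "pass", "fail", "fail", "pass"]

table = fit_stacker(TRAIN_PREDS, TRAIN_LABELS)

new_student = ("pass", "fail", "fail")
print(hard_vote(new_student))             # prints "fail"
print(stack_predict(table, new_student))  # prints "pass"
```

In this toy data the vote treats every base model equally, while the stacked combiner learns from the labeled examples which pattern of base predictions tends to go with which outcome. That is the general reason a stacked ensemble can outperform a simple vote, although nothing here reproduces the paper's models or results.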

Original language: English
Pages (from-to): 103176-103186
Number of pages: 11
Journal: IEEE Access
Early online date: 22 Sept 2022
Publication status: Published - Sept 2022

Bibliographical note

The authors acknowledge the Human Research Ethics Committee approval from The University of Southern Queensland, reference number H20REA137. The authors would like to thank Ali Ogilvie and the entire Online Programs Team from The University of Adelaide for approval of data usage and ongoing support of this project. Finally, the authors would like to thank Mike Thelwall for providing access to SentiStrength and TensiStrength for the purposes of this research.


Keywords

  • Academic performance modelling
  • Analytical models
  • Anxiety disorders
  • Computer aided instruction
  • Education
  • Electronic learning
  • Ensemble Method
  • Feature extraction
  • MOOC
  • Natural Language Processing
  • Online services
  • Semantics
  • Stress measurement

