Abstract
This study explores the use of artificial intelligence (AI) as a complementary tool for grading essay-type questions in higher education, focusing on its consistency with human grading and its potential to reduce bias. Using 70 handwritten exams from an introductory sociology course, we evaluated the performance of generative pretrained transformer (GPT) models in transcribing and scoring students’ responses. The models were tested under various settings for both the transcription and grading tasks. Results show high similarity between human and GPT transcriptions, with GPT-4o-mini outperforming GPT-4 in accuracy. For grading, GPT scores correlated strongly with the human grader’s scores, especially when template answers were provided. However, discrepancies remained, which points to a role for GPT as a “second grader” that flags inconsistencies for review rather than as a full replacement for human evaluation. This study contributes to the growing literature on AI in education, demonstrating its potential to enhance fairness and efficiency in grading essay-type questions.
| Original language | English |
|---|---|
| Number of pages | 19 |
| Journal | Teaching Sociology |
| Early online date | 19 Dec 2025 |
| Publication status | E-pub ahead of print - 19 Dec 2025 |
Bibliographical note
Publisher Copyright: © American Sociological Association 2025
Keywords
- AI-assisted grading
- bias reduction
- essay-type assessment
- generative pretrained transformers
- higher education