Abstract
Recent advances in computational stylometry have enabled scholars to detect authorial signals with a high degree of precision, but the focus on accuracy comes at the expense of explainability: powerful black-box models are often of little use to traditional humanistic disciplines. With this in mind, we have conducted stylometric experiments on Maospeak, a language style shaped by the writings and speeches of Mao Zedong. We measure per-token perplexity across different GPT models, compute Kullback–Leibler divergences between local and global vocabulary distributions, and train a TF-IDF classifier to examine how the modern Chinese language has been transformed to convey the tenets of Maoist doctrine. We offer a computational interpretation of ideology as a reduction in perplexity and an increase in the systematicity of language use.
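As a rough illustration of two of the measures named in the abstract, the sketch below computes a Kullback–Leibler divergence between a local and a global vocabulary distribution, and a perplexity from per-token log-probabilities. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of raw relative frequencies without smoothing, and the assumption that the local passage is drawn from the global corpus are all hypothetical choices made here for illustration.

```python
import math
from collections import Counter


def kl_local_vs_global(local_tokens, global_tokens):
    """KL(P_local || P_global) in nats, from raw relative frequencies.

    Assumption (not from the source): every local token also occurs in the
    global corpus, so no smoothing is needed and the result is finite.
    """
    if not local_tokens or not global_tokens:
        raise ValueError("both token lists must be non-empty")
    local_counts = Counter(local_tokens)
    global_counts = Counter(global_tokens)
    n_local = len(local_tokens)
    n_global = len(global_tokens)

    kl = 0.0
    for word, count in local_counts.items():
        if global_counts[word] == 0:
            raise ValueError(f"token {word!r} is missing from the global corpus")
        p = count / n_local                # local relative frequency
        q = global_counts[word] / n_global  # global relative frequency
        kl += p * math.log(p / q)
    return kl


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Toy usage with made-up tokens; real input would be segmented Chinese text
# and log-probabilities produced by a language model.
corpus = ["people", "serve", "the", "people", "study", "the", "text"]
passage = ["serve", "the", "people"]
divergence = kl_local_vs_global(passage, corpus)
ppl = perplexity([math.log(0.5), math.log(0.25), math.log(0.125)])
```

On this reading, a passage whose vocabulary closely mirrors the whole corpus yields a divergence near zero, and text that a model predicts well yields a low perplexity. How the paper defines "local" windows and which GPT models supply the log-probabilities are not specified in the abstract.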
Original language | English |
---|---|
Title of host publication | Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages |
Editors | Mika Hämäläinen, Emily Öhman, Flammie Pirinen, Khalid Alnajjar, So Miyagawa, Yuri Bizzoni, Niko Partanen, Jack Rueter |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 76-81 |
Number of pages | 5 |
ISBN (Print) | 9798891760127 |
Publication status | Published - Dec 2023 |