Abstract
Recent advances in computational stylometry have enabled scholars to detect authorial signals with a high degree of precision, but this focus on accuracy comes at the expense of explainability: powerful black-box models are often of little use to traditional humanistic disciplines. With this in mind, we conducted stylometric experiments on Maospeak, a language style shaped by the writings and speeches of Mao Zedong. We measure per-token perplexity across different GPT models, compute Kullback–Leibler divergences between local and global vocabulary distributions, and train a TF-IDF classifier to examine how the modern Chinese language has been transformed to convey the tenets of Maoist doctrine. We offer a computational interpretation of ideology as a reduction in perplexity and an increase in the systematicity of language use.
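
As a rough illustration of two of the measures named in the abstract, the sketch below computes a Kullback–Leibler divergence between a local and a global vocabulary distribution, and a per-token perplexity from token log-probabilities. It is not the authors' implementation: the function names, the add-alpha smoothing, the use of base-2 logarithms, and the pre-tokenized input lists are all assumptions made for the example, and in practice the log-probabilities would come from a language model rather than being supplied by hand.

```python
import math
from collections import Counter


def kl_divergence(local_tokens, global_tokens, alpha=1.0):
    """KL(P_local || P_global) in bits, over the union vocabulary.

    Add-alpha smoothing (an assumption of this sketch) keeps every
    probability non-zero so the logarithm is always defined.
    """
    vocab = set(local_tokens) | set(global_tokens)
    local_counts = Counter(local_tokens)
    global_counts = Counter(global_tokens)
    local_total = len(local_tokens) + alpha * len(vocab)
    global_total = len(global_tokens) + alpha * len(vocab)

    kl = 0.0
    for word in vocab:
        p = (local_counts[word] + alpha) / local_total
        q = (global_counts[word] + alpha) / global_total
        kl += p * math.log2(p / q)
    return kl


def perplexity(token_logprobs):
    """Per-token perplexity from natural-log token probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical toy data: a short "local" passage against a larger "global" corpus.
global_corpus = "the people serve the people and the party serves the people".split()
local_passage = "serve the people".split()

divergence = kl_divergence(local_passage, global_corpus)
uniform_ppl = perplexity([math.log(0.25)] * 4)  # four tokens, each with probability 0.25
```

Under these assumptions, a passage whose vocabulary distribution matches the corpus yields a divergence of zero, and lower perplexity indicates text that the model finds more predictable.
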
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages |
| Editors | Mika Hämäläinen, Emily Öhman, Flammie Pirinen, Khalid Alnajjar, So Miyagawa, Yuri Bizzoni, Niko Partanen, Jack Rueter |
| Publisher | Association for Computational Linguistics |
| Pages | 76-81 |
| Number of pages | 5 |
| ISBN (Print) | 9798891760127 |
| Publication status | Published - Dec 2023 |