Predicting stock splits using ensemble machine learning and SMOTE oversampling

Ang LI*, Mark LIU*, Simon SHEATHER

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

5 Citations (Scopus)


This study predicts stock splits using two ensemble machine learning techniques: gradient boosting machines (GBMs) and random forests (RFs). The goal is to form implementable portfolios based on positive predictions to generate abnormal returns. Since splits are rare events, we use SMOTE oversampling to synthesize new observations of splits in the sample to improve predictions. When predicting stock splits in the next quarter, GBM and RF achieve area under the receiver operating characteristic curve (AUC) scores of around 0.86 and 0.87, respectively. GBM and RF predictions generate monthly five-factor alphas (Fama and French, 2015) of 0.26% and 0.95% among stocks in the smallest size quintile. The three most important features for predicting stock splits under both ensemble ML methods are the current price level, the ratio of the current price to the price at the last split, and the stock return over the past twelve months. When predicting stock splits in the next year, GBM generates a monthly five-factor alpha of 0.38% among small stocks.
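Because stock splits are rare events, naively trained classifiers tend to ignore the minority class; SMOTE addresses this by interpolating synthetic minority observations between each minority sample and its nearest minority-class neighbours. The sketch below is a minimal NumPy illustration of that interpolation step, not the authors' implementation (which would typically use a library such as imbalanced-learn and the paper's actual feature set); the toy data, feature count, and parameter choices here are assumptions for demonstration only.

```python
import numpy as np

def smote(X_minority, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch (Chawla et al., 2002).

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority-class neighbours, and interpolate at a random
    fraction of the way between the two.
    """
    rng = np.random.default_rng(rng)
    n, n_features = X_minority.shape
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude each point itself
    k = min(k, n - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest per minority sample
    synthetic = np.empty((n_synthetic, n_features))
    for i in range(n_synthetic):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # interpolation fraction in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nb] - X_minority[base])
    return synthetic

# Toy imbalanced sample: 95 non-split vs 5 split observations, 3 features
# (hypothetical stand-ins for, e.g., price level and past returns).
rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(95, 3))
X_minority = rng.normal(2.0, 1.0, size=(5, 3))
X_new = smote(X_minority, n_synthetic=90, k=3, rng=1)
print(X_new.shape)  # (90, 3): minority class is now roughly balanced
```

The rebalanced sample (original observations plus `X_new`) would then be fed to the GBM or RF classifier, whose out-of-sample ranking quality is what the AUC scores in the abstract measure.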
Original language: English
Article number: 101948
Number of pages: 24
Journal: Pacific-Basin Finance Journal
Early online date: 28 Jan 2023
Publication status: Published - Apr 2023

Bibliographical note

Funding Information:
☆ We thank Leonce Bargeron, Alex Brefeld, Charlie Clarke, Grant Clayton, Koustav De, Will Gerken, Russell Jame, Paulo Manoel, Peter Qi, Fabrice Riva (discussant), Jake Smith (discussant), Spencer Stone, Mao Ye, seminar participants at the University of Kentucky, and session participants at the 2022 Commonwealth Computational Summit, 12th Financial Markets and Corporate Governance Conference, and the 2022 FMA Annual Meeting for helpful comments. All errors and omissions are our own.

Publisher Copyright:
© 2023 Elsevier B.V.


Keywords

  • Stock splits
  • Ensemble machine learning
  • Gradient boosting machines
  • Random forests
  • SHAP feature importance
  • Hyperparameter tuning
  • SMOTE oversampling

