Modelling post-fire tree mortality: Can random forest improve discrimination of imbalanced data?
Author(s): Timothy M. Shearman; J. Morgan Varner; Sharon M. Hood; C. Alina Canslera; J. Kevin Hiers
Source: Ecological Modelling. 414: 108855.
Publication Series: Scientific Journal (JRNL)
Station: Rocky Mountain Research Station
Description
Predicting post-fire tree mortality is a major area of research in fire-prone forests, woodlands, and savannas worldwide. Past research has relied overwhelmingly on logistic regression analysis (LR) that predicts post-fire tree status as a binary outcome (i.e., living or dead). One of the most problematic issues for LR (or any classification problem) occurs when there is a class imbalance in the training data. In these instances, predictions will be biased toward the majority class. Using a historical prescribed fire data set of longleaf pines (Pinus palustris) from northern Florida, USA, we compare results from standard LR and the machine-learning algorithm random forest (RF). First, we demonstrate the class imbalance problem using simulated data. We then show how a balanced RF model can be used to alleviate the bias in the model and improve mortality prediction results. In the simulated example, LR model sensitivity and specificity were clearly biased according to the degree of imbalance between the classes. The balanced RF models had consistent sensitivity and specificity throughout the simulated data sets. Re-analyzing the original longleaf pine data set with a balanced RF model showed that although both LR and RF models had similar areas under the receiver operating characteristic curve (AUC), the RF model had better discrimination for predicting new observations of dead trees. Both LR and RF models identified duff consumption and percent crown scorch as important predictors of tree mortality; however, the RF model also suggested pre-fire duff depth as an important predictor. Our analysis highlights LR limitations when data are imbalanced and supports using RF to develop post-fire tree mortality models. We suggest how RF can be incorporated into future tree mortality studies, as well as possible implementation in future decision-support tools.
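The class imbalance effect described above can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' data, code, or model. It assumes a single hypothetical predictor (an "injury score" drawn from a normal distribution), fits a plain logistic regression with a hand-written Newton solver, and then refits after drawing equal-sized samples of dead and live trees. That balanced draw is the sampling idea commonly used inside a balanced random forest, but the model here remains a logistic regression, not a random forest. All function names, sample sizes, and distributions are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, dead_fraction, rng):
    """Simulate n trees as (x, y): x is a hypothetical injury score,
    y is 1 for dead and 0 for live. Dead trees have higher scores on average."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < dead_fraction else 0
        x = rng.gauss(1.0 if y else -1.0, 1.0)
        data.append((x, y))
    return data


def fit_logistic(data, iterations=25):
    """Fit p(dead) = sigmoid(b0 + b1 * x) by Newton's method (two parameters)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def sensitivity_specificity(data, b0, b1, threshold=0.5):
    """Sensitivity = share of dead trees predicted dead;
    specificity = share of live trees predicted live."""
    tp = fn = tn = fp = 0
    for x, y in data:
        pred = 1 if sigmoid(b0 + b1 * x) >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return tp / (tp + fn), tn / (tn + fp)


def balanced_sample(data, rng):
    """Draw equal numbers of dead and live trees, with replacement."""
    dead = [d for d in data if d[1] == 1]
    live = [d for d in data if d[1] == 0]
    k = min(len(dead), len(live))
    return rng.choices(dead, k=k) + rng.choices(live, k=k)


rng = random.Random(42)
results = {}
for dead_fraction in (0.5, 0.2, 0.05):
    train = simulate(4000, dead_fraction, rng)
    test = simulate(4000, dead_fraction, rng)
    plain = sensitivity_specificity(test, *fit_logistic(train))
    rebalanced = sensitivity_specificity(
        test, *fit_logistic(balanced_sample(train, rng))
    )
    results[dead_fraction] = (plain, rebalanced)
    print(f"dead fraction {dead_fraction:.2f}: "
          f"plain sens/spec = {plain[0]:.2f}/{plain[1]:.2f}, "
          f"balanced sens/spec = {rebalanced[0]:.2f}/{rebalanced[1]:.2f}")
```

Under these assumptions, the plain fit should lose sensitivity for dead trees as they become rarer while its specificity rises, whereas the fit on the balanced draw should keep the two measures close to each other. The 0.5 probability threshold is also an assumption; changing it shifts the balance between sensitivity and specificity for either fit.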
- This article was written and prepared by U.S. Government employees on official time, and is therefore in the public domain.
Citation
Shearman, Timothy M.; Varner, J. Morgan; Hood, Sharon M.; Canslera, C. Alina; Hiers, J. Kevin. 2019. Modelling post-fire tree mortality: Can random forest improve discrimination of imbalanced data? Ecological Modelling. 414: 108855.
Keywords
fire effects, logistic regression, machine learning, model evaluation, model validation, Pinus palustris, prescribed fire
- Recent advances in understanding duff consumption and post-fire longleaf pine mortality
- Individual tree diameter, height, and volume functions for longleaf pine
- Modeling survival, yield, volume partitioning and their response to thinning for longleaf pine plantations