Using genetic algorithms to optimize k-Nearest Neighbors configurations for use with airborne laser scanning data
Author(s): Ronald E. McRoberts; Grant M. Domke; Qi Chen; Erik Næsset; Terje Gobakken
Source: Remote Sensing of Environment
Publication Series: Scientific Journal (JRNL)
Station: Northern Research Station
Description: The relatively small sampling intensities used by national forest inventories are often insufficient to produce the desired precision for estimates of population parameters unless the estimation process is augmented with auxiliary information, usually in the form of remotely sensed data. The k-Nearest Neighbors (k-NN) technique is a non-parametric, multivariate approach to prediction that has emerged as particularly popular for use with forest inventory and remotely sensed data and has been shown to contribute substantially to increasing precision. k-NN predictions are calculated as linear combinations of observations for sample units that are nearest in a space of auxiliary variables to the population unit for which a prediction is desired. Implementation of a nearest neighbors algorithm requires four choices: (i) a distance metric, (ii) specific auxiliary variables to be used with the distance metric, (iii) the number of nearest neighbors, and (iv) a scheme for weighting the nearest neighbors. Regardless of the choices for a distance metric and weighting scheme, emerging evidence suggests that optimization of the technique, including selection of an optimal subset of auxiliary variables, greatly enhances prediction. However, optimization can be computationally intensive and time-consuming. A promising approach that is gaining favor is based on genetic algorithms, a technique that uses search heuristics that mimic natural selection to solve optimization problems. The objective of the study was to compare optimized k-NN configurations with respect to inferences for mean volume per unit area using airborne laser scanning variables as auxiliary information. For two study areas, one in Norway and one in Minnesota, USA, the analyses focused on optimizing k-NN configurations that used the weighted Euclidean and canonical correlation distance metrics and two neighbor weighting schemes.
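The four k-NN choices listed above can be made concrete with a minimal sketch. It assumes a weighted Euclidean distance in which a zero weight drops an auxiliary variable, and it offers equal and inverse-distance neighbor weighting purely for illustration; the description does not say which two weighting schemes the study used, and the function name, argument names, and the small `eps` constant are hypothetical.

```python
import math


def knn_predict(target, ref_x, ref_y, k, feature_weights, neighbor_weighting="equal"):
    """Predict a response for `target` as a weighted mean of its k nearest references.

    target          -- auxiliary variables for the population unit
    ref_x, ref_y    -- auxiliary variables and observations for the sample units
    k               -- number of nearest neighbors
    feature_weights -- one non-negative weight per auxiliary variable (0 drops it)
    """
    eps = 1e-9  # assumed guard against division by zero for exact matches

    # Weighted Euclidean distance from the target to every sample unit.
    scored = []
    for x, y in zip(ref_x, ref_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, target, x)))
        scored.append((d, y))

    # Keep the k sample units nearest in the space of auxiliary variables.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]

    # Neighbor weighting (illustrative choices, not taken from the study).
    if neighbor_weighting == "equal":
        weights = [1.0] * len(nearest)
    elif neighbor_weighting == "inverse_distance":
        weights = [1.0 / (d + eps) for d, _ in nearest]
    else:
        raise ValueError(f"unknown neighbor weighting: {neighbor_weighting}")

    # The prediction is a linear combination of the neighbors' observations.
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total
```

With equal weighting the prediction is the plain mean of the k nearest observations; with inverse-distance weighting, closer sample units dominate.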
Novel features of the study include introduction of a neighbor weighting scheme that has not previously been used for forestry applications, simultaneous optimization of all four k-NN choices, and basing comparisons on confidence intervals, rather than intermediate products such as prediction accuracies. Two conclusions were primary: (1) optimized selection of feature variables produced greater precision than using all feature variables, and (2) computational intensity necessary to optimize the weighted Euclidean metric was considerably greater than for the canonical correlation analysis metric. Specific findings were that optimization produced pseudo-R² values as large as 0.87 for the Norwegian dataset and as large as 0.89 for the Minnesota dataset. For the optimized canonical correlation distance metric, widths of approximate 95% confidence intervals as proportions of the estimated means were as small as 0.13 for the Norwegian dataset and as small as 0.15 for the Minnesota dataset.
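To illustrate how a genetic algorithm can search over k-NN configurations, here is a minimal sketch that is not the authors' implementation. It assumes a chromosome made of a 0/1 mask over auxiliary variables plus a value of k, leave-one-out RMSE with equal neighbor weighting as the fitness, truncation selection, one-point crossover, and random mutation. All names and parameter defaults are hypothetical, and the sketch optimizes only two of the four choices.

```python
import math
import random


def loo_rmse(mask, k, xs, ys):
    """Leave-one-out RMSE of equal-weight k-NN using only the masked-in variables."""
    if not any(mask):
        return float("inf")  # a configuration with no variables is invalid
    sq_err = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Squared Euclidean distances to every other sample unit.
        dists = sorted(
            (sum(m * (a - b) ** 2 for m, a, b in zip(mask, xi, xj)), yj)
            for j, (xj, yj) in enumerate(zip(xs, ys))
            if j != i
        )
        nearest = dists[:k]
        pred = sum(y for _, y in nearest) / len(nearest)
        sq_err += (pred - yi) ** 2
    return math.sqrt(sq_err / len(ys))


def optimize(xs, ys, k_max=5, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search variable subsets and k with a simple genetic algorithm.

    Assumes at least two auxiliary variables and k_max < number of sample units.
    Returns ((mask, k), rmse) for the best configuration found.
    """
    rng = random.Random(seed)
    n_vars = len(xs[0])

    def fitness(chrom):
        return loo_rmse(chrom[0], chrom[1], xs, ys)

    def random_chromosome():
        return ([rng.randint(0, 1) for _ in range(n_vars)], rng.randint(1, k_max))

    # Seed with the all-variables, k=1 configuration so the result is never worse than it.
    population = [([1] * n_vars, 1)] + [random_chromosome() for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(survivors) + len(children) < pop_size:
            (m1, k1), (m2, k2) = rng.sample(survivors, 2)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover on the mask
            mask = m1[:cut] + m2[cut:]
            mask = [1 - g if rng.random() < mutation_rate else g for g in mask]
            k = rng.choice([k1, k2])
            if rng.random() < mutation_rate:
                k = rng.randint(1, k_max)  # mutate the neighbor count
            children.append((mask, k))
        population = survivors + children

    best = min(population, key=fitness)
    return best, fitness(best)
```

Because the fittest half of each generation survives unchanged, the best fitness never gets worse from one generation to the next. A fuller treatment would add real-valued variable weights and the weighting scheme to the chromosome.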
- This article was written and prepared by U.S. Government employees on official time, and is therefore in the public domain.
Citation: McRoberts, Ronald E.; Domke, Grant M.; Chen, Qi; Næsset, Erik; Gobakken, Terje. 2016. Using genetic algorithms to optimize k-Nearest Neighbors configurations for use with airborne laser scanning data. Remote Sensing of Environment. 184: 387-395. https://doi.org/10.1016/j.rse.2016.07.007.
Keywords: Inference, Spatial estimation, National forest inventory
- Optimizing nearest neighbour configurations for airborne laser scanning-assisted estimation of forest volume and biomass
- Optimizing the k-Nearest Neighbors technique for estimating forest aboveground biomass using airborne laser scanning data
- Predicting categorical forest variables using an improved k-Nearest Neighbour estimator and Landsat imagery