
The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species

Posted date: June 23, 2017
Publication Year: 2017
Authors: Aubry, Keith B.; Raley, Catherine M.; McKelvey, Kevin S.
Publication Series: Scientific Journal (JRNL)
Source: PLoS One. 12: e0179152.


The availability of spatially referenced environmental data and species occurrence records in online databases enables practitioners to easily generate species distribution models (SDMs) for a broad array of taxa. Such databases often include occurrence records of unknown reliability, yet little information is available on the influence of data quality on SDMs generated for rare, elusive, and cryptic species that are prone to misidentification in the field. We investigated this question for the fisher (Pekania pennanti), a forest carnivore of conservation concern in the Pacific States that is often confused with the more common Pacific marten (Martes caurina). Fisher occurrence records supported by physical evidence (verifiable records) were available from a limited area, whereas occurrence records of unknown quality (unscreened records) were available from throughout the fisher's historical range. We reserved 20% of the verifiable records to use as a test sample for both models and generated SDMs with each dataset using Maxent. The verifiable model performed substantially better than the unscreened model based on multiple metrics including AUCtest values (0.78 and 0.62, respectively), evaluation of training and test gains, and statistical tests of how well each model predicted test localities. In addition, the verifiable model was consistent with our knowledge of the fisher's habitat relations and potential distribution, whereas the unscreened model indicated a much broader area of high-quality habitat (indices > 0.5) that included large expanses of high-elevation habitat that fishers do not occupy. Because Pacific martens remain relatively common in upper elevation habitats in the Cascade Range and Sierra Nevada, the SDM based on unscreened records likely reflects primarily a conflation of marten and fisher habitat. Consequently, accurate identifications are far more important than the spatial extent of occurrence records for generating reliable SDMs for the fisher in this region. We strongly recommend that practitioners avoid using anecdotal occurrence records to build SDMs but, if such data are used, the validity of resulting models should be tested with verifiable occurrence records.
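
The evaluation design described in the abstract (reserve a share of the verifiable records as a test sample, then score each model on that same sample) can be illustrated with a minimal Python sketch. This is not the authors' code and does not reimplement Maxent. The function names `split_holdout` and `auc` and all suitability scores below are hypothetical, and the AUC is assumed to be computed in the usual presence-versus-background way, as the probability that a test presence locality receives a higher predicted suitability than a background point.

```python
import random


def split_holdout(records, test_fraction=0.2, seed=0):
    """Shuffle records and reserve a fraction of them as a test sample.

    Returns (training_records, test_records). Hypothetical helper.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(presence_scores, background_scores):
    """Rank-based AUC: chance that a presence outscores a background point.

    Ties count as half a win.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical verifiable records, identified here only by an index.
verifiable_records = list(range(10))
training, test = split_holdout(verifiable_records, test_fraction=0.2)

# Made-up suitability scores that two models might assign to the same
# test localities and to the same background points.
background = [0.2, 0.4, 0.6]
auc_verifiable_model = auc([0.9, 0.7], background)
auc_unscreened_model = auc([0.5, 0.3], background)
```

Because both models are scored against the same reserved verifiable records, a difference in AUC reflects the training data rather than the test sample, which is the comparison the abstract relies on.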


Aubry, Keith B.; Raley, Catherine M.; McKelvey, Kevin S. 2017. The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species. PLoS One. 12: e0179152.