Capturing long-tailed individual tree diversity using airborne imaging and a multi-temporal hierarchical model
Measuring forest biodiversity with terrestrial surveys is expensive, and in large heterogeneous landscapes such surveys can only capture the abundance of common species. In contrast, combining airborne imagery with computer vision can generate individual tree data at scales of hundreds of thousands of trees. To train computer vision models, ground-based species labels are combined with airborne reflectance data. Because rare species are difficult to find in a large landscape, many classification models include only the most abundant species, leading to biased predictions at broad scales. For example, training a model on only common species assumes that these samples are representative of the entire landscape. Extending classification models to include rare species requires targeted data collection and algorithmic improvements to overcome large data imbalances between dominant and rare taxa. We apply a targeted sampling workflow to the Ordway-Swisher Biological Station within the US National Ecological Observatory Network (NEON), where traditional forestry plots had identified six canopy tree species with more than 10 individuals at the site. Combining iterative model development with rare species sampling, we extend a training dataset to include 14 species. Using a multi-temporal hierarchical model, we demonstrate the ability to include species predicted at <1% frequency in the landscape without losing performance on the dominant species. The final model achieves over 75% accuracy across 14 species, with improved rare species classification, compared with 61% accuracy for a baseline deep learning model. After filtering out dead trees, we generate landscape species maps of individual crowns for over 670 000 individual trees. We find distinct patches of forest composed of rarer species at the full-site scale, highlighting the importance of capturing species diversity in training data.
We estimate the relative abundance of 14 species within the landscape and provide three measures of uncertainty to generate a range of counts for each species. For example, we estimate that the dominant species, Pinus palustris, accounts for c. 28% of predicted stems, with models predicting a range of counts between 160 000 and 210 000 individuals. These maps provide the first estimates of canopy tree diversity within a NEON site to include rare species and offer a blueprint for capturing tree diversity using airborne computer vision at broad scales.
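
As an illustration of how a range of counts per species could be derived, the following is a minimal sketch, not the method used in this study. It assumes that each uncertainty measure or model run yields one predicted species label per tree crown, and it summarises the minimum count, maximum count and mean relative abundance of each species across runs. The function name `abundance_summary`, the input layout and the toy labels are all hypothetical.

```python
from collections import Counter


def abundance_summary(prediction_sets):
    """Summarise per-species counts across several sets of crown predictions.

    prediction_sets: list of lists; each inner list holds one predicted
    species label per tree crown for a single model run (hypothetical layout).
    Returns a dict mapping species -> min count, max count, mean fraction.
    """
    counts = [Counter(labels) for labels in prediction_sets]
    # Every species that appears in at least one run.
    species = sorted(set().union(*counts))
    summary = {}
    for sp in species:
        per_run = [c.get(sp, 0) for c in counts]
        # Relative abundance of this species within each run.
        fractions = [c.get(sp, 0) / sum(c.values()) for c in counts]
        summary[sp] = {
            "min_count": min(per_run),
            "max_count": max(per_run),
            "mean_fraction": sum(fractions) / len(fractions),
        }
    return summary


if __name__ == "__main__":
    # Toy data only: three runs over the same four crowns.
    runs = [
        ["A", "A", "B", "C"],
        ["A", "B", "B", "C"],
        ["A", "A", "A", "B"],
    ]
    for sp, stats in abundance_summary(runs).items():
        print(sp, stats)
```

Reporting a minimum and maximum alongside the mean fraction keeps rare species visible: a species absent from one run still appears in the summary with a minimum count of zero.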