
No species left behind: Researchers "borrow" data between species to improve modeling results

Reliable estimates of species distributions are an essential tool for conservation management – knowing where species live helps us make better decisions about their conservation. But despite the exponential growth of publicly available biodiversity data in recent years, the majority of species on Earth remain understudied and poorly observed. As a result, major gaps and biases persist in our understanding of life on Earth.
More than 80% of vascular plant species and nearly 90% of dragonflies, ants, butterflies, and bees are data-deficient, meaning they have fewer than 30 observation records available on GBIF, a global repository of public species records often used for species distribution modeling. Even among terrestrial vertebrates, some of the most well-studied groups on Earth, half of all known species are data-deficient. Below this 30-record threshold, accurately capturing the complete distribution of a species with a statistical model becomes prohibitively difficult.
Closing these data gaps is a resource-intensive task that often requires significant fieldwork in inaccessible and remote areas. A pressing question on many researchers’ minds, therefore, is how we might improve distribution estimates with the data we already have.
In a new paper published recently in Trends in Ecology and Evolution, BGC Center researchers Shubhi Sharma, Kevin Winner, and coauthors highlight the potential of ‘borrowing strength’ as a strategy for doing just that. Rather than following the standard approach of modeling one species at a time, the borrowing strength approach links information across multiple related or ecologically similar species so that data-rich species can help inform estimates for data-deficient ones.
“State-of-the-art species distribution models today are becoming increasingly data hungry, but there are very few that focus on data-deficient species. Most species on earth are data-deficient, so when we model ‘biodiversity’, it turns out that we're modeling only a very small portion of the total biodiversity,” Sharma said. “This is a problem because the outputs of species distribution models are used for all kinds of purposes, including conservation planning and spatial prioritization exercises. At the moment, data-deficient species are left out of these important analyses, potentially biasing our conservation decisions.”

Borrowing strength techniques shift from the traditional species distribution modeling approach of modeling one species at a time to modeling multiple species simultaneously while accounting for their correlations, the researchers say. These correlations might be based on species interactions, such as predator-prey relationships; species traits, such as trophic level or body size; or shared evolutionary history. The key mechanism is that shared traits and evolutionary history can signal similar environmental niches (the set of environmental and biological conditions a species needs to survive), and a species’ niche in turn shapes where it is found.
Existing borrowing strength approaches in species distribution modeling have successfully improved distribution estimates for data-rich species, but their application to data-deficient species remains largely unexplored. In their new paper, Sharma, Winner, and coauthors make the case for combining these existing modeling approaches with data on how species are interrelated, within a borrowing strength framework, to improve distribution models for data-deficient species.
“While we don't have data on every species, we have a lot of data on some species,” Sharma said. “We know that species are related to each other, for example, by their evolutionary history or by similar traits. By leveraging these facts to borrow strength from data-rich species to data-poor species, we can improve the distribution estimates as well as habitat characterizations of data-poor species.”
Using shared evolutionary history to borrow strength rests on the well-documented observation that species sharing a more recent common ancestor are more likely to have overlapping niches. As a result, the distribution of a data-deficient species can often be modeled more reliably by borrowing strength from a closely related, data-rich species. As the researchers point out, databases of species traits and phylogenies have advanced tremendously in availability and completeness in recent years, offering a large potential data source for borrowing strength approaches.
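For readers who want to see the statistical idea in miniature, here is one simple way borrowing strength can work: a textbook normal-normal partial-pooling (shrinkage) estimate. This is not the researchers' model. The function name `borrowed_niche_mean`, the relatedness weights, the variances, and all numbers are hypothetical, and the sketch assumes the within-species variance (`sigma2`) and the spread of niche means among relatives (`tau2`) are known. A data-poor species' estimate of its niche optimum, here the mean temperature at its occurrence records, is pulled toward a relatedness-weighted average of its data-rich relatives' estimates, and that pull weakens as the species' own records accumulate.

```python
from statistics import mean


def borrowed_niche_mean(target_obs, relatives, sigma2, tau2):
    """Hypothetical illustration of borrowing strength via partial pooling.

    target_obs: environmental values (e.g. temperature) at the data-poor
                species' few occurrence records
    relatives:  list of (weight, niche_mean) pairs for data-rich relatives;
                weights are assumed relatedness scores (larger = closer kin)
    sigma2:     assumed known within-species variance of the variable
    tau2:       assumed variance of niche means among related species
    """
    # Prior: relatedness-weighted average of the relatives' niche means.
    total_w = sum(w for w, _ in relatives)
    prior_mean = sum(w * m for w, m in relatives) / total_w

    n = len(target_obs)
    if n == 0:
        # No records at all: the estimate comes entirely from relatives.
        return prior_mean

    # Precision-weighted average of the species' own data and the prior.
    data_precision = n / sigma2
    prior_precision = 1.0 / tau2
    return (data_precision * mean(target_obs) + prior_precision * prior_mean) / (
        data_precision + prior_precision
    )


if __name__ == "__main__":
    relatives = [(0.8, 20.0), (0.2, 10.0)]  # hypothetical close and distant relatives
    print(borrowed_niche_mean([14.0, 16.0], relatives, sigma2=4.0, tau2=1.0))
    print(borrowed_niche_mean([15.0] * 100, relatives, sigma2=4.0, tau2=1.0))
```

With only two records averaging 15.0, the estimate lands at 17.0, much of the way toward the relatives' weighted mean of 18.0; with 100 records at 15.0 it sits at about 15.1, dominated by the species' own data. Real borrowing strength models typically estimate these variances and the strength of relatedness from the data rather than fixing them by hand.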
“Borrowing strength approaches provide a useful framework for structuring how and when to mix these different descriptors of species’ distributions in non-prescriptive ways that allow them to be integrated with a variety of different statistical models and inference approaches,” Winner said.
“While this approach has been discussed in the SDM [species distribution modeling] literature, it has never been concretely tested or demonstrated,” Sharma said. “Part of this review is a call to action: to test and develop multi-species models for this purpose.”
Sharma is already making headway in these investigations. As part of her PhD research, she has published a method for quantifying similarity between species’ niches, and she plans to demonstrate how well this method works within borrowing strength modeling approaches.
“While nothing can replace fieldwork, we know that collecting data can be a resource-intensive process,” Sharma said. “We believe that there is a suite of models that can be extended specifically for improving [estimates for] data-deficient species. The bedrock of these models exists, and to fully realize the potential, we need to rigorously test them in data-deficient contexts.”



