[Climate Data + MOSAIC]
http://mosaicdatabase.web.ox.ac.uk
Updated 14 March 2022
Vignette #5 - Climate Data and MOSAIC
MOSAIC was built for comparative biodemography: to strengthen our map of which functional traits predict vital rates and which do not. Traits in the MOSAIC database (categorical and numeric variables) are formatted for easy access in regression-based and other analyses. MOSAIC was also designed for analysing demographic metrics and traits in their environmental contexts. To this end, MOSAIC includes a separate object (climate), which we access and use in the following exercise.
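library(devtools)   # package installation tools (installs remotes as a dependency)
library(tidyverse)  # includes ggplot2, used for the plots below
library(leaflet)    # interactive maps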
Accessing MOSAIC
Download MOSAIC from the MOSAIC portal. For more information on the basics of downloading MOSAIC and navigating its data structure, see Vignette #1.
# Install and load the Rmosaic package from GitHub
remotes::install_github("mosaicdatabase/Rmosaic")
library(Rmosaic)
# Fetch version v1.0.0 of the MOSAIC database
mosaic <- mos_fetch("v1.0.0")
##
## Phylogenetic tree with 1359 tips and 1358 internal nodes.
##
## Tip labels:
## Oxalis_acetosella, Rourea_induta, Euphorbia_telephioides, Euphorbia_fontqueriana, Triadica_sebifera, Actinostemon_concolor, ...
## Node labels:
## Node1, Node2, Node3, Node4, Node5, Node6, ...
##
## Rooted; includes branch lengths.
Selecting Target Matrices
To get started, we first select the part of MOSAIC we are interested in. For simplicity, we focus here on mammal biomass, as in Vignette #2:
# Extract all biomass data alongside the taxonomic metadata
mosaic_dataframe <- data.frame(mosaic@taxaMetadat,
                               Biomass = as.numeric(mosaic@biomass@value),
                               MOSAIC_index = seq_along(mosaic@index))
# Restrict to mammals with non-NA biomass data
mosaic_mammal <- subset(mosaic_dataframe,
                        Class == "Mammalia" & !is.na(Biomass))
# Extract all COMADRE/COMPADRE matrix IDs for each mammal entry
matrix_ids_full <- mosaic@index[mosaic_mammal$MOSAIC_index]
# Keep just the first matrix ID of each entry using sapply
mammal_matrix_ids <- sapply(matrix_ids_full, `[[`, 1)
# Add the matrix ID to the mammal data
mosaic_mammal$MatrixID <- mammal_matrix_ids
# Add full species names
mosaic_mammal$Binomial <- paste(mosaic_mammal$Genus, mosaic_mammal$Species)
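The sapply() call above passes the extraction operator [[ as the function, so each element x of matrix_ids_full is reduced to x[[1]]. A minimal sketch on a hypothetical toy list (the IDs are invented, not MOSAIC values) shows the behaviour, together with vapply() as a type-checked alternative, assuming the IDs are stored as numbers. Either call would stop with a "subscript out of bounds" error if an entry held no IDs at all.

```r
# Hypothetical toy stand-in for mosaic@index: each element holds one or more
# matrix IDs (values invented for illustration)
toy_index <- list(c(101, 102, 103), 204, c(305, 306))

# sapply() calls `[[`(x, 1) on every element, keeping the first ID of each
first_ids <- sapply(toy_index, `[[`, 1)
first_ids   # [1] 101 204 305

# vapply() does the same but checks that every result is a single number
first_ids_checked <- vapply(toy_index, `[[`, numeric(1), 1)
```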
We now have a data frame containing MOSAIC IDs, taxonomic information, and biomass values for all mammals in MOSAIC that have biomass data. The corresponding (first) matrix IDs are:
mammal_matrix_ids
## [1] 249092 249094 240296 240307 249126 240331 249159 240358 240362 249214
## [11] 249248 240402 249269 249273 249275 249286 240470 249399 249409 249455
## [21] 249504 240502 249524 249526 249597 249598 240504 249671 249674 249723
## [31] 249726 240674 249753 249759 249787 240511 249817 249821 249866 240565
## [41] 249878 240677 249909 240645
Accessing the Climate Data
With our matrix IDs at hand, we are now ready to tap into the climate data contained in MOSAIC. These climate data were obtained from ERA5-Land using the R package KrigR, and are reported as the mean and standard deviation of the monthly time series at each matrix's location over its study duration.
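The aggregation step described above can be sketched in a few lines of base R. This only illustrates the summary (mean and standard deviation over the study window), assuming a hypothetical monthly temperature series in kelvin built from a seasonal cosine rather than real ERA5-Land data; it does not reproduce KrigR's downscaling. It also shows the kelvin-to-Celsius conversion, since the vignette's plots label air_temperature in kelvin.

```r
# Hypothetical monthly air temperature (K) for one location, 1975-1987:
# a seasonal cosine peaking in July stands in for real ERA5-Land values
months <- seq(as.Date("1975-01-01"), as.Date("1987-12-01"), by = "month")
month_num <- as.integer(format(months, "%m"))
temp_k <- 273.15 + 10 * cos(2 * pi * (month_num - 7) / 12)

# Keep only the months within the study duration (StudyStart to StudyEnd)
study_start <- 1976
study_end   <- 1986
year_num <- as.integer(format(months, "%Y"))
in_study <- year_num >= study_start & year_num <= study_end

# Mean and standard deviation of the monthly series over the study period
clim_summary <- c(mean = mean(temp_k[in_study]), sd = sd(temp_k[in_study]))
clim_summary

# Kelvin to degrees Celsius
k_to_c <- function(k) k - 273.15
k_to_c(273.7485)   # 0.5985, the first air_temperature value shown below
```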
Climate Data in MOSAIC
The climate data within MOSAIC are stored in the climate object:
head(mosaic@climate)
## X.1 X MatrixID Lat Lon StudyStart StudyEnd GenTime db
## 1 1 1 240296 61.79944 -150.3694 1976 1986 10.64398229 comadre
## 2 2 2 240297 61.79944 -150.3694 1976 1986 <NA> comadre
## 3 3 3 240298 61.79944 -150.3694 1976 1986 <NA> comadre
## 4 4 4 240299 61.79944 -150.3694 1976 1986 <NA> comadre
## 5 5 5 240300 61.79944 -150.3694 1976 1986 8.16781721 comadre
## 6 6 6 240301 61.79944 -150.3694 1976 1986 12.2931104 comadre
## Download air_temperature air_temperature_SD total_precipitation
## 1 1 273.7485 9.862699 0.06464312
## 2 1 273.7485 9.862699 0.06464312
## 3 1 273.7485 9.862699 0.06464312
## 4 1 273.7485 9.862699 0.06464312
## 5 1 273.7485 9.862699 0.06464312
## 6 1 273.7485 9.862699 0.06464312
## total_precipitation_SD volumetric_soil_water_layer_1
## 1 0.0420547 0.3704328
## 2 0.0420547 0.3704328
## 3 0.0420547 0.3704328
## 4 0.0420547 0.3704328
## 5 0.0420547 0.3704328
## 6 0.0420547 0.3704328
## volumetric_soil_water_layer_1_SD volumetric_soil_water_layer_2
## 1 0.02808088 0.3684833
## 2 0.02808088 0.3684833
## 3 0.02808088 0.3684833
## 4 0.02808088 0.3684833
## 5 0.02808088 0.3684833
## 6 0.02808088 0.3684833
## volumetric_soil_water_layer_2_SD volumetric_soil_water_layer_3
## 1 0.02796726 0.36928
## 2 0.02796726 0.36928
## 3 0.02796726 0.36928
## 4 0.02796726 0.36928
## 5 0.02796726 0.36928
## 6 0.02796726 0.36928
## volumetric_soil_water_layer_3_SD volumetric_soil_water_layer_4
## 1 0.02442513 0.3888525
## 2 0.02442513 0.3888525
## 3 0.02442513 0.3888525
## 4 0.02442513 0.3888525
## 5 0.02442513 0.3888525
## 6 0.02442513 0.3888525
## volumetric_soil_water_layer_4_SD runoff runoff_SD total_evaporation
## 1 0.01339378 0.03377308 0.02755258 -0.02956482
## 2 0.01339378 0.03377308 0.02755258 -0.02956482
## 3 0.01339378 0.03377308 0.02755258 -0.02956482
## 4 0.01339378 0.03377308 0.02755258 -0.02956482
## 5 0.01339378 0.03377308 0.02755258 -0.02956482
## 6 0.01339378 0.03377308 0.02755258 -0.02956482
## total_evaporation_SD
## 1 0.03250795
## 2 0.03250795
## 3 0.03250795
## 4 0.03250795
## 5 0.03250795
## 6 0.03250795
Some entries in the climate object are NA (in the rows shown above, for example, GenTime is missing for several matrix IDs). Missing climate values result from (1) missing geolocations for the underlying matrix models, which make extraction of environmental parameters impossible, (2) study durations that pre-date the availability of monthly climate data in ERA5-Land, or (3) matrix-model geolocations that fall outside land masses.
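To see where missing values occur, one can count NAs per column and, for rows lacking climate values, check whether the coordinates are also absent (reason 1) or present (pointing to reason 2 or 3). A minimal sketch on a hypothetical toy data frame; the column names mirror mosaic@climate, but the values are invented and it assumes climate columns go missing together.

```r
# Hypothetical toy version of mosaic@climate (values invented for illustration)
toy_clim <- data.frame(
  MatrixID            = c(1, 2, 3, 4),
  Lat                 = c(61.8, NA, 10.2, 45.0),
  air_temperature     = c(273.7, NA, 300.1, NA),
  total_precipitation = c(0.065, NA, 0.004, NA)
)

# Number of missing values per column
colSums(is.na(toy_clim))

# Rows lacking climate values, split by whether coordinates are also missing:
# TRUE points to reason (1); FALSE to reason (2) or (3)
miss_clim <- is.na(toy_clim$air_temperature)
table(coords_missing = is.na(toy_clim$Lat[miss_clim]))
```

On the real data, colSums(is.na(mosaic@climate)) gives the same per-column overview.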
Extracting Climate Data
Since we are only interested in a subset of matrix models, we do not need the full climate object; instead, we keep only the rows whose MatrixID values match our mammal matrix IDs:
clim_df <- mosaic@climate[mosaic@climate$MatrixID %in% mammal_matrix_ids, ]
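The %in% operator returns TRUE for every MatrixID that appears anywhere in mammal_matrix_ids, which is why it is used here rather than ==, which compares element-wise and recycles the shorter vector. A toy sketch with hypothetical IDs; setdiff() reports requested IDs that have no row in the climate table and are therefore dropped silently by the subset.

```r
# Hypothetical IDs standing in for mammal_matrix_ids and mosaic@climate
ids_wanted <- c(240296, 240307, 111111)
toy_clim <- data.frame(MatrixID = c(240296, 240297, 240307, 249999),
                       air_temperature = c(273.7, 273.7, 280.2, 290.0))

# Keep rows whose MatrixID appears anywhere in ids_wanted
toy_sub <- toy_clim[toy_clim$MatrixID %in% ids_wanted, ]
toy_sub

# Requested IDs with no climate row (dropped silently by the subset)
setdiff(ids_wanted, toy_clim$MatrixID)   # 111111
```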
With the subsetted climate data, we can now create a data frame holding all necessary data for some exploratory analyses:
# Combine the mammal data with the climate data for matching matrix IDs
analysis_df <- merge(mosaic_mammal, clim_df, by = "MatrixID")
# Exclude biomass values of 1e6 or more to limit the influence of outliers
# (a simplification for this exercise), then drop rows containing any NA
analysis_df <- na.omit(analysis_df[analysis_df$Biomass < 1e6, ])
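Two details of this step are easy to miss: merge() performs an inner join by default, so mammals whose matrix ID has no row in clim_df are dropped, and na.omit() removes any row with an NA in any column, not only in Biomass or air_temperature. A toy sketch with hypothetical values illustrates both; restricting complete.cases() to the columns a model actually uses is one way to avoid losing rows over NAs in unrelated columns such as GenTime.

```r
# Hypothetical toy tables (values invented for illustration)
taxa <- data.frame(MatrixID = c(1, 2, 3),
                   Order = c("Rodentia", "Primates", "Carnivora"),
                   Biomass = c(50, 2e6, 8000))
clim <- data.frame(MatrixID = c(1, 3, 4),
                   air_temperature = c(280, 285, 295),
                   GenTime = c(NA, 2.5, 1.0))

# Inner join: only IDs present in both tables (1 and 3) survive
joined <- merge(taxa, clim, by = "MatrixID")
small <- joined[joined$Biomass < 1e6, ]

# na.omit() drops ID 1 because of its NA GenTime
kept_all <- na.omit(small)
# Checking only the modelled columns keeps both rows
kept_model <- small[complete.cases(small[, c("Biomass", "air_temperature")]), ]
```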
Let’s plot the remaining data on a map to see their geographic context:
# Assign one colour per mammalian order
cols <- rainbow(n = length(unique(analysis_df$Order)))
names(cols) <- unique(analysis_df$Order)
analysis_df$Col <- cols[match(analysis_df$Order, names(cols))]
# Build an interactive map with one circle marker per matrix location
mammal_map <- leaflet(analysis_df, width = "100%")
mammal_map <- addProviderTiles(mammal_map, providers$Esri.WorldTopoMap)
mammal_map <- addCircleMarkers(map = mammal_map, lng = ~Lon, lat = ~Lat,
                               label = ~Order,
                               color = ~Col,
                               labelOptions = labelOptions(textsize = "12px"))
mammal_map
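The colour lookup works because cols is a named vector: match() finds the position of each record's order among names(cols), and indexing by those positions returns one colour per row. A toy sketch with hypothetical order labels shows that indexing the named vector directly by name gives the same result; leaflet's colorFactor() is another option that is not used here.

```r
# Hypothetical order labels, one per record
orders <- c("Rodentia", "Primates", "Rodentia", "Carnivora")

# One colour per unique order, stored as a named vector
cols <- rainbow(n = length(unique(orders)))
names(cols) <- unique(orders)

# Lookup via match(), as in the vignette
col_by_match <- cols[match(orders, names(cols))]
# Equivalent lookup by name
col_by_name <- cols[orders]
```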
Regression & Analysis
Now we are ready to run analyses. The analyses shown here are rudimentary at best and serve only to demonstrate how easily ERA5-Land climate data can be combined with the MOSAIC database.
All Mammals
First, let’s assess the relationship between air temperature and biomass across all mammals:
summary(lm(data = analysis_df, Biomass ~ air_temperature))
##
## Call:
## lm(formula = Biomass ~ air_temperature, data = analysis_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -136457 -35516 -12225 17982 218728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1502659 532349 2.823 0.00991 **
## air_temperature -5006 1850 -2.706 0.01290 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79270 on 22 degrees of freedom
## Multiple R-squared: 0.2498, Adjusted R-squared: 0.2157
## F-statistic: 7.324 on 1 and 22 DF, p-value: 0.0129
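In this sample, mammals from colder locations tend to have higher biomass: the fitted slope is about -5006 Biomass per kelvin (p = 0.013), and air temperature explains roughly 25% of the variance (R² = 0.25). This is consistent with Bergmann’s rule, although 24 observations pooled across orders make this a coarse test.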
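The printed coefficients can be unpacked as a small worked example. Using the rounded values from the summary above, the t value is the estimate divided by its standard error, and in a one-predictor model the F-statistic equals t squared. The predictions below simply apply the fitted line and are only meaningful within the range of temperatures in the data.

```r
# Rounded coefficients copied from the summary() output above
intercept <- 1502659
slope     <- -5006   # change in Biomass per kelvin
se_slope  <- 1850

# t value = estimate / standard error
t_slope <- slope / se_slope   # about -2.706
# In a one-predictor model, the F-statistic equals t squared
t_slope^2                     # about 7.32 (reported 7.324, from unrounded values)

# Predicted Biomass at 0 °C (273.15 K) and 20 °C (293.15 K)
pred <- intercept + slope * c(273.15, 293.15)
pred                          # about 135270 and 35150
diff(pred)                    # -100120, i.e. 20 K times -5006
```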
ggplot(data = analysis_df, aes(x = air_temperature, y = Biomass)) +
geom_point() +
stat_smooth(method = "lm") +
labs(x = "Air Temperature [K]", y = "Biomass") +
theme_bw()
Mammals by Order
Now let’s use the finer taxonomic information available in MOSAIC and test whether the relationship between air temperature and biomass differs among mammalian orders:
summary(lm(data = analysis_df, Biomass ~ air_temperature * Order))
##
## Call:
## lm(formula = Biomass ~ air_temperature * Order, data = analysis_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -122023 -10392 0 12873 143838
##
## Coefficients: (4 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3047458 2548979 1.196 0.255
## air_temperature -10376 9154 -1.133 0.279
## OrderCarnivora -1985731 2765125 -0.718 0.486
## OrderCingulata -14654 148341 -0.099 0.923
## OrderDidelphimorphia -4262 158154 -0.027 0.979
## OrderDiprotodontia -3150686 3435359 -0.917 0.377
## OrderLagomorpha -99518 95702 -1.040 0.319
## OrderPrimates -2008837 3738418 -0.537 0.601
## OrderRodentia -219816 100383 -2.190 0.049 *
## air_temperature:OrderCarnivora 7092 9885 0.717 0.487
## air_temperature:OrderCingulata NA NA NA NA
## air_temperature:OrderDidelphimorphia NA NA NA NA
## air_temperature:OrderDiprotodontia 10742 12153 0.884 0.394
## air_temperature:OrderLagomorpha NA NA NA NA
## air_temperature:OrderPrimates 6911 13007 0.531 0.605
## air_temperature:OrderRodentia NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75170 on 12 degrees of freedom
## Multiple R-squared: 0.6319, Adjusted R-squared: 0.2945
## F-statistic: 1.873 on 11 and 12 DF, p-value: 0.148
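This model is clearly overparameterised: with 24 observations spread across eight orders, four interaction terms could not be estimated (NA because of singularities), and the model as a whole is not significant (F = 1.873 on 11 and 12 DF, p = 0.148). It nevertheless illustrates how easily different analyses can be run with MOSAIC and its built-in climate parameters.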
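NA coefficients flagged as singularities typically arise when a group contributes too few distinct predictor values to estimate its own slope: with a single observation, the group's interaction column is an exact multiple of its intercept column. A minimal sketch on hypothetical toy data reproduces the effect. The per-order counts in analysis_df have not been checked here, so treat this as a likely explanation rather than a confirmed one.

```r
# Hypothetical toy data: groups A and B have three temperatures each,
# group C has a single observation
toy <- data.frame(
  temp = c(270, 275, 280, 285, 290, 295, 300),
  grp  = c("A", "A", "A", "B", "B", "B", "C"),
  y    = c(10, 12, 13, 20, 18, 17, 30)
)

# Same model structure as Biomass ~ air_temperature * Order
fit <- lm(y ~ temp * grp, data = toy)
coef(fit)   # temp:grpC is NA because its column equals 300 * grpC

# Distinct temperature values per group; fewer than 2 means no estimable slope
n_temps <- tapply(toy$temp, toy$grp, function(x) length(unique(x)))
n_temps
```

On the real data, table(analysis_df$Order) and the same tapply() call applied to analysis_df$air_temperature would show which orders are affected.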
ggplot(data = analysis_df, aes(x = air_temperature, y = Biomass, col = Order)) +
geom_point() +
stat_smooth(method = "lm") +
labs(x = "Air Temperature [K]", y = "Biomass") +
theme_bw() + scale_color_manual(values = cols)