Updated 14 March 2022

Vignette #5 - Climate Data and MOSAIC

MOSAIC was built for use in comparative biodemography: to strengthen the map of which functional traits predict vital rates and which do not. Traits in the MOSAIC database (factor and numeric variables) are formatted so that they can be accessed easily and used in regression-based analyses, among other types of analysis. MOSAIC was also designed with the analysis of demographic metrics and traits in their environmental contexts in mind. To this end, MOSAIC includes a separate object (climate), which we will access and use in the following exercise.

library(devtools)
library(tidyverse)
library(leaflet)

Accessing MOSAIC

Download MOSAIC from the MOSAIC portal. For more information on the basics of downloading MOSAIC and navigating the data structure, see Vignette #1. The following code installs the Rmosaic package and fetches the database:

remotes::install_github("mosaicdatabase/Rmosaic")
library(Rmosaic)
mosaic <- mos_fetch("v1.0.0")

## 
## Phylogenetic tree with 1359 tips and 1358 internal nodes.
## 
## Tip labels:
##   Oxalis_acetosella, Rourea_induta, Euphorbia_telephioides, Euphorbia_fontqueriana, Triadica_sebifera, Actinostemon_concolor, ...
## Node labels:
##   Node1, Node2, Node3, Node4, Node5, Node6, ...
## 
## Rooted; includes branch lengths.

Selecting Target Matrices

To get this exercise started, we first need to select the part of MOSAIC we are interested in. For simplicity, we focus here on mammal biomass, as in Vignette #2:

# Extracting all biomass data
mosaic_dataframe <- data.frame(mosaic@taxaMetadat, 
                               Biomass = as.numeric(mosaic@biomass@value),
                               MOSAIC_index = seq_along(mosaic@index))

# Restrict to mammals with non-NA biomass data
mosaic_mammal <- subset(mosaic_dataframe,
                        Class == "Mammalia" & !is.na(Biomass))

# Extract the matrix ids stored for each entry
matrix_ids_full <- mosaic@index[mosaic_mammal$MOSAIC_index]

# Extract just the first matrix id using sapply
mammal_matrix_ids <- sapply(matrix_ids_full, `[[`, 1)

# Add matrix id to the mammal data
mosaic_mammal$MatrixID <- mammal_matrix_ids

# Add full species names
mosaic_mammal$Binomial <- paste0(mosaic_mammal$Genus, " ", mosaic_mammal$Species)

We now have a data frame containing MOSAIC IDs, taxonomic information, and biomass values for all mammals within MOSAIC. The corresponding matrix IDs are:

mammal_matrix_ids

##  [1] 249092 249094 240296 240307 249126 240331 249159 240358 240362 249214
## [11] 249248 240402 249269 249273 249275 249286 240470 249399 249409 249455
## [21] 249504 240502 249524 249526 249597 249598 240504 249671 249674 249723
## [31] 249726 240674 249753 249759 249787 240511 249817 249821 249866 240565
## [41] 249878 240677 249909 240645

Accessing the Climate Data

With our matrix IDs in hand, we are now ready to tap into the climate data contained within MOSAIC. These climate data were obtained using the R package KrigR and are reported as the mean and standard deviation of the monthly time series for the location and study duration of each matrix contained within MOSAIC.
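
To make the "mean and standard deviation of a monthly time series" concrete, here is a minimal sketch of that summary step in base R. It does not call KrigR; the data frame `monthly` and its values are invented for illustration, and only the column names `air_temperature` and `air_temperature_SD` mirror those in the climate object.

```r
# Toy monthly air-temperature series (kelvin) for a hypothetical study
# running from January 1976 to December 1977. All values are invented.
monthly <- data.frame(
  month = seq(as.Date("1976-01-01"), as.Date("1977-12-01"), by = "month"),
  air_temperature = 273.15 + 10 * sin(2 * pi * (0:23) / 12)
)

# Collapse the series into the two summary statistics reported per matrix
clim_summary <- data.frame(
  air_temperature    = mean(monthly$air_temperature),
  air_temperature_SD = sd(monthly$air_temperature)
)
clim_summary
```

Each climate variable is therefore represented by two columns: one describing the average conditions over the study period and one describing how variable those conditions were.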

Climate Data in MOSAIC

The climate data within MOSAIC is stored in the climate object:

head(mosaic@climate)

##   X.1 X MatrixID      Lat       Lon StudyStart StudyEnd     GenTime      db
## 1   1 1   240296 61.79944 -150.3694       1976     1986 10.64398229 comadre
## 2   2 2   240297 61.79944 -150.3694       1976     1986        <NA> comadre
## 3   3 3   240298 61.79944 -150.3694       1976     1986        <NA> comadre
## 4   4 4   240299 61.79944 -150.3694       1976     1986        <NA> comadre
## 5   5 5   240300 61.79944 -150.3694       1976     1986  8.16781721 comadre
## 6   6 6   240301 61.79944 -150.3694       1976     1986  12.2931104 comadre
##   Download air_temperature air_temperature_SD total_precipitation
## 1        1        273.7485           9.862699          0.06464312
## 2        1        273.7485           9.862699          0.06464312
## 3        1        273.7485           9.862699          0.06464312
## 4        1        273.7485           9.862699          0.06464312
## 5        1        273.7485           9.862699          0.06464312
## 6        1        273.7485           9.862699          0.06464312
##   total_precipitation_SD volumetric_soil_water_layer_1
## 1              0.0420547                     0.3704328
## 2              0.0420547                     0.3704328
## 3              0.0420547                     0.3704328
## 4              0.0420547                     0.3704328
## 5              0.0420547                     0.3704328
## 6              0.0420547                     0.3704328
##   volumetric_soil_water_layer_1_SD volumetric_soil_water_layer_2
## 1                       0.02808088                     0.3684833
## 2                       0.02808088                     0.3684833
## 3                       0.02808088                     0.3684833
## 4                       0.02808088                     0.3684833
## 5                       0.02808088                     0.3684833
## 6                       0.02808088                     0.3684833
##   volumetric_soil_water_layer_2_SD volumetric_soil_water_layer_3
## 1                       0.02796726                       0.36928
## 2                       0.02796726                       0.36928
## 3                       0.02796726                       0.36928
## 4                       0.02796726                       0.36928
## 5                       0.02796726                       0.36928
## 6                       0.02796726                       0.36928
##   volumetric_soil_water_layer_3_SD volumetric_soil_water_layer_4
## 1                       0.02442513                     0.3888525
## 2                       0.02442513                     0.3888525
## 3                       0.02442513                     0.3888525
## 4                       0.02442513                     0.3888525
## 5                       0.02442513                     0.3888525
## 6                       0.02442513                     0.3888525
##   volumetric_soil_water_layer_4_SD     runoff  runoff_SD total_evaporation
## 1                       0.01339378 0.03377308 0.02755258       -0.02956482
## 2                       0.01339378 0.03377308 0.02755258       -0.02956482
## 3                       0.01339378 0.03377308 0.02755258       -0.02956482
## 4                       0.01339378 0.03377308 0.02755258       -0.02956482
## 5                       0.01339378 0.03377308 0.02755258       -0.02956482
## 6                       0.01339378 0.03377308 0.02755258       -0.02956482
##   total_evaporation_SD
## 1           0.03250795
## 2           0.03250795
## 3           0.03250795
## 4           0.03250795
## 5           0.03250795
## 6           0.03250795

As you can see, there are NA values for some matrix IDs in the above output. This results from either (1) missing geolocations of the underlying matrix models making extraction of environmental parameters impossible, (2) study durations of matrix models pre-dating climate data availability at monthly scale within ERA5-Land, or (3) geolocation of matrix models falling outside of land masses.
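
Before subsetting, it can help to check how much of the climate information is missing. The sketch below uses an invented stand-in (`toy_clim_na`) rather than the real climate object, so the counts mean nothing beyond the example; the same calls should work on mosaic@climate, assuming it behaves like an ordinary data frame.

```r
# Toy stand-in for a few rows of the climate object. All values are invented.
toy_clim_na <- data.frame(
  MatrixID        = c(1, 2, 3, 4),
  Lat             = c(61.8, NA, -33.9, 10.2),
  Lon             = c(-150.4, NA, 18.4, -84.1),
  air_temperature = c(273.7, NA, 290.1, NA)
)

# Number of missing values per column
na_per_column <- colSums(is.na(toy_clim_na))
na_per_column

# Rows that have coordinates but no climate value, which would point to
# reasons (2) or (3) rather than a missing geolocation
located_no_climate <- toy_clim_na[!is.na(toy_clim_na$Lat) &
                                    is.na(toy_clim_na$air_temperature), ]
located_no_climate
```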

Extracting Climate Data

Since we are only interested in a subset of matrix models, we do not need the full climate object. Instead, we extract the relevant rows by matching MatrixID values:

clim_df <- mosaic@climate[mosaic@climate$MatrixID %in% mammal_matrix_ids, ]
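
Subsetting with %in% silently skips any target ID that has no row in the climate object. A small sketch with invented data (`toy_climate` and `target_ids` are hypothetical names) shows the subsetting pattern together with a setdiff() check for IDs that found no match:

```r
# Toy climate table and a vector of target IDs. All values are invented.
toy_climate <- data.frame(
  MatrixID        = c(101, 102, 103, 104),
  air_temperature = c(273.7, 280.2, NA, 295.4)
)
target_ids <- c(102, 104, 999)  # 999 has no climate record

# Keep only rows whose MatrixID is among the targets
toy_subset <- toy_climate[toy_climate$MatrixID %in% target_ids, ]

# Target IDs that are absent from the climate table
missing_ids <- setdiff(target_ids, toy_climate$MatrixID)
missing_ids
```

Running the same check with mammal_matrix_ids and mosaic@climate$MatrixID would show whether any selected matrices lack a climate record altogether.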

With the subsetted climate data, we can now create a data frame holding all necessary data for some exploratory analyses:

analysis_df <- merge(mosaic_mammal, clim_df, by = "MatrixID")
# We set a cutoff on biomass here to avoid the influence of outliers.
# This is just a simplification for this exercise.
analysis_df <- na.omit(analysis_df[analysis_df$Biomass < 1e6, ])
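
Each of these steps can drop rows: merge() keeps only MatrixID values present in both data frames (an inner join by default), the cutoff removes large biomass values, and na.omit() removes any row with a missing value in any column. The sketch below traces the row counts on invented data; `toy_traits` and `toy_clim_merge` are hypothetical names.

```r
# Toy trait and climate tables. All values are invented.
toy_traits <- data.frame(MatrixID = c(1, 2, 3, 4),
                         Biomass  = c(25, 4.2e6, 1800, 70000))
toy_clim_merge <- data.frame(MatrixID        = c(1, 2, 3, 5),
                             air_temperature = c(280.1, 275.3, NA, 290.0))

# Inner join: only IDs 1, 2 and 3 occur in both tables
merged <- merge(toy_traits, toy_clim_merge, by = "MatrixID")

# Biomass cutoff removes ID 2
below_cutoff <- merged[merged$Biomass < 1e6, ]

# na.omit() removes ID 3, which has no air temperature
complete <- na.omit(below_cutoff)

# Row counts after each step
c(merged = nrow(merged),
  below_cutoff = nrow(below_cutoff),
  complete = nrow(complete))
```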

Let’s plot the remaining data out in their geospatial context:

cols <- rainbow(n = length(unique(analysis_df$Order)))
names(cols) <- unique(analysis_df$Order)
analysis_df$Col <- cols[match(analysis_df$Order, names(cols))]

Map <- leaflet(analysis_df, width = "100%")
Map <- addProviderTiles(Map, providers$Esri.WorldTopoMap)
Map <- addCircleMarkers(map = Map, lng = ~Lon, lat = ~Lat, 
                        label = ~Order,
                        color = ~Col,
                        labelOptions = labelOptions(textsize = "12px")
)
Map

Regression & Analysis

Now we are ready to run analyses. The analyses shown here are rudimentary and simply demonstrate how easily state-of-the-art ERA5-Land climate data can be used with the MOSAIC database.

All Mammals

First, let’s assess the effect of temperature on biomass of all mammals:

summary(lm(data = analysis_df, Biomass ~ air_temperature))

## 
## Call:
## lm(formula = Biomass ~ air_temperature, data = analysis_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -136457  -35516  -12225   17982  218728 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      1502659     532349   2.823  0.00991 **
## air_temperature    -5006       1850  -2.706  0.01290 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 79270 on 22 degrees of freedom
## Multiple R-squared:  0.2498, Adjusted R-squared:  0.2157 
## F-statistic: 7.324 on 1 and 22 DF,  p-value: 0.0129

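According to this model, mammals in colder regions are heavier: biomass decreases by about 5,006 units per kelvin of mean air temperature (p = 0.013, R-squared = 0.25, 24 observations). This is consistent with Bergmann’s rule.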
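
With so few observations, a confidence interval for the slope is often more informative than the point estimate alone. The following sketch shows how the slope and its interval can be pulled out of a fitted lm object. It runs on simulated data (`toy_reg`, with an invented negative relationship), so its numbers say nothing about MOSAIC; the same coef() and confint() calls would apply to a model fitted to analysis_df.

```r
# Simulated data with a built-in negative slope. Values are invented.
set.seed(1)
toy_reg <- data.frame(air_temperature = seq(270, 300, length.out = 24))
toy_reg$Biomass <- 2000 - 5 * toy_reg$air_temperature + rnorm(24, sd = 20)

fit <- lm(Biomass ~ air_temperature, data = toy_reg)

# Point estimate of the temperature effect
slope <- coef(fit)[["air_temperature"]]

# 95% confidence interval for the same coefficient
slope_ci <- confint(fit, "air_temperature", level = 0.95)

slope
slope_ci
```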

ggplot(data = analysis_df, aes(x = air_temperature, y = Biomass)) + 
  geom_point() +  
  stat_smooth(method = "lm") + 
  labs(x = "Air Temperature [K]", y = "Biomass") + 
  theme_bw()
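
The plot uses air temperature in kelvin, as stored in the climate object. If degrees Celsius are easier to read, the conversion is a simple offset; `kelvin_to_celsius` below is a hypothetical helper, not part of Rmosaic.

```r
# Convert air temperature from kelvin to degrees Celsius
kelvin_to_celsius <- function(temp_k) temp_k - 273.15

# Value taken from the head(mosaic@climate) output shown earlier
kelvin_to_celsius(273.7485)  # roughly 0.6 degrees Celsius
```

Because this is a constant shift, it changes the intercept of the regression but not the slope.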

Mammals by Order

Now let’s make use of the higher-resolution taxonomic information available in MOSAIC and assess the impact of air temperature on biomass by order of mammals:

summary(lm(data = analysis_df, Biomass ~ air_temperature * Order))

## 
## Call:
## lm(formula = Biomass ~ air_temperature * Order, data = analysis_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -122023  -10392       0   12873  143838 
## 
## Coefficients: (4 not defined because of singularities)
##                                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                           3047458    2548979   1.196    0.255  
## air_temperature                        -10376       9154  -1.133    0.279  
## OrderCarnivora                       -1985731    2765125  -0.718    0.486  
## OrderCingulata                         -14654     148341  -0.099    0.923  
## OrderDidelphimorphia                    -4262     158154  -0.027    0.979  
## OrderDiprotodontia                   -3150686    3435359  -0.917    0.377  
## OrderLagomorpha                        -99518      95702  -1.040    0.319  
## OrderPrimates                        -2008837    3738418  -0.537    0.601  
## OrderRodentia                         -219816     100383  -2.190    0.049 *
## air_temperature:OrderCarnivora           7092       9885   0.717    0.487  
## air_temperature:OrderCingulata             NA         NA      NA       NA  
## air_temperature:OrderDidelphimorphia       NA         NA      NA       NA  
## air_temperature:OrderDiprotodontia      10742      12153   0.884    0.394  
## air_temperature:OrderLagomorpha            NA         NA      NA       NA  
## air_temperature:OrderPrimates            6911      13007   0.531    0.605  
## air_temperature:OrderRodentia              NA         NA      NA       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75170 on 12 degrees of freedom
## Multiple R-squared:  0.6319, Adjusted R-squared:  0.2945 
## F-statistic: 1.873 on 11 and 12 DF,  p-value: 0.148

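Well, this is just a mess. With only 24 observations the model is overparameterised: four of the interaction terms cannot be estimated (NA, reported as singularities), and the overall fit is not significant (p = 0.148). Still, it should serve to illustrate how many different analyses can easily be performed with MOSAIC and the in-built climate parameters.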
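
One common cause of undefined interaction coefficients is a group with too few observations, or with no variation in the predictor, to estimate its own slope. The sketch below reproduces the effect on invented data (`toy_int`, with a group "C" that has a single row); whether this is what happens for each affected order in analysis_df is an assumption that a call such as table(analysis_df$Order) would help check.

```r
# Toy data: groups A and B have three rows each, group C has only one.
# All values are invented.
toy_int <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7),
  g = factor(c("A", "A", "A", "B", "B", "B", "C")),
  y = c(2.1, 3.9, 6.2, 1.0, 1.4, 2.1, 9.0)
)

# Observations per group
table(toy_int$g)

# A slope cannot be estimated from a single point, so the x:gC term is NA
fit_int <- lm(y ~ x * g, data = toy_int)
coef(fit_int)
```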

ggplot(data = analysis_df, aes(x = air_temperature, y = Biomass, col = Order)) + 
  geom_point() +  
  stat_smooth(method = "lm") + 
  labs(x = "Air Temperature [K]", y = "Biomass") + 
  theme_bw() + scale_color_manual(values = cols)
