Basic Regression using MOSAIC
http://mosaicdatabase.web.ox.ac.uk
Updated 22 February 2022
Vignette #3 - Basic Regression Using MOSAIC & COMPADRE
MOSAIC was built for use in comparative biodemography: to clarify which functional traits predict vital rates and which do not. Traits in the MOSAIC database (factor and numeric variables) are formatted for easy access and use in regression-based analyses, among other types of analysis. MOSAIC was also designed for quick integration with the COMADRE, COMPADRE, and PADRINO structured population model databases, as outlined in the exercise below.
library(devtools)
library(tidyverse)
library(Rcompadre) # package to access matrix population models
library(Rage) # package to perform life-history calculations
Accessing MOSAIC
Download MOSAIC from the MOSAIC portal. For more information on the basics of downloading MOSAIC and navigating the data structure, see Vignette #1.
remotes::install_github("mosaicdatabase/Rmosaic")
library(Rmosaic)
mosaic <- mos_fetch("v1.0.0")
Regressing Generation Time on Biomass
In this exercise, the simple bivariate relationship between biomass and generation time is explored through basic regression, with generation time as the response. Bear in mind that the steps below offer the building blocks for more complex multiple regression exercises and other forms of analysis, such as geospatial interpolation and ordination-based methods (see Vignette #4 for a brief example of PVA using MOSAIC).
#1: Extract biomass data for mammals
Extract the biomass values for mammals in the MOSAIC dataset.
# Extracting all biomass data
mosaic_dataframe <- data.frame(mosaic@taxaMetadat,
                               Biomass = as.numeric(mosaic@biomass@value),
                               MOSAIC_index = seq_along(mosaic@index))
# Restrict to mammals with non-NA biomass values
mosaic_mammal <- subset(mosaic_dataframe,
                        Class == "Mammalia" & !is.na(Biomass))
# Extracting the first COMPADRE matrix id for each entry
matrix_ids_full <- mosaic@index[mosaic_mammal$MOSAIC_index]
# Extract just the first matrix id using sapply
mammal_matrix_ids <- sapply(matrix_ids_full, `[[`, 1)
# Add matrix id to the mammal data
mosaic_mammal$MatrixID <- mammal_matrix_ids
# Add full species names
mosaic_mammal$Binomial <- paste0(mosaic_mammal$Genus, " ", mosaic_mammal$Species)
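The sapply() call with `[[` used above can be checked on toy data. The sketch below assumes, as the code above implies, that each index entry holds a vector of matrix IDs; the list toy_index and its IDs are made up for illustration and are not MOSAIC data.

```r
# Hypothetical stand-in for mosaic@index: one vector of matrix IDs per record
toy_index <- list(c(101L, 102L, 103L), 204L, c(305L, 306L))

# `[[` with the extra argument 1 pulls the first element of each vector
first_ids <- sapply(toy_index, `[[`, 1)
print(first_ids)

# vapply() is a stricter alternative: it errors if an entry is not
# a single integer, instead of silently returning a list
first_ids_strict <- vapply(toy_index, function(ids) ids[[1]], integer(1))
print(first_ids_strict)
```

Keeping only the first ID means each record contributes a single matrix to the analysis; records with several matrices could instead be summarised across all of them, which this vignette does not attempt.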
#2: Download COMADRE matrix population models
The cdb_fetch() function clones the COMADRE database <www.compadre-db.org>, which is an s3 copy of a SQL-based relational database, into an S4 data object that can be manipulated locally in R.
Using the IDs associated with each matrix population model, we will subset COMADRE, the database of matrix population models for animals. First, we use the Rcompadre package to ‘fetch’ the COMADRE database.
# Download the most recent database
comadre <- cdb_fetch("comadre")
# Rcompadre function flags potential issues with matrices, including NAs and ergodicity
comadre_flag <- cdb_flag(comadre)
# Remove matrices with NAs and those that are non-ergodic (which throw errors in generation time)
comadre_correct <- subset(comadre_flag,
                          check_NA_A == FALSE &
                            check_ergodic == TRUE)
The MOSAIC mammal subset and the COMADRE subset can be intersected in one line of code.
# Subset to mammal matrices
comadre_biomass <- subset(comadre_correct,
                          MatrixID %in% mammal_matrix_ids)
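The %in% matching used in this subset can be illustrated with plain data frames. This is a minimal sketch on made-up values: toy_matrices and toy_ids are hypothetical stand-ins for the COMADRE metadata and for mammal_matrix_ids.

```r
# Hypothetical matrix metadata standing in for the COMADRE subset
toy_matrices <- data.frame(
  MatrixID = c(11, 12, 13, 14, 15),
  Class    = c("Mammalia", "Aves", "Mammalia", "Reptilia", "Mammalia")
)
# Hypothetical IDs standing in for mammal_matrix_ids
toy_ids <- c(11, 15, 99)

# %in% is TRUE for each row whose MatrixID appears in toy_ids
toy_subset <- subset(toy_matrices, MatrixID %in% toy_ids)
print(toy_subset)

# IDs without a matching matrix are dropped silently; setdiff() lists them
unmatched <- setdiff(toy_ids, toy_matrices$MatrixID)
print(unmatched)
```

In the real workflow, a MOSAIC matrix ID can go unmatched if its matrix was removed by the NA or ergodicity filters, so a check like the setdiff() call is a cheap way to see how many records are lost at this step.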
#3: Calculate generation time
Rcompadre enables sub-setting of a matrix population model (life table) into constituent matrices representing the survival and fecundity/reproductive components (U and F matrices, respectively). Functions in the Rage package often require both the U and the F matrix rather than the combined A matrix. The Rage package contains a suite of common demographic calculations, including generation time, which simplify this exercise. Note that other common demographic quantities, including transient indices, can be extracted using the same procedure.
# Calculate Generation Time
comadre_biomass$Generation_Time <- mapply(Rage::gen_time,
                                          matU(comadre_biomass),
                                          matF(comadre_biomass))
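To show why both the U and F matrices are needed, the sketch below computes generation time by hand in base R using one common definition, T = log(R0) / log(lambda), where lambda is the dominant eigenvalue of A = U + F and R0 is the dominant eigenvalue of F (I - U)^-1. This is an illustration under that assumed definition, not a reproduction of the Rage source; consult the Rage documentation for the methods gen_time() actually offers. The two-stage matrices are made up.

```r
# Toy two-stage model (juvenile, adult); values are made up for illustration
matU <- matrix(c(0.0, 0.0,
                 0.5, 0.8), nrow = 2, byrow = TRUE)  # survival and growth
matF <- matrix(c(0.0, 2.0,
                 0.0, 0.0), nrow = 2, byrow = TRUE)  # fecundity
matA <- matU + matF                                  # full projection matrix

# Dominant eigenvalue: the eigenvalue with the largest modulus
dominant_eigenvalue <- function(m) {
  max(Mod(eigen(m, only.values = TRUE)$values))
}

# Asymptotic population growth rate
lambda <- dominant_eigenvalue(matA)

# Fundamental matrix N = (I - U)^-1: expected time spent in each stage
matN <- solve(diag(nrow(matU)) - matU)

# Net reproductive rate: dominant eigenvalue of F %*% N
R0 <- dominant_eigenvalue(matF %*% matN)

# Time needed for the population to grow by a factor of R0
gen_time_R0 <- log(R0) / log(lambda)
print(c(lambda = lambda, R0 = R0, gen_time = gen_time_R0))
```

Under this definition the result is undefined or infinite when lambda equals 1, because log(lambda) is then zero; this is one way non-finite generation times can arise.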
#4: Explore the relationship between generation time and biomass
Matrix IDs can be used to link generation time from COMADRE with MOSAIC records.
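Before running the join on the real data, the crosswalk can be sketched on toy data. The data frames toy_traits and toy_demog are hypothetical stand-ins for the MOSAIC mammal records and the COMADRE results; all values are made up.

```r
# Hypothetical trait records (stand-in for the MOSAIC mammal data)
toy_traits <- data.frame(
  MatrixID = c(1, 2, 3, 4),
  Biomass  = c(20, 350, 4200, 65000)
)
# Hypothetical demographic results (stand-in for the COMADRE subset)
toy_demog <- data.frame(
  MatrixID        = c(1, 2, 3),
  Generation_Time = c(1.8, Inf, 9.5)
)

# all.x = TRUE keeps every trait record; rows without a match get NA
toy_joined <- merge(toy_traits, toy_demog, by = "MatrixID", all.x = TRUE)
print(toy_joined)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so one test drops them all
toy_clean <- toy_joined[is.finite(toy_joined$Generation_Time), ]
print(toy_clean)
```

The is.finite() filter is a compact alternative to combining is.infinite() and is.na(); both approaches remove the same rows.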
# Extract matrix ID and generation time columns (excluding other matrix metadata)
comadre_biomass <- as.data.frame(comadre_biomass)
comadre_biomass <- comadre_biomass[,c("MatrixID", "Generation_Time")]
# Join the datasets, using Matrix IDs to crosswalk the data
mosaic_biomass <- merge(mosaic_mammal, comadre_biomass,
                        by = "MatrixID", all.x = TRUE)
# Remove infinity values or NA
mosaic_biomass <- mosaic_biomass[!(is.infinite(mosaic_biomass$Generation_Time) |
is.na(mosaic_biomass$Generation_Time)),]
# Log biomass and generation time
mosaic_biomass$log10_biomass <- log10(mosaic_biomass$Biomass)
mosaic_biomass$log10_gent <- log10(mosaic_biomass$Generation_Time)
# The plot
plot(mosaic_biomass$log10_biomass, mosaic_biomass$log10_gent,
     pch = 16, cex = 1.5, col = alpha("black", 0.5),
     xlab = expression(paste(log[10], " Biomass")),
     ylab = expression(paste(log[10], " Generation time")))
# Simple linear regression (note that we are skipping model checking in this brief vignette)
lmBioGenT <- lm(log10_gent ~ log10_biomass, data = mosaic_biomass)
summary(lmBioGenT)
##
## Call:
## lm(formula = log10_gent ~ log10_biomass, data = mosaic_biomass)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.53325 -0.16107 -0.01597  0.19936  0.38382
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.85966    0.25372   3.388  0.00253 **
## log10_biomass  0.02762    0.05728   0.482  0.63418
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2623 on 23 degrees of freedom
## Multiple R-squared: 0.01001, Adjusted R-squared: -0.03303
## F-statistic: 0.2326 on 1 and 23 DF, p-value: 0.6342
abline(lmBioGenT, col="red", lty=2, lwd=2)
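On log10-log10 axes, the slope of the fitted line estimates the exponent b in a power law of the form generation time = a * biomass^b, and the intercept estimates log10(a). The sketch below checks this on synthetic, noise-free data; the data frame toy and its values are made up and are unrelated to the MOSAIC records.

```r
# Synthetic, noise-free power law: gen_time = 2 * biomass^0.25 (made-up values)
toy <- data.frame(biomass = 10^(1:6))
toy$gen_time <- 2 * toy$biomass^0.25

# Same model form as the vignette: log10 response on log10 predictor
toy_fit <- lm(log10(gen_time) ~ log10(biomass), data = toy)

# The slope recovers the exponent; the intercept recovers log10 of the constant
toy_coefs <- unname(coef(toy_fit))
print(round(toy_coefs, 4))
```

Read this way, the fit reported above (slope 0.028, p = 0.63, 23 residual degrees of freedom) shows no detectable scaling of generation time with biomass in these 25 records. Calling confint(lmBioGenT) would give a confidence interval for the slope.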