Basic Regression using MOSAIC

http://mosaicdatabase.web.ox.ac.uk

Updated 22 February 2022

Vignette #3 - Basic Regression Using MOSAIC & COMPADRE

MOSAIC was built for use in comparative biodemography: to strengthen our map of which functional traits predict vital rates and which do not. Traits in the MOSAIC database (categorical and numeric variables) are formatted so they can be accessed easily and used in regression-based and other analyses. MOSAIC was also designed for quick integration with the COMADRE, COMPADRE, and PADRINO structured population model databases, as shown in the worked example below.

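library(remotes) # used to install Rmosaic from GitHub
library(tidyverse) # data handling; also provides alpha() (via ggplot2) used in the plot
library(Rcompadre) # package to access matrix population models
library(Rage) # package to perform life-history calculations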

Accessing MOSAIC

Download MOSAIC from the MOSAIC portal using the Rmosaic package. For more information on the basics of downloading MOSAIC and navigating its data structure, see Vignette #1.

remotes::install_github("mosaicdatabase/Rmosaic") # install Rmosaic (only needed once)
library(Rmosaic)
mosaic <- mos_fetch("v1.0.0") # fetch MOSAIC release v1.0.0

Regressing Generation Time on Biomass

In this exercise, the bivariate relationship between generation time and biomass is explored through simple linear regression. Bear in mind that the steps below are building blocks for more complex analyses, such as multiple regression, geospatial interpolation, and ordination-based approaches (see Vignette #4 for a brief example of PVA using MOSAIC).

#1: Extract biomass data for mammals

Extract the biomass values for mammals in the MOSAIC dataset.

# Extracting all biomass data

mosaic_dataframe <- data.frame(mosaic@taxaMetadat,
                               Biomass = as.numeric(mosaic@biomass@value),
                               MOSAIC_index = seq_along(mosaic@index))

# Keep mammal records with non-missing biomass

mosaic_mammal <- subset(mosaic_dataframe,
                        Class == "Mammalia" & !is.na(Biomass))

# Extract the COMADRE matrix IDs linked to each mammal record (a record can have several)

matrix_ids_full <- mosaic@index[mosaic_mammal$MOSAIC_index]

# Extract just the first matrix id using sapply

mammal_matrix_ids <- sapply(matrix_ids_full, `[[`, 1)

# Add matrix id to the mammal data

mosaic_mammal$MatrixID <- mammal_matrix_ids

# Add full species names

mosaic_mammal$Binomial <- paste0(mosaic_mammal$Genus, " ", mosaic_mammal$Species)
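The `sapply(..., `[[`, 1)` step works because each element of `mosaic@index` holds one or more matrix IDs, and `[[` is itself a function that `sapply()` can call on every element. The sketch below uses a hypothetical `toy_index` list with made-up IDs in place of `mosaic@index`. Note that only the first matrix of each record is kept, and that an empty element would make `[[` fail with a "subscript out of bounds" error.

```r
# Hypothetical stand-in for mosaic@index: one element per MOSAIC record,
# each holding one or more matrix IDs (these IDs are made up)
toy_index <- list(c(101L, 102L), 205L, c(317L, 318L, 319L))

# Hypothetical positions of the records that survived filtering
keep <- c(1, 3)

# sapply() calls `[[`(x, 1) on each element, returning the first ID of each record
first_ids <- sapply(toy_index[keep], `[[`, 1)
first_ids
```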

#2: Download COMADRE matrix population models

The cdb_fetch() function from the Rcompadre package downloads the COMADRE database (www.compadre-db.org), a copy of an SQL-based relational database, into a CompadreDB object: an S4 object that can be manipulated locally in R.

We first use cdb_fetch() to download COMADRE, then use the matrix IDs associated with each matrix population model to subset it to the mammal matrices.

# Download the most recent database

comadre <- cdb_fetch("comadre") 

# cdb_flag() adds logical columns flagging potential problems, such as missing values (NAs) and non-ergodicity

comadre_flag <- cdb_flag(comadre)

# Remove matrices containing NAs and non-ergodic matrices (these cause errors when calculating generation time)

comadre_correct <- subset(comadre_flag,
                          check_NA_A == FALSE &
                          check_ergodic == TRUE)
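To see what the `check_ergodic` flag screens for, the sketch below applies a common working criterion: a projection matrix is treated as ergodic when the left eigenvector of its dominant eigenvalue (the reproductive-value vector) is strictly positive, so that long-run dynamics do not depend on the starting stage structure. We believe Rcompadre's flag relies on a test of this kind, but `is_ergodic_sketch()` is a hypothetical, simplified helper written for illustration, not Rcompadre's code, and the two example matrices are made up.

```r
# Hypothetical helper (simplified; not Rcompadre's implementation):
# treat A as ergodic when the left eigenvector of its dominant eigenvalue
# is strictly positive
is_ergodic_sketch <- function(matA, digits = 12) {
  if (anyNA(matA)) return(NA)               # eigen() cannot handle missing values
  left <- eigen(t(matA))                    # eigenvectors of t(A) are left eigenvectors of A
  dom <- which.max(Re(left$values))         # index of the dominant eigenvalue
  v <- round(abs(Re(left$vectors[, dom])), digits)
  all(v > 0)
}

# Two connected stages: every stage contributes to future growth
connected <- matrix(c(0.2, 1.5,
                      0.5, 0.8), nrow = 2, byrow = TRUE)

# Two isolated stages with different growth rates: long-run dynamics
# depend on which stage the population starts in
isolated <- matrix(c(1.2, 0,
                     0,   0.8), nrow = 2, byrow = TRUE)

is_ergodic_sketch(connected)
is_ergodic_sketch(isolated)
```

The first call returns `TRUE` and the second `FALSE`, because the isolated second stage has zero reproductive value with respect to the dominant growth rate.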

The MOSAIC mammal records and the cleaned COMADRE data can then be matched by matrix ID in a single line of code.

# Keep only the COMADRE matrices linked to MOSAIC mammal records

comadre_biomass <- subset(comadre_correct,
                          MatrixID %in% mammal_matrix_ids)

#3: Calculate generation time

Rcompadre can split each matrix population model (the A matrix) into constituent matrices representing the survival and fecundity/reproductive components (the U and F matrices, respectively). Functions in the Rage package often require both the U and the F matrices rather than the combined A matrix. Rage contains a suite of common demographic calculations, including generation time, which we use to simplify this exercise. Other common demographic quantities, including transient indices, can be calculated with the same procedure.

# Calculate generation time for each model from its U and F matrices

comadre_biomass$Generation_Time <- mapply(Rage::gen_time, 
                                          matU(comadre_biomass), 
                                          matF(comadre_biomass))
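Generation time has several definitions. We believe the default method of Rage::gen_time() is T = log(R0) / log(λ), where R0 is the net reproductive rate (the dominant eigenvalue of F N, with fundamental matrix N = (I - U)^-1) and λ is the dominant eigenvalue of A = U + F; check the Rage documentation before relying on this. The base-R sketch below implements that definition for a made-up two-stage model; `gen_time_R0_sketch()`, `U_toy`, and `F_toy` are hypothetical names, and the helper ignores any clonal (C) matrix.

```r
# Hypothetical helper (not Rage's code): generation time as log(R0) / log(lambda)
gen_time_R0_sketch <- function(matU, matF) {
  matA <- matU + matF                                    # projection matrix A = U + F
  N <- solve(diag(nrow(matU)) - matU)                    # fundamental matrix N = (I - U)^-1
  R0 <- max(Re(eigen(matF %*% N, only.values = TRUE)$values))  # net reproductive rate
  lambda <- max(Re(eigen(matA, only.values = TRUE)$values))    # asymptotic growth rate
  log(R0) / log(lambda)
}

# Illustrative juvenile/adult model; values are made up
U_toy <- matrix(c(0,   0,
                  0.5, 0.8), nrow = 2, byrow = TRUE)   # juvenile -> adult, adult stasis
F_toy <- matrix(c(0, 1.2,
                  0, 0),   nrow = 2, byrow = TRUE)     # adult fecundity

gen_time_R0_sketch(U_toy, F_toy)   # about 4.57 (R0 = 3)

# The same mapply() pattern used above, over lists of U and F matrices
mapply(gen_time_R0_sketch, list(U_toy, U_toy), list(F_toy, 2 * F_toy))
```

When λ is exactly 1, R0 is also 1 and the ratio is undefined, which is one situation in which non-finite generation times can arise.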

#4: Explore the relationship between generation time and biomass

Matrix IDs link the generation times calculated from COMADRE back to the MOSAIC records.

# Extract matrix ID and generation time columns (excluding other matrix metadata)

comadre_biomass <- as.data.frame(comadre_biomass)

comadre_biomass <- comadre_biomass[,c("MatrixID", "Generation_Time")]

# Join the datasets, using Matrix IDs to crosswalk the data

mosaic_biomass <- merge(mosaic_mammal, comadre_biomass,
                        by = "MatrixID", all.x = TRUE)

# Remove infinite or missing generation times (is.finite() is FALSE for NA, NaN and Inf)

mosaic_biomass <- mosaic_biomass[is.finite(mosaic_biomass$Generation_Time), ]
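Because `merge()` is called with `all.x = TRUE`, every MOSAIC mammal record is kept, and records without a matching COMADRE matrix receive `NA` as their generation time; the filtering step then removes them together with any infinite values. The toy tables below (`mosaic_toy` and `comadre_toy`, with made-up values) are hypothetical stand-ins for `mosaic_mammal` and `comadre_biomass`.

```r
# Hypothetical toy tables standing in for mosaic_mammal and comadre_biomass
mosaic_toy <- data.frame(MatrixID = c(1L, 2L, 3L),
                         Biomass  = c(10, 250, 4000))
comadre_toy <- data.frame(MatrixID = c(1L, 3L),
                          Generation_Time = c(2.5, Inf))

# all.x = TRUE keeps every row of mosaic_toy; MatrixID 2 has no match, so it gets NA.
# merge() sorts the result by the key column by default.
joined <- merge(mosaic_toy, comadre_toy, by = "MatrixID", all.x = TRUE)
joined$Generation_Time

# Keep only rows with a finite generation time (here, MatrixID 1)
kept <- joined[is.finite(joined$Generation_Time), ]
kept
```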

# Log biomass and generation time

mosaic_biomass$log10_biomass <- log10(mosaic_biomass$Biomass)
mosaic_biomass$log10_gent <- log10(mosaic_biomass$Generation_Time)



# Plot log10 generation time against log10 biomass

plot(mosaic_biomass$log10_biomass, mosaic_biomass$log10_gent,
     pch = 16, cex = 1.5, col = alpha("black", 0.5),  # semi-transparent points
     xlab = expression(log[10] ~ "Biomass"),
     ylab = expression(log[10] ~ "Generation time")
)


# Simple linear regression (model diagnostics and model selection are skipped in this brief vignette)

lmBioGenT <- lm(log10_gent ~ log10_biomass, data = mosaic_biomass)

summary(lmBioGenT)
## 
## Call:
## lm(formula = log10_gent ~ log10_biomass, data = mosaic_biomass)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53325 -0.16107 -0.01597  0.19936  0.38382 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    0.85966    0.25372   3.388  0.00253 **
## log10_biomass  0.02762    0.05728   0.482  0.63418   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2623 on 23 degrees of freedom
## Multiple R-squared:  0.01001,    Adjusted R-squared:  -0.03303 
## F-statistic: 0.2326 on 1 and 23 DF,  p-value: 0.6342
# Add the fitted regression line to the plot
abline(lmBioGenT, col = "red", lty = 2, lwd = 2)
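Because both variables are log10-transformed, the fitted line corresponds to a power law between generation time T and biomass M. Plugging in the intercept (a = 0.85966) and slope (b = 0.02762) from the summary above:

```latex
\log_{10} T = a + b \log_{10} M
\quad\Longleftrightarrow\quad
T = 10^{a} M^{b} \approx 10^{0.85966}\, M^{0.02762} \approx 7.24\, M^{0.028}
```

With a slope p-value of 0.634 and a negative adjusted R-squared on 25 records, this fit gives no evidence of a relationship in this small sample; the back-transformation is shown only to illustrate how log-log coefficients map onto the original scale.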