4 Species diversity

In this session we will learn how to estimate different indices to estimate species diversity. We will start by loading the data that we saved at the end of session 3 (remember that you have to set the working directory with the function set.wd() for R to be able to find the files!)

## total number of taxa

load("community.rda",verbose=T) # loading from previous 
## Loading objects:
##   vas.plants
##   tree.counts
##   forest.types
##   xy
##   soil.data
##   tables.join

4.1 Species richness

We are using the vas.plants table to find species richness: the number of taxa in a sample.

dim(vas.plants) # how many samples, how many taxa in the table
## [1] 30 79

rowSums(vas.plants) # total sum of rows (sites) colSums works for columns
##   X001Vapramae       X002Illi   X003Vitipalu    X004Konguta    X005Vehendi 
##           18.5           15.0           25.5           29.0           12.5 
##    X006.Erumae   X007Porgumae     X008Aardla   X009Kurepalu  X010Sarakuste 
##           14.0           17.0           15.5            5.0           27.5 
##      X011Kannu      X012Vonnu     X013Rookse    X014Kammeri     X015Kambja 
##           21.5           17.5           31.5           23.5           24.0 
##      X016Reola     X017Ignase    X018Unikula X019Haavametsa X020Kruusahaua 
##           21.0           19.5           34.5            7.0           29.5 
##      X021Aravu  X026Pedajamae       X027Miti    X028Puhaste  X031Pimmelaan 
##           10.5           26.5            9.0           11.0           38.0 
##     X032Logina  X033Voorepalu X036Taevaskoja    X044Savimae X073Pausakunnu 
##           11.0           18.0           10.0           12.0           17.0
rowSums(vas.plants>0) # total count of rows (sums logical table if non-zero values)
##   X001Vapramae       X002Illi   X003Vitipalu    X004Konguta    X005Vehendi 
##             14             11             18             23              5 
##    X006.Erumae   X007Porgumae     X008Aardla   X009Kurepalu  X010Sarakuste 
##              9             17             12              3             18 
##      X011Kannu      X012Vonnu     X013Rookse    X014Kammeri     X015Kambja 
##             18             12             19             15             14 
##      X016Reola     X017Ignase    X018Unikula X019Haavametsa X020Kruusahaua 
##             17             13             18              4             17 
##      X021Aravu  X026Pedajamae       X027Miti    X028Puhaste  X031Pimmelaan 
##              7             22              6              8             26 
##     X032Logina  X033Voorepalu X036Taevaskoja    X044Savimae X073Pausakunnu 
##              8             13              8             11             16

plot(sort(rowSums(vas.plants > 0), decreasing = T), ylab= "Species richness", xlab = "Rank") # making a graph where species richness is sorted from largest to smallest (x axis)

4.2 Species richness and dominance

If we have species abundance (number of individuals, biomass, etc.), we can find species dominance.

# Rank-abundance graphs
plot(NA, xlim=c(1, max(rowSums(vas.plants > 0))), ylim = c(0.1, 1), log = "y",
     ylab = "Relative abundance (log)", xlab = "Rank", main = "Rank-abundance graph") # empty graph
for (i in c(3, 5, 25, 28)) { # selecting 4  sites with id-s 3, 5, 25 and 28
  spInSite <- vas.plants[i, vas.plants[i, ] > 0]
  spInSiteRel <- as.numeric(spInSite / max(spInSite))
  points(sort(spInSiteRel, decreasing = T), lwd = 2, pch = 16, type = "o", col = i)
}

# NB! color is defined by id number (each number has a color, here it is important that they just differ).  We can identify to which site each line belongs adding a  legend:
legend("topright", legend = c(3, 5, 25, 28), pch  = 16, lwd = 2, col = c(3, 5, 25, 28))

In these graphs we can see four sites. Each have points as species in rank order (x axis), with the most abundant species in the left and the least abundant species in the right. The y-axis is relative abundance, compared to the most abundant taxon. Longer lines show more species rich sites. Sharp drops of the line show very high dominance. For example, site 5 is dominated by few species, whereas in site 3 there are mnany species with relatively high abundance; in this case, we say that the species evenness of site 3 is higher than that of site 5.

4.3 Diversity indices

Often diversity indices are used which combine both richness and abundance differences (evenness). Just some examples below:

Shannon diveristy: \(H = - \sum_{i=1}^{S}{p_i ln(p_i)}\)

Inverse Simpson diversity: \(D_2 = \frac{1}{\sum_{i=1}^{S}{p_i^2}}\)

where \(p_i\) is the proportion (relative abundance) of species \(i\), and \(S\) is the number of species.

We will use the package vegan for diversity calculations.


richness <- rowSums(vas.plants > 0)  # definign richness for comparison as an object

library(vegan)

diversity.shannon <- diversity(vas.plants, "shannon")
eff.richness <- exp(diversity.shannon)
diversity.simpson  <- diversity(vas.plants, "invsimpson")

pairs(data.frame(richness, eff.richness, diversity.shannon, diversity.simpson))


# Evenness -- a measure of abundance similarity, can be found by dividing Shannon diversity by ln(richness). Varies between 0 and 1.

evenness <- diversity.shannon / log(richness)

plot(richness, evenness)

4.4 Unequal sampling? Rarefaction and extrapolations

If sampling has been unequal between sites (e.g. different number of individuals have been considered), this sampling difference can be taken account by rarefaction and extrapolations. Rarefaction looks how many species we expect to find when sampling randomly n individuals

rarefy(tree.counts, 3)  ## NB! Works with counts!
##   001Vapramäe       002Illi   003Vitipalu    004Konguta    005Vehendi 
##      1.900000      2.000000      1.850000      1.533333      1.000000 
##     006Erumäe   007Põrgumäe     008Aardla   009Kurepalu  010Sarakuste 
##      1.533333      1.857143      2.441667      1.900000      1.904762 
##      011Kannu      012Võnnu     013Rookse    014Kammeri     015Kambja 
##      2.000000      2.047619      2.114286      2.000000      2.400000 
##      016Reola     017Ignase    018Uniküla 019Haavametsa 020Kruusahaua 
##      2.450000      2.114286      2.237762      1.333333      1.642857 
##      021Aravu  026Pedajamäe       027Miti    028Puhaste  031Pimmelaan 
##      2.314286      1.272727      1.803571      1.642857      1.500000 
##     032Logina  033Voorepalu 036Taevaskoja    044Savimäe    073Sulaoja 
##      1.857143      1.583333      2.095238      1.375000      1.833333 
## attr(,"Subsample")
## [1] 3

plot(rowSums(tree.counts > 0), rarefy(tree.counts, 3)) # richness of trees vs. rarefied richness estimate when randomly having 3 trees from each site.


## Sometimes it is nice to see species accumulation graphs from 2,3, ... n individuals
## First some max values
max.tree.count <- max(rowSums(tree.counts)) # max number of trees in a site

max.tree.rich <- max(rowSums(tree.counts > 0)) # max richness

# Now ploting

plot(NA, xlim = c(2, max.tree.count), ylim = c(0, max.tree.rich),
     xlab="Number of trees",ylab="Species richness")

# Including all 30 sites!
for (i in 1:nrow(tree.counts)) { 
  trees <- 2:rowSums(tree.counts[i, ])
  rar <- rarefy(tree.counts[i, ], trees)
  # Let's select colors randomly
  lines(x = trees, y = rar[1, ], pch = 16,
        col = rgb(red  = sample(100, 1),
                  green = sample(100, 1),
                  blue  = sample(100, 1),
                  maxColorValue = 100))
}
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 5
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2

## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 5

4.5 Total estimated richness

We might want to know what is the total expected richness (across all samples) when 1, 2, …, n samples are included. There is a function specaccum which finds the mean and also error bars.

plot(specaccum(tree.counts))  ## specaccum only considers presences/absences!!


## Let's evaluate total richness separately for sites with low and sites with high soil pH:
high.pH <- soil.data$pH.KCl > median(soil.data$pH.KCl)
plot(specaccum(tree.counts[high.pH, ]), lwd = 2, col = "blue")
plot(specaccum(tree.counts[!high.pH, ]), col = "red", add=T)
legend("bottomright", legend = c("High pH", "Low pH"), col = c("blue", "red"), lwd = 2)

4.6 Beta diversity

Beta diversity explores how much the different samples overlap in their species composition. Beta diversity can be defined through the difference between the mean richness across samples and the total richness over all samples in two ways: multiplicatively or additiviely.


# multiplicative beta diversity

ncol(vas.plants) / mean(rowSums(vas.plants > 0))
## [1] 5.895522

# additive beta diversity

ncol(vas.plants) - mean(rowSums(vas.plants > 0))
## [1] 65.6


# Comparing beta diversities between High vs. low pH soil subsamples

high.ph.gamma <- sum(colSums(vas.plants[high.pH,])>0)
low.ph.gamma <- sum(colSums(vas.plants[!high.pH,])>0)

high.ph.gamma
## [1] 74
low.ph.gamma
## [1] 47

high.ph.gamma / mean(rowSums(vas.plants[high.pH, ] > 0))
## [1] 4.302326
low.ph.gamma / mean(rowSums(vas.plants[!high.pH, ] > 0))
## [1] 4.895833


high.ph.gamma - mean(rowSums(vas.plants[high.pH, ] > 0))
## [1] 56.8
low.ph.gamma - mean(rowSums(vas.plants[!high.pH, ] > 0))
## [1] 37.4

4.7 Dark diversity

Dark diversity is the set of absent species which can theoretically occur in a study site (are present in the region and can tolerate local conditions).

A method based on species co-occurrences. Package ´DarkDiv`

Background in:

Lewis, R.J., Szava-Kovats, R. & Pärtel, M. 2016. Estimating dark diversity and species pools: An empirical assessment of two methods. Methods in Ecology and Evolution 7: 104-113.

CP Carmona, M Pärtel (2020) Estimating probabilistic site-specific species pools and dark diversity from co-occurrence data. Global Ecology and Biogeography (in press) https://doi.org/10.1111/geb.13203

library(DarkDiv)
# Using default method based on hypergeometric distribution

dark <- DarkDiv(vas.plants)

str(dark) # output is a list! 
## List of 4
##  $ indication: num [1:79, 1:79] 0 2.82 1.12 2.82 -0.48 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:79] "T.ACTAspic" "T.AEGOpoda" "T.ANEMnemo" "T.ASAReuro" ...
##   .. ..$ : chr [1:79] "I.ACTAspic" "I.AEGOpoda" "I.ANEMnemo" "I.ASAReuro" ...
##  $ AllProbs  : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
##   .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...
##  $ Pool      : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
##   .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...
##  $ Dark      : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
##   .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...

# There are several tables in the output, but we only need the table "Dark"

dark <- dark$Dark

dark[1:5,1:5] # We have NA values since observed species get value NA
##               ACTAspic  AEGOpoda   ANEMnemo  ASAReuro ATHYfil.fe
## X001Vapramae 0.6999908 0.5663234 0.80163864        NA  0.5758858
## X002Illi     0.3241750 0.2668499 0.19313708 0.2095176  0.2404332
## X003Vitipalu 0.6629486        NA         NA 0.6197400  0.5111869
## X004Konguta  0.4212651 0.3635984 0.59325066 0.4547887  0.5246721
## X005Vehendi  0.3084245 0.1446135 0.02658442 0.2058245  0.1505195

richness <- rowSums(vas.plants > 0)
dark.div <- rowSums(dark > 0.7, na.rm=T) 
# For simplicity we make binary dark diversity based on threshold of 0.7
# na.rm=T needed to skip NA values (otherwise summing will result NA as well)

species.pool <- richness + dark.div  # species pool is the sum of observed richenss and dark diversity

plot(species.pool, richness)
abline(0, 1)


plot(richness, dark.div)


completeness <- richness / species.pool * 100 # completeness reflects how large proportion of species pool has been realized locally

plot(species.pool,completeness)

4.8 Species diversity and environment

We will exploring the relationship between diversity and a soil parameter – soil pH.


cor(soil.data$pH.KCl, richness) # just a correlation coefficient
## [1] 0.7247559

# Pearson correlation test requires normality of both variables and linearity
hist(soil.data$pH.KCl)

hist(richness)


# Normality test: Shapiro-Wilks test
shapiro.test(soil.data$pH.KCl)
## 
##  Shapiro-Wilk normality test
## 
## data:  soil.data$pH.KCl
## W = 0.94732, p-value = 0.1432
shapiro.test(richness)
## 
##  Shapiro-Wilk normality test
## 
## data:  richness
## W = 0.97798, p-value = 0.7696

# Linear fit
plot(soil.data$pH.KCl,richness)


# OK. If not ok, then try to transform data (e.g. log). If this does not help, use rank correlation (e.g  Spearman)

cor.o <- cor.test(soil.data$pH.KCl, richness) # recording output from a test as an object
cor.o
## 
##  Pearson's product-moment correlation
## 
## data:  soil.data$pH.KCl and richness
## t = 5.5661, df = 28, p-value = 5.919e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4932877 0.8603750
## sample estimates:
##       cor 
## 0.7247559

str(cor.o)  ## data type for test output is a list as well (i.e. a mixture of different types of objects).
## List of 9
##  $ statistic  : Named num 5.57
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named int 28
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 5.92e-06
##  $ estimate   : Named num 0.725
##   ..- attr(*, "names")= chr "cor"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "correlation"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Pearson's product-moment correlation"
##  $ data.name  : chr "soil.data$pH.KCl and richness"
##  $ conf.int   : num [1:2] 0.493 0.86
##   ..- attr(*, "conf.level")= num 0.95
##  - attr(*, "class")= chr "htest"

cor.o$p.value # retrieving a component of the list (p-value)
## [1] 5.91914e-06

# Regression model -- we have hypothesis of dependent (richness) and independent (pH) variables
model <- lm(richness ~ soil.data$pH.KCl)

summary(model)
## 
## Call:
## lm(formula = richness ~ soil.data$pH.KCl)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8523 -2.8222 -0.5426  2.9875  7.4495 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -7.152      3.766  -1.899   0.0679 .  
## soil.data$pH.KCl    5.720      1.028   5.566 5.92e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.066 on 28 degrees of freedom
## Multiple R-squared:  0.5253, Adjusted R-squared:  0.5083 
## F-statistic: 30.98 on 1 and 28 DF,  p-value: 5.919e-06

abline(model, col="darkred", lty = 2, lwd = 2) # adding regression line to the graph.


# Regression expects that model residuals are normally distributed (i.e. testing if there is a deviation from normal distribution)
shapiro.test(resid(model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model)
## W = 0.96457, p-value = 0.403

# OK -- no deviation!