In this session we will learn how to estimate different indices to
estimate species diversity. We will start by loading the data that we
saved at the end of session 3 (remember that you have to set the working
directory with the function set.wd() for R to be able to
find the files!)
## total number of taxa
load("community.rda",verbose=T) # loading from previous
## Loading objects:
## vas.plants
## tree.counts
## forest.types
## xy
## soil.data
## tables.join
We are using the vas.plants table to find species
richness: the number of taxa in a sample.
dim(vas.plants) # how many samples, how many taxa in the table
## [1] 30 79
rowSums(vas.plants) # total sum of rows (sites) colSums works for columns
## X001Vapramae X002Illi X003Vitipalu X004Konguta X005Vehendi
## 18.5 15.0 25.5 29.0 12.5
## X006.Erumae X007Porgumae X008Aardla X009Kurepalu X010Sarakuste
## 14.0 17.0 15.5 5.0 27.5
## X011Kannu X012Vonnu X013Rookse X014Kammeri X015Kambja
## 21.5 17.5 31.5 23.5 24.0
## X016Reola X017Ignase X018Unikula X019Haavametsa X020Kruusahaua
## 21.0 19.5 34.5 7.0 29.5
## X021Aravu X026Pedajamae X027Miti X028Puhaste X031Pimmelaan
## 10.5 26.5 9.0 11.0 38.0
## X032Logina X033Voorepalu X036Taevaskoja X044Savimae X073Pausakunnu
## 11.0 18.0 10.0 12.0 17.0
rowSums(vas.plants>0) # total count of rows (sums logical table if non-zero values)
## X001Vapramae X002Illi X003Vitipalu X004Konguta X005Vehendi
## 14 11 18 23 5
## X006.Erumae X007Porgumae X008Aardla X009Kurepalu X010Sarakuste
## 9 17 12 3 18
## X011Kannu X012Vonnu X013Rookse X014Kammeri X015Kambja
## 18 12 19 15 14
## X016Reola X017Ignase X018Unikula X019Haavametsa X020Kruusahaua
## 17 13 18 4 17
## X021Aravu X026Pedajamae X027Miti X028Puhaste X031Pimmelaan
## 7 22 6 8 26
## X032Logina X033Voorepalu X036Taevaskoja X044Savimae X073Pausakunnu
## 8 13 8 11 16
plot(sort(rowSums(vas.plants > 0), decreasing = T), ylab= "Species richness", xlab = "Rank") # making a graph where species richness is sorted from largest to smallest (x axis)
If we have species abundance (number of individuals, biomass, etc.), we can find species dominance.
# Rank-abundance graphs
plot(NA, xlim=c(1, max(rowSums(vas.plants > 0))), ylim = c(0.1, 1), log = "y",
ylab = "Relative abundance (log)", xlab = "Rank", main = "Rank-abundance graph") # empty graph
for (i in c(3, 5, 25, 28)) { # selecting 4 sites with id-s 3, 5, 25 and 28
spInSite <- vas.plants[i, vas.plants[i, ] > 0]
spInSiteRel <- as.numeric(spInSite / max(spInSite))
points(sort(spInSiteRel, decreasing = T), lwd = 2, pch = 16, type = "o", col = i)
}
# NB! color is defined by id number (each number has a color, here it is important that they just differ). We can identify to which site each line belongs adding a legend:
legend("topright", legend = c(3, 5, 25, 28), pch = 16, lwd = 2, col = c(3, 5, 25, 28))
In these graphs we can see four sites. Each have points as species in rank order (x axis), with the most abundant species in the left and the least abundant species in the right. The y-axis is relative abundance, compared to the most abundant taxon. Longer lines show more species rich sites. Sharp drops of the line show very high dominance. For example, site 5 is dominated by few species, whereas in site 3 there are mnany species with relatively high abundance; in this case, we say that the species evenness of site 3 is higher than that of site 5.
Often diversity indices are used which combine both richness and abundance differences (evenness). Just some examples below:
Shannon diveristy: \(H = - \sum_{i=1}^{S}{p_i ln(p_i)}\)
Inverse Simpson diversity: \(D_2 = \frac{1}{\sum_{i=1}^{S}{p_i^2}}\)
where \(p_i\) is the proportion (relative abundance) of species \(i\), and \(S\) is the number of species.
We will use the package vegan for diversity
calculations.
richness <- rowSums(vas.plants > 0) # definign richness for comparison as an object
library(vegan)
diversity.shannon <- diversity(vas.plants, "shannon")
eff.richness <- exp(diversity.shannon)
diversity.simpson <- diversity(vas.plants, "invsimpson")
pairs(data.frame(richness, eff.richness, diversity.shannon, diversity.simpson))
# Evenness -- a measure of abundance similarity, can be found by dividing Shannon diversity by ln(richness). Varies between 0 and 1.
evenness <- diversity.shannon / log(richness)
plot(richness, evenness)
If sampling has been unequal between sites (e.g. different number of individuals have been considered), this sampling difference can be taken account by rarefaction and extrapolations. Rarefaction looks how many species we expect to find when sampling randomly n individuals
rarefy(tree.counts, 3) ## NB! Works with counts!
## 001Vapramäe 002Illi 003Vitipalu 004Konguta 005Vehendi
## 1.900000 2.000000 1.850000 1.533333 1.000000
## 006Erumäe 007Põrgumäe 008Aardla 009Kurepalu 010Sarakuste
## 1.533333 1.857143 2.441667 1.900000 1.904762
## 011Kannu 012Võnnu 013Rookse 014Kammeri 015Kambja
## 2.000000 2.047619 2.114286 2.000000 2.400000
## 016Reola 017Ignase 018Uniküla 019Haavametsa 020Kruusahaua
## 2.450000 2.114286 2.237762 1.333333 1.642857
## 021Aravu 026Pedajamäe 027Miti 028Puhaste 031Pimmelaan
## 2.314286 1.272727 1.803571 1.642857 1.500000
## 032Logina 033Voorepalu 036Taevaskoja 044Savimäe 073Sulaoja
## 1.857143 1.583333 2.095238 1.375000 1.833333
## attr(,"Subsample")
## [1] 3
plot(rowSums(tree.counts > 0), rarefy(tree.counts, 3)) # richness of trees vs. rarefied richness estimate when randomly having 3 trees from each site.
## Sometimes it is nice to see species accumulation graphs from 2,3, ... n individuals
## First some max values
max.tree.count <- max(rowSums(tree.counts)) # max number of trees in a site
max.tree.rich <- max(rowSums(tree.counts > 0)) # max richness
# Now ploting
plot(NA, xlim = c(2, max.tree.count), ylim = c(0, max.tree.rich),
xlab="Number of trees",ylab="Species richness")
# Including all 30 sites!
for (i in 1:nrow(tree.counts)) {
trees <- 2:rowSums(tree.counts[i, ])
rar <- rarefy(tree.counts[i, ], trees)
# Let's select colors randomly
lines(x = trees, y = rar[1, ], pch = 16,
col = rgb(red = sample(100, 1),
green = sample(100, 1),
blue = sample(100, 1),
maxColorValue = 100))
}
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 5
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 3
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 2
## Warning in rarefy(tree.counts[i, ], trees): most observed count data have
## counts 1, but smallest count is 5
We might want to know what is the total expected richness (across all
samples) when 1, 2, …, n samples are included. There is a function
specaccum which finds the mean and also error bars.
plot(specaccum(tree.counts)) ## specaccum only considers presences/absences!!
## Let's evaluate total richness separately for sites with low and sites with high soil pH:
high.pH <- soil.data$pH.KCl > median(soil.data$pH.KCl)
plot(specaccum(tree.counts[high.pH, ]), lwd = 2, col = "blue")
plot(specaccum(tree.counts[!high.pH, ]), col = "red", add=T)
legend("bottomright", legend = c("High pH", "Low pH"), col = c("blue", "red"), lwd = 2)
Beta diversity explores how much the different samples overlap in their species composition. Beta diversity can be defined through the difference between the mean richness across samples and the total richness over all samples in two ways: multiplicatively or additiviely.
# multiplicative beta diversity
ncol(vas.plants) / mean(rowSums(vas.plants > 0))
## [1] 5.895522
# additive beta diversity
ncol(vas.plants) - mean(rowSums(vas.plants > 0))
## [1] 65.6
# Comparing beta diversities between High vs. low pH soil subsamples
high.ph.gamma <- sum(colSums(vas.plants[high.pH,])>0)
low.ph.gamma <- sum(colSums(vas.plants[!high.pH,])>0)
high.ph.gamma
## [1] 74
low.ph.gamma
## [1] 47
high.ph.gamma / mean(rowSums(vas.plants[high.pH, ] > 0))
## [1] 4.302326
low.ph.gamma / mean(rowSums(vas.plants[!high.pH, ] > 0))
## [1] 4.895833
high.ph.gamma - mean(rowSums(vas.plants[high.pH, ] > 0))
## [1] 56.8
low.ph.gamma - mean(rowSums(vas.plants[!high.pH, ] > 0))
## [1] 37.4
Dark diversity is the set of absent species which can theoretically occur in a study site (are present in the region and can tolerate local conditions).
A method based on species co-occurrences. Package ´DarkDiv`
Background in:
Lewis, R.J., Szava-Kovats, R. & Pärtel, M. 2016. Estimating dark diversity and species pools: An empirical assessment of two methods. Methods in Ecology and Evolution 7: 104-113.
CP Carmona, M Pärtel (2020) Estimating probabilistic site-specific species pools and dark diversity from co-occurrence data. Global Ecology and Biogeography (in press) https://doi.org/10.1111/geb.13203
library(DarkDiv)
# Using default method based on hypergeometric distribution
dark <- DarkDiv(vas.plants)
str(dark) # output is a list!
## List of 4
## $ indication: num [1:79, 1:79] 0 2.82 1.12 2.82 -0.48 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:79] "T.ACTAspic" "T.AEGOpoda" "T.ANEMnemo" "T.ASAReuro" ...
## .. ..$ : chr [1:79] "I.ACTAspic" "I.AEGOpoda" "I.ANEMnemo" "I.ASAReuro" ...
## $ AllProbs : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
## .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...
## $ Pool : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
## .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...
## $ Dark : num [1:30, 1:79] 0.7 0.324 0.663 0.421 0.308 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "X001Vapramae" "X002Illi" "X003Vitipalu" "X004Konguta" ...
## .. ..$ : chr [1:79] "ACTAspic" "AEGOpoda" "ANEMnemo" "ASAReuro" ...
# There are several tables in the output, but we only need the table "Dark"
dark <- dark$Dark
dark[1:5,1:5] # We have NA values since observed species get value NA
## ACTAspic AEGOpoda ANEMnemo ASAReuro ATHYfil.fe
## X001Vapramae 0.6999908 0.5663234 0.80163864 NA 0.5758858
## X002Illi 0.3241750 0.2668499 0.19313708 0.2095176 0.2404332
## X003Vitipalu 0.6629486 NA NA 0.6197400 0.5111869
## X004Konguta 0.4212651 0.3635984 0.59325066 0.4547887 0.5246721
## X005Vehendi 0.3084245 0.1446135 0.02658442 0.2058245 0.1505195
richness <- rowSums(vas.plants > 0)
dark.div <- rowSums(dark > 0.7, na.rm=T)
# For simplicity we make binary dark diversity based on threshold of 0.7
# na.rm=T needed to skip NA values (otherwise summing will result NA as well)
species.pool <- richness + dark.div # species pool is the sum of observed richenss and dark diversity
plot(species.pool, richness)
abline(0, 1)
plot(richness, dark.div)
completeness <- richness / species.pool * 100 # completeness reflects how large proportion of species pool has been realized locally
plot(species.pool,completeness)
We will exploring the relationship between diversity and a soil parameter – soil pH.
cor(soil.data$pH.KCl, richness) # just a correlation coefficient
## [1] 0.7247559
# Pearson correlation test requires normality of both variables and linearity
hist(soil.data$pH.KCl)
hist(richness)
# Normality test: Shapiro-Wilks test
shapiro.test(soil.data$pH.KCl)
##
## Shapiro-Wilk normality test
##
## data: soil.data$pH.KCl
## W = 0.94732, p-value = 0.1432
shapiro.test(richness)
##
## Shapiro-Wilk normality test
##
## data: richness
## W = 0.97798, p-value = 0.7696
# Linear fit
plot(soil.data$pH.KCl,richness)
# OK. If not ok, then try to transform data (e.g. log). If this does not help, use rank correlation (e.g Spearman)
cor.o <- cor.test(soil.data$pH.KCl, richness) # recording output from a test as an object
cor.o
##
## Pearson's product-moment correlation
##
## data: soil.data$pH.KCl and richness
## t = 5.5661, df = 28, p-value = 5.919e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4932877 0.8603750
## sample estimates:
## cor
## 0.7247559
str(cor.o) ## data type for test output is a list as well (i.e. a mixture of different types of objects).
## List of 9
## $ statistic : Named num 5.57
## ..- attr(*, "names")= chr "t"
## $ parameter : Named int 28
## ..- attr(*, "names")= chr "df"
## $ p.value : num 5.92e-06
## $ estimate : Named num 0.725
## ..- attr(*, "names")= chr "cor"
## $ null.value : Named num 0
## ..- attr(*, "names")= chr "correlation"
## $ alternative: chr "two.sided"
## $ method : chr "Pearson's product-moment correlation"
## $ data.name : chr "soil.data$pH.KCl and richness"
## $ conf.int : num [1:2] 0.493 0.86
## ..- attr(*, "conf.level")= num 0.95
## - attr(*, "class")= chr "htest"
cor.o$p.value # retrieving a component of the list (p-value)
## [1] 5.91914e-06
# Regression model -- we have hypothesis of dependent (richness) and independent (pH) variables
model <- lm(richness ~ soil.data$pH.KCl)
summary(model)
##
## Call:
## lm(formula = richness ~ soil.data$pH.KCl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8523 -2.8222 -0.5426 2.9875 7.4495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.152 3.766 -1.899 0.0679 .
## soil.data$pH.KCl 5.720 1.028 5.566 5.92e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.066 on 28 degrees of freedom
## Multiple R-squared: 0.5253, Adjusted R-squared: 0.5083
## F-statistic: 30.98 on 1 and 28 DF, p-value: 5.919e-06
abline(model, col="darkred", lty = 2, lwd = 2) # adding regression line to the graph.
# Regression expects that model residuals are normally distributed (i.e. testing if there is a deviation from normal distribution)
shapiro.test(resid(model))
##
## Shapiro-Wilk normality test
##
## data: resid(model)
## W = 0.96457, p-value = 0.403
# OK -- no deviation!