Ordinations aim to put samples and taxa in order so that more similar items are close to each other. They helps to visualize the similarity in structure between ecological communities.
If we have just two taxa, we can plot samples on a 2-dimensional space where the axes reflect the abundance of each of these species. Then the distance between samples is the distance between points on the plot. We can imagine a 3-dimensional space (x, y and z). However, generally we have many more taxa. Let’s select the 5 taxa with the highest IndVal value and make pairwise graphs!
load("community.rda") # loading from previous
load("clusters.rda")
comm.data <- log1p(vas.plants)
freq.spp <- c(58, 18, 10, 27, 38) # 5 most important spp from clusters
pairs(vas.plants[, freq.spp], pch = 16, col = rgb(0, 0, 1, 0.3))
Even with 5 taxa we have 10 pairwise graphs! Mathematically we can define as many axes as needed and put all points on this multi-dimensional space. Ordination allows to reduce the number of axes while keeping as much as possible of the original variation between sites.
Ordination of quantitative data, based on Euclidean distance between samples. There are different functions to perform PCA in different packages.
PCA first examines the cloud of points in the multidimensional matrix and puts the fist axis along the largest variation; the following axis is the one that is perpendicular to the first one and describes the largest proportion of the variation left. This process is done (finding perpendicular axes to those previously selected), until no information is left to be explained.
This method assumes that abundance of taxa are linearly related to some gradients (just increasing or decreasing). It might be true for short environmental gradient but for longer gradients unimodal responses are more likely.
NB! Ordination axes does not have meaningful direction, they only show variation. You can always multiply some axis by -1 to reverse its direction!
library(vegan)
# Let's make an artificial table with 2 taxa a and b
test <- data.frame(a = c(1, 2, 3, 2, 1), b = c(3, 2, 2, 4, 4))
plot(test, type = "o")
# How does the same data looks in PCA axes?
o.pca <- prcomp(test)
plot(scores(o.pca), asp = 1, type = "o")
# since we had originally 2 dimensions then we have the same shape but PC1 is describing the most variable direction.
summary(o.pca)
## Importance of components:
## PC1 PC2
## Standard deviation 1.1713 0.5727
## Proportion of Variance 0.8071 0.1929
## Cumulative Proportion 0.8071 1.0000
# how much of total variation is described by the principal components.
# Now, using real community data
o.pca <- prcomp(comm.data)
summary(o.pca) # How much our principal components describe
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.3064 0.8512 0.78620 0.72071 0.62673 0.57314 0.54010
## Proportion of Variance 0.2555 0.1085 0.09254 0.07777 0.05881 0.04918 0.04367
## Cumulative Proportion 0.2555 0.3640 0.45655 0.53431 0.59312 0.64230 0.68597
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.52109 0.49162 0.45124 0.41312 0.39359 0.37787 0.35786
## Proportion of Variance 0.04065 0.03618 0.03048 0.02555 0.02319 0.02138 0.01917
## Cumulative Proportion 0.72662 0.76281 0.79329 0.81885 0.84204 0.86341 0.88259
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.35043 0.31108 0.30573 0.27529 0.27072 0.23740 0.21337
## Proportion of Variance 0.01838 0.01449 0.01399 0.01135 0.01097 0.00844 0.00682
## Cumulative Proportion 0.90097 0.91546 0.92945 0.94080 0.95177 0.96021 0.96703
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.21089 0.1917 0.18782 0.16537 0.15206 0.1484 0.14236
## Proportion of Variance 0.00666 0.0055 0.00528 0.00409 0.00346 0.0033 0.00303
## Cumulative Proportion 0.97368 0.9792 0.98447 0.98856 0.99202 0.9953 0.99836
## PC29 PC30
## Standard deviation 0.10480 5.802e-16
## Proportion of Variance 0.00164 0.000e+00
## Cumulative Proportion 1.00000 1.000e+00
pairs(scores(o.pca)[, 1:4], pch = 16, cex = 0.5) # ordination plots
# however, generally only PC1 and PC2 are used
par(mfrow = c(1, 2))
plot(scores(o.pca)[, 1:2], asp = 1, pch = o.grel) # clusters
legend("topleft", legend = unique(o.grel), pch = unique(o.grel))
# Eigenvectors of frequent species -- describing their contribution to principal components, can be visualized by arrows (vectors)
x <- o.pca$rotation[freq.spp, 1]
y <- o.pca$rotation[freq.spp, 2]
plot(x, y, type = "n", asp = 1)
abline(h = 0, v = 0, col = "gray")
arrows(0, 0, x, y, length = 0.03, col = "red")
text(x, y, colnames(comm.data)[freq.spp], cex = 0.6)
par(mfrow=c(1,1))
Ordination based on eigenvalues and any distance matrix. If you select Euclidean distance, it is equal to PCA.
library(ape)
vegdist <- vegdist(comm.data, "euclidean")
o.pco <- pcoa(vegdist)
par(mfrow = c(1, 2))
plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel, xlab = "PCO1", ylab = "PCO2")
plot(scores(o.pca)[, 1:2], asp = 1, pch = o.grel)
## Should be identical (NB! axis direction does not have meaning)
# Now using Bray-Curtis distance
plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel, xlab = "PCO1", ylab = "PCO2")
vegdist <- vegdist(comm.data, "bray")
o.pco <- pcoa(vegdist)
plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel)
par(mfrow = c(1, 1))
This technique (also called Reciprocal Averaging) tries to ordinate both samples and taxa in parallel. It expects unimodal response curves of taxa and aims to find the weighted averages for taxa and sites.
o.ca <- cca(comm.data)
summary(o.ca)$cont$importance[, 1:3]
## CA1 CA2 CA3
## Eigenvalue 0.5392631 0.39437464 0.33141756
## Proportion Explained 0.1258140 0.09201047 0.07732213
## Cumulative Proportion 0.1258140 0.21782447 0.29514660
plot(o.ca$CA$u[, 1:2], pch = o.grel, asp = 1)
plot(o.ca) # Biplot where both sites and taxa are given (their averaged locations)
# Adding manually sites and species to obtain a more clean image
plot(o.ca, type = "n")
points(o.ca, display = "sites", cex = 0.8, pch = o.grel, col = "red")
text(o.ca, display = "spec", cex = 0.7, col = "blue",
select = freq.spp)
plot(o.ca, type = "n", xlim = c(-2, 3.5))
points(o.ca, display = "sites", cex = 0.8, pch = 16, col = "red")
ordipointlabel(o.ca, cex = 0.7, display = "species", col = "blue", add = T,
select = freq.spp) ## Tries to optimize the location of the text labels to avoid overlap
tabasco(comm.data, o.ca) # ordering tables using CA for both samples and taxa.
Not based on maximum variation but on shifting iteratively of objects within a low number of axes so that the distance between samples is maximally kept. The algorithm starts from a random order or PCA. Compares the difference between real distance and the distance within the ordination space (this difference is called “stress”). Often used nowadays because computing is not limiting any more. A rule of thumb: stress ca 0.05 means an excellent representation in reduced dimensions, 0.1 is great, 0.2 is satisfactory, and stress >0.3 means a poor representation.
We also explore how to limit clusters and draw species richness on the ordination graph.
o.mds <- metaMDS(comm.data, distance = "euclidean", k = 3)
## Run 0 stress 0.1125173
## Run 1 stress 0.1122945
## ... New best solution
## ... Procrustes: rmse 0.02116059 max resid 0.08415238
## Run 2 stress 0.1122949
## ... Procrustes: rmse 0.0006232159 max resid 0.002437037
## ... Similar to previous best
## Run 3 stress 0.1190965
## Run 4 stress 0.1125174
## ... Procrustes: rmse 0.02149199 max resid 0.08495809
## Run 5 stress 0.1122949
## ... Procrustes: rmse 0.0005283557 max resid 0.001997025
## ... Similar to previous best
## Run 6 stress 0.1125172
## ... Procrustes: rmse 0.02140855 max resid 0.08473567
## Run 7 stress 0.1122947
## ... Procrustes: rmse 0.0004807706 max resid 0.001879513
## ... Similar to previous best
## Run 8 stress 0.1194154
## Run 9 stress 0.1125184
## ... Procrustes: rmse 0.02184386 max resid 0.08587465
## Run 10 stress 0.1125177
## ... Procrustes: rmse 0.0216377 max resid 0.08533981
## Run 11 stress 0.1122945
## ... New best solution
## ... Procrustes: rmse 0.0001553032 max resid 0.0006005276
## ... Similar to previous best
## Run 12 stress 0.1184554
## Run 13 stress 0.1122945
## ... Procrustes: rmse 5.370686e-05 max resid 0.0001680833
## ... Similar to previous best
## Run 14 stress 0.112517
## ... Procrustes: rmse 0.02118157 max resid 0.08425624
## Run 15 stress 0.1122944
## ... New best solution
## ... Procrustes: rmse 0.0001041504 max resid 0.0003010347
## ... Similar to previous best
## Run 16 stress 0.1125406
## ... Procrustes: rmse 0.01342035 max resid 0.05806213
## Run 17 stress 0.1125171
## ... Procrustes: rmse 0.02122666 max resid 0.08435193
## Run 18 stress 0.1194156
## Run 19 stress 0.1190995
## Run 20 stress 0.1194153
## *** Best solution repeated 1 times
stressplot(o.mds, vegdist) # stress is the distance of points from the line! Looks fine!
plot(o.mds) # biplot, samples taxa
plot(o.mds$points, pch = o.grel) # only samples
ordihull(o.mds, o.grel, col = 1:5) # connecting clusters
## Adding species diversity to plot
plot(o.mds$points, pch = o.grel, col = o.grel)
ordisurf(o.mds, diversity(comm.data), col = "grey", main = "Shannon diversity",
add = T)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
##
## Estimated degrees of freedom:
## 5.1 total = 6.1
##
## REML score: 15.15086
ordihull(o.mds, o.grel, col = 1:5)
legend("bottomleft", legend = unique(o.grel), col = unique(o.grel),
pch = unique(o.grel), lwd = 1)
> Try also ordispider and ordiellipse functions
plot(o.mds$points, pch = o.grel, col = o.grel)
ordispider(o.mds, o.grel, col = unique(o.grel))
ordiellipse(o.mds, o.grel, col = unique(o.grel),
kind="se", lwd=2, con = 0.95)
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-deficient
## or not positive definite
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-deficient
## or not positive definite
library(vegan3d)
ordiplot3d(o.mds, type = "h", pch = o.grel,
, col = o.grel)
ordirgl(o.mds, col = o.grel, pch = o.grel) # Should open a new window!
save(o.mds, file = "ordi.rda")