6 Ordinations

Ordinations aim to put samples and taxa in order so that more similar items are close to each other. They helps to visualize the similarity in structure between ecological communities.

6.1 Taxa as axes

If we have just two taxa, we can plot samples on a 2-dimensional space where the axes reflect the abundance of each of these species. Then the distance between samples is the distance between points on the plot. We can imagine a 3-dimensional space (x, y and z). However, generally we have many more taxa. Let’s select the 5 taxa with the highest IndVal value and make pairwise graphs!

load("community.rda") # loading from previous
load("clusters.rda")
comm.data <- log1p(vas.plants)
freq.spp <- c(58, 18, 10, 27, 38) # 5 most important spp from clusters
pairs(vas.plants[, freq.spp], pch = 16, col = rgb(0, 0, 1, 0.3))

Even with 5 taxa we have 10 pairwise graphs! Mathematically we can define as many axes as needed and put all points on this multi-dimensional space. Ordination allows to reduce the number of axes while keeping as much as possible of the original variation between sites.

6.2 Principal Component Analysis (PCA)

Ordination of quantitative data, based on Euclidean distance between samples. There are different functions to perform PCA in different packages.

PCA first examines the cloud of points in the multidimensional matrix and puts the fist axis along the largest variation; the following axis is the one that is perpendicular to the first one and describes the largest proportion of the variation left. This process is done (finding perpendicular axes to those previously selected), until no information is left to be explained.

This method assumes that abundance of taxa are linearly related to some gradients (just increasing or decreasing). It might be true for short environmental gradient but for longer gradients unimodal responses are more likely.

NB! Ordination axes does not have meaningful direction, they only show variation. You can always multiply some axis by -1 to reverse its direction!

library(vegan)

# Let's make an artificial table with 2 taxa a and b
test <- data.frame(a = c(1, 2, 3, 2, 1), b = c(3, 2, 2, 4, 4))
plot(test, type = "o")


# How does the same data looks in PCA axes?

o.pca <- prcomp(test)
plot(scores(o.pca), asp = 1, type = "o")

# since we had originally 2 dimensions then we have the same shape but PC1 is describing the most variable direction.

summary(o.pca)
## Importance of components:
##                           PC1    PC2
## Standard deviation     1.1713 0.5727
## Proportion of Variance 0.8071 0.1929
## Cumulative Proportion  0.8071 1.0000

# how much of total variation is described by the principal components.


# Now, using real community data
o.pca <- prcomp(comm.data)
summary(o.pca) # How much our principal components describe
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.3064 0.8512 0.78620 0.72071 0.62673 0.57314 0.54010
## Proportion of Variance 0.2555 0.1085 0.09254 0.07777 0.05881 0.04918 0.04367
## Cumulative Proportion  0.2555 0.3640 0.45655 0.53431 0.59312 0.64230 0.68597
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.52109 0.49162 0.45124 0.41312 0.39359 0.37787 0.35786
## Proportion of Variance 0.04065 0.03618 0.03048 0.02555 0.02319 0.02138 0.01917
## Cumulative Proportion  0.72662 0.76281 0.79329 0.81885 0.84204 0.86341 0.88259
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.35043 0.31108 0.30573 0.27529 0.27072 0.23740 0.21337
## Proportion of Variance 0.01838 0.01449 0.01399 0.01135 0.01097 0.00844 0.00682
## Cumulative Proportion  0.90097 0.91546 0.92945 0.94080 0.95177 0.96021 0.96703
##                           PC22   PC23    PC24    PC25    PC26   PC27    PC28
## Standard deviation     0.21089 0.1917 0.18782 0.16537 0.15206 0.1484 0.14236
## Proportion of Variance 0.00666 0.0055 0.00528 0.00409 0.00346 0.0033 0.00303
## Cumulative Proportion  0.97368 0.9792 0.98447 0.98856 0.99202 0.9953 0.99836
##                           PC29      PC30
## Standard deviation     0.10480 5.802e-16
## Proportion of Variance 0.00164 0.000e+00
## Cumulative Proportion  1.00000 1.000e+00


pairs(scores(o.pca)[, 1:4], pch = 16, cex = 0.5) # ordination plots


# however, generally only PC1 and PC2 are used

par(mfrow = c(1, 2))
plot(scores(o.pca)[, 1:2], asp = 1, pch = o.grel) # clusters
legend("topleft", legend = unique(o.grel), pch = unique(o.grel))

# Eigenvectors of frequent species -- describing their contribution to principal components, can be visualized by arrows (vectors)
x <- o.pca$rotation[freq.spp, 1]
y <- o.pca$rotation[freq.spp, 2]
plot(x, y, type = "n", asp = 1)
abline(h = 0, v = 0, col = "gray")
arrows(0, 0, x, y, length = 0.03, col = "red")
text(x, y, colnames(comm.data)[freq.spp], cex = 0.6)



par(mfrow=c(1,1))

6.3 Principal Coordinates Analysis

Ordination based on eigenvalues and any distance matrix. If you select Euclidean distance, it is equal to PCA.

library(ape)

vegdist <- vegdist(comm.data, "euclidean")
o.pco <- pcoa(vegdist)
par(mfrow = c(1, 2))
plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel, xlab = "PCO1", ylab = "PCO2")
plot(scores(o.pca)[, 1:2], asp = 1, pch = o.grel)


## Should be identical (NB! axis direction does not have meaning)

# Now using Bray-Curtis distance

plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel, xlab = "PCO1", ylab = "PCO2")
vegdist <- vegdist(comm.data, "bray")
o.pco <- pcoa(vegdist)
plot(o.pco$vectors[, 1:2], asp = 1, pch = o.grel)



par(mfrow = c(1, 1))

6.4 Correspondence Analysis

This technique (also called Reciprocal Averaging) tries to ordinate both samples and taxa in parallel. It expects unimodal response curves of taxa and aims to find the weighted averages for taxa and sites.

o.ca <- cca(comm.data)

summary(o.ca)$cont$importance[, 1:3]
##                             CA1        CA2        CA3
## Eigenvalue            0.5392631 0.39437464 0.33141756
## Proportion Explained  0.1258140 0.09201047 0.07732213
## Cumulative Proportion 0.1258140 0.21782447 0.29514660

plot(o.ca$CA$u[, 1:2], pch = o.grel, asp = 1)


plot(o.ca) # Biplot where both sites and taxa are given (their averaged locations)


# Adding manually sites and species to obtain a more clean image
plot(o.ca, type = "n")
points(o.ca, display = "sites", cex = 0.8, pch = o.grel, col = "red")
text(o.ca, display = "spec", cex = 0.7, col = "blue", 
     select = freq.spp)


plot(o.ca, type = "n", xlim = c(-2, 3.5))
points(o.ca, display = "sites", cex = 0.8, pch = 16, col = "red")
ordipointlabel(o.ca, cex = 0.7, display = "species", col = "blue", add = T,
               select = freq.spp) ## Tries to optimize the location of the text labels to  avoid overlap


tabasco(comm.data, o.ca) # ordering tables using CA for both samples and taxa.

6.5 Nonmetric Multidimensional Scaling (NMDS)

Not based on maximum variation but on shifting iteratively of objects within a low number of axes so that the distance between samples is maximally kept. The algorithm starts from a random order or PCA. Compares the difference between real distance and the distance within the ordination space (this difference is called “stress”). Often used nowadays because computing is not limiting any more. A rule of thumb: stress ca 0.05 means an excellent representation in reduced dimensions, 0.1 is great, 0.2 is satisfactory, and stress >0.3 means a poor representation.

We also explore how to limit clusters and draw species richness on the ordination graph.

o.mds <- metaMDS(comm.data, distance = "euclidean", k = 3)
## Run 0 stress 0.1125173 
## Run 1 stress 0.1122945 
## ... New best solution
## ... Procrustes: rmse 0.02116059  max resid 0.08415238 
## Run 2 stress 0.1122949 
## ... Procrustes: rmse 0.0006232159  max resid 0.002437037 
## ... Similar to previous best
## Run 3 stress 0.1190965 
## Run 4 stress 0.1125174 
## ... Procrustes: rmse 0.02149199  max resid 0.08495809 
## Run 5 stress 0.1122949 
## ... Procrustes: rmse 0.0005283557  max resid 0.001997025 
## ... Similar to previous best
## Run 6 stress 0.1125172 
## ... Procrustes: rmse 0.02140855  max resid 0.08473567 
## Run 7 stress 0.1122947 
## ... Procrustes: rmse 0.0004807706  max resid 0.001879513 
## ... Similar to previous best
## Run 8 stress 0.1194154 
## Run 9 stress 0.1125184 
## ... Procrustes: rmse 0.02184386  max resid 0.08587465 
## Run 10 stress 0.1125177 
## ... Procrustes: rmse 0.0216377  max resid 0.08533981 
## Run 11 stress 0.1122945 
## ... New best solution
## ... Procrustes: rmse 0.0001553032  max resid 0.0006005276 
## ... Similar to previous best
## Run 12 stress 0.1184554 
## Run 13 stress 0.1122945 
## ... Procrustes: rmse 5.370686e-05  max resid 0.0001680833 
## ... Similar to previous best
## Run 14 stress 0.112517 
## ... Procrustes: rmse 0.02118157  max resid 0.08425624 
## Run 15 stress 0.1122944 
## ... New best solution
## ... Procrustes: rmse 0.0001041504  max resid 0.0003010347 
## ... Similar to previous best
## Run 16 stress 0.1125406 
## ... Procrustes: rmse 0.01342035  max resid 0.05806213 
## Run 17 stress 0.1125171 
## ... Procrustes: rmse 0.02122666  max resid 0.08435193 
## Run 18 stress 0.1194156 
## Run 19 stress 0.1190995 
## Run 20 stress 0.1194153 
## *** Best solution repeated 1 times
stressplot(o.mds, vegdist) # stress is the distance of points from the line! Looks fine!


plot(o.mds) # biplot, samples taxa


plot(o.mds$points, pch = o.grel) # only samples

ordihull(o.mds, o.grel, col = 1:5) # connecting clusters


## Adding species diversity to plot
plot(o.mds$points, pch = o.grel, col = o.grel)
ordisurf(o.mds, diversity(comm.data), col = "grey", main = "Shannon diversity",
         add = T)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
## 
## Estimated degrees of freedom:
## 5.1  total = 6.1 
## 
## REML score: 15.15086
ordihull(o.mds, o.grel, col = 1:5)
legend("bottomleft", legend = unique(o.grel), col = unique(o.grel),
       pch = unique(o.grel), lwd = 1)

> Try also ordispider and ordiellipse functions

Answer

plot(o.mds$points, pch = o.grel, col = o.grel)
ordispider(o.mds, o.grel, col = unique(o.grel))
ordiellipse(o.mds, o.grel, col = unique(o.grel),
            kind="se", lwd=2, con = 0.95)
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-deficient
## or not positive definite

## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-deficient
## or not positive definite

6.6 3D visualizations

library(vegan3d)

ordiplot3d(o.mds, type = "h", pch = o.grel, 
           , col = o.grel)

ordirgl(o.mds, col = o.grel, pch = o.grel) # Should open a new window!


save(o.mds, file = "ordi.rda")

LTOM.01.002 Data Analysis in Community Ecology

Carlos P. Carmona (carlos.perez.carmona@ut.ee)

Aurele Toussaint (aurele.toussaint@ut.ee)

Meelis Pärtel (meelis.partel@ut.ee)

Diego P. F. Trindade (diego.trindade@ut.ee)

Slendy Rodriguez-Alarcon (slendy.rodriguez@ut.ee)

Autumn 2023