18 Apr 2011

Multivariate Repeated Measurements with adonis():

Lately I had to figure out how to do a repeated measures (or mixed effects) analysis on multivariate (species) data. Here I share code for a computation in R with the adonis function of the vegan package. Credit goes to Gavin Simpson providing most of the important pieces of the below code in R-Help.

The design:
We have multivariate species data, sampled at different sites (n = 6) at 3 points in time (N = 24).

The hypothesis:
H0: Species composition is the same across time. The permutation will produce the H0-population by a restricted randomization of time points.
H1: Species composition differs between time points

The analysis:
With repeated measurements one has to allow for the temporal structure and for the dependency within repeated measures. In a pedantic manner one would account for temporal structure by permuting all time-points in the same way also keeping the ordering of time-points. With 3 time-points this would give you only 3 permutations!

But we may relax this very restrictive assumption and say it doesn't matter that all samples at t1 were taken at one time and the ones of t2, etc. at another. Still there would be the repeated measures and the fact that the samples were taken in a temporal sequence. So how to deal with this?

(1) Having in mind that repeated measures on one sample are not independent, we will not permute across these samples.

(2) Now to the temporal correlation within repeated samples: The ordering of time-points is another type of dependency: the ordering is 1, 2, 3 - we will change this ordering only to 3, 1, 2 or 2, 3, 1, This permutation is referred to as toroidal shift which is keeping the "neighborhood" of values intact.

Eventually, we have 3 permutations per site, with 6 sites - this yields 3^6 = 729 permutations overall. Thus, a minimal p-value of 1/729 = 0.0014 can be obtained. I'd say that's good enough!

### species matrix with 20 species abundances (mean = 50, sd = 10)
### one time variable, with 3 timepoints, which should be tested
### and a factor denoting sites that were repeatedly sampled (site)
 
## Load packages
require(vegan)
 
### Data:
sp <- matrix(rnorm(3 * 6 * 20, 50, 10), nrow = 3 * 6, ncol = 20,
         dimnames = list(1:18, paste("Sp", 1:20, sep = "")))
 
time <- as.ordered(rep(1:3, 6))
site <- gl(6, 3)
cbind(site, time, sp)
 
### add time effect at timepoint 3,
### this will effect will be tested by adonis():
sp_1 <- sp
sp_1[time==3,] <-  sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)
 
### choose which species set to test:
test_sp <- sp_1
 
### computing the true R2-value
 
### (btw, using dist() defaults to euclidean distance):
print(fit <- adonis(dist(test_sp) ~ time, permutations=1))
 
### number of perms
B <- 1999
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### set up a "permControl" object:
### we turn off mirroring as time should only flow in one direction
ctrl <- permControl(strata = site, within = Within(type = "series", mirror = FALSE))
 
### Number of observations:
nobs <- nrow(test_sp)
 
### check permutation (...rows represent the sample id):
### ..they are ok!
### within in each repeated sample (= sites) timepoints are shuffled,
### with keeping the sequence intact (e.g., for  site 1: 1,2,3 - 2,3,1 - 3,2,1)
shuffle(nobs, control = ctrl)
 
### loop:
### in adonis(...) you need to put permutations = 1, otherwise 
### adonis will not run
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
 
### get the p-value:
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.0035
 
### the sign. p-value supports the H1 (->there is a time effect).
### ..and the fact that samples are not iid is allowed by
### the customized perms - so this p-value is trustworthy as opposed
### to tests not acknowledging dependency of data points..
 
### test sp set without an effect:
### replace test_sp with sp set without effect:
test_sp <- sp
 
### now re-run the script and see the result:
### it is insign. - as expected:
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### computing the true R2-value:
print(fit <- adonis(dist(test_sp) ~ time, permutations = 1))
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### run the loop:
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.701
 
## make a histogram to see random R2-values and the true one:
hist(pop, xlab = "Population R2")
abline(v = pop[1], col = 2, lty = 3)
text(0.08, 300, paste("true R2,\np = ", pval, sep = ""))

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M.
Henry H. Stevens and Helene Wagner (2011). vegan: Community Ecology
Package. R package version 2.0-2.
http://CRAN.R-project.org/package=vegan

19 comments:

  1. Hi there. I'm interested in this approach, but am having a hard time finding information on it. It seems that there are a number of papers out there that just use the regular PERMANOVA for data sets with repeated sampling. Are these just poor analyses?

    Also, it would be great if you could provide the output that you see in the text so that readers can confirm that what they are doing is coming out right.

    ReplyDelete
    Replies
    1. hi anonymous,

      not allowing for spatial/temporal correlation will yield anti-conservative p-values and there is higher chance of false significant effects than suggest by this p-values. i'd say that *is* poor analysis.

      i did add some output as proposed by you.

      hth, kay

      Delete
  2. If you have 729 unique permutations, why did you use 1999 permutations? Why not all possible 729?

    ReplyDelete
    Replies
    1. ..you're right: using the whole set of enumartions, what would be referred to as "exact test", would be best. But, with as much as 1999 permutation you yield a good approximation of the null-distribution. And, for the case being considered, I suppose that is all ok.

      Delete
  3. Hi, I need to perform a repeated measures permanova based on a bi-factorial sampling design.

    Can you help me?

    Cheers.

    Ronei.

    ReplyDelete
    Replies
    1. depends.. you can mail to the address in the footer.

      Delete
  4. Susanna Remold30 August 2012 13:15

    Hi- I am analyzing genetic distance matrices from samples taken from a three-factor sampling design, with repeated samplings over time. I have horrible problems with imbalance, as I had no way to control whether or not I actually got my (relatively rare) target organism at each sampling. Because I have had a hard time finding information about how to deal with violations of the assumptions of balance I am slashing back my data set to only the combinations that I can get balance for.

    Do I need balance in the number of instances of samplings per site as well, or have I some wiggle room in the repeated measures portion of the analysis?

    Thanks for your input!

    ReplyDelete
    Replies
    1. I'd say as long as your cell sizes (also for random factors, like your sites) exceed a minimum of about 5-10 (?) samples the test applies, given that you're testing the appropiate hypotheses..

      I'd advise you to address the R-mailing list (r-sig-eco) for a more profound answer on that question. See i.e. this: https://stat.ethz.ch/pipermail/r-help/2008-April/159252.html

      Delete
  5. Hi Kay,

    This is a very cool. I´m have the p value, but how can we retrieve the F and degrees of freedom.


    Sorry if it is a simple question, but my knowledge on statistic is very basic

    ReplyDelete
  6. Hi Kay,

    Sorry if this is a messy question for you. I have a species matrix with 11 species abundances, one time variable (weeks) with 6 timepoints, sites that were repeatedly sampled (site), replicate 3 per site. My null hypothesis is that the site didn't influence the species abundances

    I have some basic background in statistics, but I am quite new to R. I have tried unsuccessfully
    to modify your code to work with my data. Could you suggest me on how to proceed?

    Best regards

    Fernando

    Example of my matrix:
    Site Replicate week Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8 Sp9 Sp10 Sp11
    SS a 1 0 9 0 0 0 0 0 0 0 0 9
    SS b 1 0 6 0 0 1 0 0 0 1 0 6
    SS c 1 0 11 2 0 0 0 0 0 0 0 13
    SS d 1 0 9 1 0 0 0 0 0 0 0 10
    Mh a 1 0 3 0 0 2 1 0 0 0 0 3
    Mh b 1 0 3 1 0 0 0 0 0 0 0 4
    ........

    ReplyDelete
    Replies
    1. Hi there,

      If you`re solely interested in differences between sites I'd suppose you should average abundances over the timepoints, reducing the dataset from 6*3*N(Sites) to 3*N(Sites).. BTW, what species are we talking about, and, why did you repeat over weeks in the first place?

      HTH,
      Kay

      Delete
  7. Hi Kay,

    thanks a lot for the script as it does help to reduce the amount of analyses to run and is just what I was looking for. However, I have one question: In lines 18 to 20 you have the following code:

    sp_1 <- sp
    sp_1[time==3,] <- sp[time==3,] + rnorm(20, 10, 1)
    cbind(site, time, sp_1)

    In the middle line you add further random numbers to the original data set. These random numbers are needed just in your case or are the supposed to be added in my adapted script as well? What is the reason behind adding these random numbers?
    I'm sorry if it sounds like s silly question, am just trying to understand what I am actually doing here.

    Kind regards,
    Suse

    ReplyDelete
    Replies
    1. Suse,
      as commented in the script, this is just to add an artificial effect to showcase in the test..

      Regards,
      Kay

      Delete
  8. Dear KAY,
    If I run a MANOVA can I get the significance of each variable in the left part of the model by applying a summary.aov function.
    Did you know whether it is possible to obtain this output in a PERMANOVA using the adonis function.
    Thanks

    ReplyDelete
    Replies
    1. Ok! But the output of the adonis function doesn't show the significance of each variable, solely the total effect in the multivariate space.
      Tks.

      Delete
    2. Yes, for the significance of single variables on the left side ("species") there are no tests in adonis. That is simply because it is dealing with dissimilarity matrices rather than the single variables/species.

      Delete
  9. An ANOVA-like table is default output with the adonis function..

    ReplyDelete