theBioBucket*: Multivariate Repeated Measurements with adonis():

18 Apr 2011

Multivariate Repeated Measurements with adonis():

Please check the updated code in the comment by Wallace Beiroz, 26 January 2015 at 13:37!

Lately I had to figure out how to do a repeated measures (or mixed effects) analysis on multivariate (species) data. Here I share code for a computation in R with the adonis function of the vegan package. Credit goes to Gavin Simpson providing most of the important pieces of the below code in R-Help.

The design:

We have multivariate species data, sampled at different sites (n = 6) at 3 points in time (N = 24).

The hypothesis:

H0: Species composition is the same across time. The permutation will produce the H0-population by a restricted randomization of time points.

H1: Species composition differs between time points

The analysis:

With repeated measurements one has to allow for the temporal structure and for the dependency within repeated measures. In a pedantic manner one would account for temporal structure by permuting all time-points in the same way also keeping the ordering of time-points. With 3 time-points this would give you only 3 permutations!

But we may relax this very restrictive assumption and say it doesn't matter that all samples at t1 were taken at one time and the ones of t2, etc. at another. Still there would be the repeated measures and the fact that the samples were taken in a temporal sequence. So how to deal with this?

(1) Having in mind that repeated measures on one sample are not independent, we will not permute across these samples.

(2) Now to the temporal correlation within repeated samples: The ordering of time-points is another type of dependency: the ordering is 1, 2, 3 - we will change this ordering only to 3, 1, 2 or 2, 3, 1, This permutation is referred to as toroidal shift which is keeping the "neighborhood" of values intact.

Eventually, we have 3 permutations per site, with 6 sites - this yields 3^6 = 729 permutations overall. Thus, a minimal p-value of 1/729 = 0.0014 can be obtained. I'd say that's good enough!

### species matrix with 20 species abundances (mean = 50, sd = 10)
### one time variable, with 3 timepoints, which should be tested
### and a factor denoting sites that were repeatedly sampled (site)
 
## Load packages
require(vegan)
 
### Data:
sp <- matrix(rnorm(3 * 6 * 20, 50, 10), nrow = 3 * 6, ncol = 20,
         dimnames = list(1:18, paste("Sp", 1:20, sep = "")))
 
time <- as.ordered(rep(1:3, 6))
site <- gl(6, 3)
cbind(site, time, sp)
 
### add time effect at timepoint 3,
### this will effect will be tested by adonis():
sp_1 <- sp
sp_1[time==3,] <-  sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)
 
### choose which species set to test:
test_sp <- sp_1
 
### computing the true R2-value
 
### (btw, using dist() defaults to euclidean distance):
print(fit <- adonis(dist(test_sp) ~ time, permutations=1))
 
### number of perms
B <- 1999
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### set up a "permControl" object:
### we turn off mirroring as time should only flow in one direction
ctrl <- permControl(strata = site, within = Within(type = "series", mirror = FALSE))
 
### Number of observations:
nobs <- nrow(test_sp)
 
### check permutation (...rows represent the sample id):
### ..they are ok!
### within in each repeated sample (= sites) timepoints are shuffled,
### with keeping the sequence intact (e.g., for  site 1: 1,2,3 - 2,3,1 - 3,2,1)
shuffle(nobs, control = ctrl)
 
### loop:
### in adonis(...) you need to put permutations = 1, otherwise 
### adonis will not run
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
 
### get the p-value:
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.0035
 
### the sign. p-value supports the H1 (->there is a time effect).
### ..and the fact that samples are not iid is allowed by
### the customized perms - so this p-value is trustworthy as opposed
### to tests not acknowledging dependency of data points..
 
### test sp set without an effect:
### replace test_sp with sp set without effect:
test_sp <- sp
 
### now re-run the script and see the result:
### it is insign. - as expected:
 
### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)
 
### computing the true R2-value:
print(fit <- adonis(dist(test_sp) ~ time, permutations = 1))
 
### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]
 
### run the loop:
set.seed(123)
for(i in 2:(B+1)){
     idx <- shuffle(nobs, control = ctrl)
     fit.rand <- adonis(dist(test_sp) ~ time[idx], permutations = 1)
     pop[i] <- fit.rand$aov.tab[1, 5]
}
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.701
 
## make a histogram to see random R2-values and the true one:
hist(pop, xlab = "Population R2")
abline(v = pop[1], col = 2, lty = 3)
text(0.08, 300, paste("true R2,\np = ", pval, sep = ""))

To cite package ‘vegan’ in publications use:

Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre,
Peter R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M.
Henry H. Stevens and Helene Wagner (2011). vegan: Community Ecology
Package. R package version 2.0-2.
http://CRAN.R-project.org/package=vegan

http://cran.r-project.org/web/packages/vegan/index.html

http://cc.oulu.fi/~jarioksa/softhelp/vegan.html

39 comments :

Anonymous18 October 2011 at 15:55
Hi there. I'm interested in this approach, but am having a hard time finding information on it. It seems that there are a number of papers out there that just use the regular PERMANOVA for data sets with repeated sampling. Are these just poor analyses?

Also, it would be great if you could provide the output that you see in the text so that readers can confirm that what they are doing is coming out right.
ReplyDelete
Replies
Anonymous26 October 2011 at 12:15
If you have 729 unique permutations, why did you use 1999 permutations? Why not all possible 729?
ReplyDelete
Replies
Ronei Baldissera27 March 2012 at 17:32
Hi, I need to perform a repeated measures permanova based on a bi-factorial sampling design.

Can you help me?

Cheers.

Ronei.
ReplyDelete
Replies
Susanna Remold30 August 2012 at 13:15
Hi- I am analyzing genetic distance matrices from samples taken from a three-factor sampling design, with repeated samplings over time. I have horrible problems with imbalance, as I had no way to control whether or not I actually got my (relatively rare) target organism at each sampling. Because I have had a hard time finding information about how to deal with violations of the assumptions of balance I am slashing back my data set to only the combinations that I can get balance for.

Do I need balance in the number of instances of samplings per site as well, or have I some wiggle room in the repeated measures portion of the analysis?

Thanks for your input!
ReplyDelete
Replies
FC6 November 2012 at 11:11
Hi Kay,

This is a very cool. I´m have the p value, but how can we retrieve the F and degrees of freedom.

Sorry if it is a simple question, but my knowledge on statistic is very basic
ReplyDelete
Replies
FC6 November 2012 at 23:49
Thank´s Kay

FC
ReplyDelete
Replies
Anonymous26 December 2012 at 10:15
Hi Kay,

Sorry if this is a messy question for you. I have a species matrix with 11 species abundances, one time variable (weeks) with 6 timepoints, sites that were repeatedly sampled (site), replicate 3 per site. My null hypothesis is that the site didn't influence the species abundances

I have some basic background in statistics, but I am quite new to R. I have tried unsuccessfully
to modify your code to work with my data. Could you suggest me on how to proceed?

Best regards

Fernando

Example of my matrix:
Site Replicate week Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8 Sp9 Sp10 Sp11
SS a 1 0 9 0 0 0 0 0 0 0 0 9
SS b 1 0 6 0 0 1 0 0 0 1 0 6
SS c 1 0 11 2 0 0 0 0 0 0 0 13
SS d 1 0 9 1 0 0 0 0 0 0 0 10
Mh a 1 0 3 0 0 2 1 0 0 0 0 3
Mh b 1 0 3 1 0 0 0 0 0 0 0 4
........
ReplyDelete
Replies
Anonymous25 February 2013 at 05:07
Hi Kay,

thanks a lot for the script as it does help to reduce the amount of analyses to run and is just what I was looking for. However, I have one question: In lines 18 to 20 you have the following code:

sp_1 <- sp
sp_1[time==3,] <- sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)

In the middle line you add further random numbers to the original data set. These random numbers are needed just in your case or are the supposed to be added in my adapted script as well? What is the reason behind adding these random numbers?
I'm sorry if it sounds like s silly question, am just trying to understand what I am actually doing here.

Kind regards,
Suse
ReplyDelete
Replies
Unknown6 March 2013 at 00:46
Dear KAY,
If I run a MANOVA can I get the significance of each variable in the left part of the model by applying a summary.aov function.
Did you know whether it is possible to obtain this output in a PERMANOVA using the adonis function.
Thanks
ReplyDelete
Replies
Kay6 August 2013 at 22:05
Hi e,

'sp' has no time effect, whereas for 'sp_1' some effect is added at time-level '3'. I run the test twice, in both cases using 'test_sp' - for the first test 'sp' is assigned to 'test_sp' and in the second test it is 'sp_1' that is assigned to 'test_sp'..

HTH, Kay
ReplyDelete
Replies
@rne21 January 2014 at 14:16
I'm not sure I totally get this. Stupid me.

First, easy question:
why would you use euclidian distances on species abundance data? This is just for demonstration, right?

Second: If I'm not only looking at the effect of time, but also at the effect of a treatment (in split-plot design, e.g.), how would I need to control for it?

This would make no sense, I guess:

fit <- adonis(distance ~ treatment+time, permutations=1, strata=treatment)
ReplyDelete
Replies
botanisch22 January 2014 at 17:26
Kay, I still struggle. Using a real-life dataset - 24 plots (actually, 12 times two treatments, but that's another complication), sampled at four times, would I have to use 4^24 restricted permutations to check for a sampling time effect?

This does not seem right. Nor is it feasible.

ReplyDelete
Replies
Anonymous4 July 2014 at 05:38
Hi kay
Thanks for taking the time to figure this out... I applied this approach to my own data (and I dare say it worked quite well!) and I would like to include this as part of the analysis in a paper I am writing... any suggestions on how to refer to this approach and cite it (apart from the vegan citation, of course)? I am also afraid many reviewers would be sceptical about this, but I really like this approach and the results I got out of this do seem to make sense with the rest of my analysis, so it would be a pity not to include it... any pointers would be appreciated! Thanks again, Laura
ReplyDelete
Replies
Anonymous9 July 2014 at 02:50
Hi kay
thanks again! One extra thing I noticed only just now: while going over my method again, I tried using the option "strata" within adonis, and it seems to be doing something very similar to your script (except the permutation is done over the F values rather than the R2 values). Am I wrong? And if that is indeed the case that both method constrain the permutations in the "correct" way , would using strata work as well, you reckon? Cheers again, Laura
ReplyDelete
Replies
Wallace Beiroz26 January 2015 at 13:37
Hi Kay!

First of all, thank you for your script it was exactly what I was looking for. However, I realized that some of your commands are obsolete. Thus, I did some small changes on your script, in order to update it.
I hope that you appreciate that and it can be helpful for other people!

As I am not a R language expert I might make some mistakes. Please see below my changes.

Thank you again!

### species matrix with 20 species abundances (mean = 50, sd = 10)
### one time variable, with 3 timepoints, which should be tested
### and a factor denoting sites that were repeatedly sampled (site)

## Load packages
require(vegan)

### Data:
sp <- matrix(rnorm(3 * 6 * 20, 50, 10), nrow = 3 * 6, ncol = 20,
dimnames = list(1:18, paste("Sp", 1:20, sep = "")))

time <- as.ordered(rep(1:3, 6))
site <- gl(6, 3)
cbind(site, time, sp)

### add time effect at timepoint 3,
### this will effect will be tested by adonis():
sp_1 <- sp
sp_1[time==3,] <- sp[time==3,] + rnorm(20, 10, 1)
cbind(site, time, sp_1)

### choose which species set to test:
test_sp <- sp_1

### computing the true R2-value

### (btw, using dist() defaults to euclidean distance):
print(fit <- adonis(test_sp ~ time, permutations=1)) # "NO MORE dist() is needed"

### number of perms
B <- 1999

### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)

### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]

### set up a "permControl" object:
### we turn off mirroring as time should only flow in one direction
ctrl <- how(blocks = site, within = Within(type = "series", mirror = FALSE)) #"permControl" is not used at all

### Number of observations:
nobs <- nrow(test_sp)

### check permutation (...rows represent the sample id):
### ..they are ok!
### within in each repeated sample (= sites) timepoints are shuffled,
### with keeping the sequence intact (e.g., for site 1: 1,2,3 - 2,3,1 - 3,2,1)
shuffle(nobs, control = ctrl)

### loop:
### in adonis(...) you need to put permutations = 1, otherwise
### adonis will not run
set.seed(123)
for(i in 2:(B+1)){
idx <- shuffle(nobs, control = ctrl)
fit.rand <- adonis(test_sp ~ time[idx],permutations = 1) # "NO MORE dist() is needed"
pop[i] <- fit.rand$aov.tab[1, 5]
}

### get the p-value:
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.0105

### the sign. p-value supports the H1 (->there is a time effect).
### ..and the fact that samples are not iid is allowed by
### the customized perms - so this p-value is trustworthy as opposed
### to tests not acknowledging dependency of data points..

### test sp set without an effect:
### replace test_sp with sp set without effect:
test_sp <- sp

### now re-run the script and see the result:
### it is insign. - as expected:

### setting up frame which will be populated by
### random r2 values:
pop <- rep(NA, B + 1)

### computing the true R2-value:
print(fit <- adonis(test_sp ~ time, permutations = 1))# "NO MORE dist() is needed"

### the first entry will be the true r2:
pop[1] <- fit$aov.tab[1, 5]

### run the loop:
set.seed(123)
for(i in 2:(B+1)){
idx <- shuffle(nobs, control = ctrl)
fit.rand <- adonis(test_sp ~ time[idx], permutations = 1)# "NO MORE dist() is needed"
pop[i] <- fit.rand$aov.tab[1, 5]
}
print(pval <- sum(pop >= pop[1]) / (B + 1))
### [1] 0.7605

## make a histogram to see random R2-values and the true one:
hist(pop, xlab = "Population R2")
abline(v = pop[1], col = 2, lty = 3)
text(0.08, 300, paste("true R2,\np = ", pval, sep = ""))
ReplyDelete
Replies
Aks11 March 2015 at 07:22
Hello, Kay!
And thank you for the script!

Can you, please, give me an advice on what I should change if I only have 2 time measurements for n patients and I want to compare whether those measurements differ or not?
Or maybe I even can just use strata option in my case?
ReplyDelete
Replies
Unknown27 March 2015 at 00:06
Hi there!
correct me if I'm wrong but you would get the same p-value by giving the output of how() (i.e. ctrl) as an input argument to adonis():
print(fit <- adonis(test_sp ~ time, permutations=ctrl))
Right?
Nic
ReplyDelete
Replies
Jan11 August 2017 at 13:52
or rather, something like:

proteins ~ time[idx] + outcomes

although then strata will not be taken into account for outcomes...
ReplyDelete
Replies

Add comment

Subscribe to: Post Comments ( Atom )

TABS

18 Apr 2011

Multivariate Repeated Measurements with adonis():

39 comments :