## 23 May 2011

### Summarize Data by Several Variables

Here is an example of how to conveniently summarize data with the cast function (package reshape). Along the way you also see how the same summary can be computed "inconveniently" by hand, which shows how a for-loop works and how a matrix is constructed and filled. In addition, the loop below illustrates how flexible indexing in R is. (download data) This example is adapted from https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002174.html.
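The two kinds of indexing used in the loop can be previewed on their own. This is only a minimal sketch: the vectors `value` and `group` and the data frame `counts` are hypothetical toy objects, not part of the downloaded data.

```r
# Hypothetical toy vectors, not taken from data3
value <- c("Aedes", "Culex", "Aedes", "Carabus")
group <- c("A", "A", "B", "B")

# Logical indexing: keep the elements of `value` where `group` equals "A"
in_a <- value[group == "A"]

# Name-based indexing: address a data frame row by its row name
counts <- data.frame(count = rep(NA, 2), row.names = c("A", "B"))
counts["A", "count"] <- length(unique(in_a))
counts["B", "count"] <- length(unique(value[group == "B"]))
print(counts)
```

The comparison `group == "A"` returns a logical vector, and only the positions that are `TRUE` are kept. Rows of a data frame can be addressed by name in the same bracket notation.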

```
# Set the path to where you downloaded the data:

# see what's in there:
ls()

# investigate data3:
str(data3)

# I want to know how many taxa occur within each ecoregion,
# more precisely: how many orders, families and genera there are within each region.

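library(reshape)

# melt to long format: one row per ECO_NAME * taxonomy level * value
dfm <- melt(data3, id = "ECO_NAME")
# cast back to wide format, counting unique values per ECO_NAME and taxonomy level
dfc <- cast(dfm, ECO_NAME ~ variable, function(x) length(unique(x)))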

# the same by hand -
# make new variable of ECO_NAME*variable combinations:
dfm$variable2 <- as.factor(paste(dfm$variable, dfm$ECO_NAME, sep = " - "))

# make a data frame to collect results for all unique ECO_NAME*variable combinations:
Ns <- data.frame(count = rep(NA, length(unique(dfm$variable2))),
                 row.names = unique(dfm$variable2))

# loop through all unique ECO_NAME*variable combinations
# and record length of unique values:
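for (i in levels(dfm$variable2)) {
  # logical indexing: all values that belong to combination i
  vals <- dfm$value[dfm$variable2 == i]
  # row-name indexing: store the count in the row named i
  Ns[i, "count"] <- length(unique(vals))
}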

# put counts in a matrix/table with ecoregions as rows and taxonomy
# levels (= "variable") as columns. The counts are looked up by row name,
# one taxonomy level after the other, so the result does not depend on how
# the data are sorted and the matrix is filled by columns (byrow = FALSE):
eco <- levels(dfm$ECO_NAME)
tax <- levels(dfm$variable)
keys <- paste(rep(tax, each = length(eco)), eco, sep = " - ")
result <- matrix(Ns$count[match(keys, rownames(Ns))],
                 nrow = length(eco),
                 ncol = length(tax),
                 dimnames = list(eco, tax), byrow = FALSE)
print(result)
```

To cite reshape in publications, please use:

H. Wickham. Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 2007.
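
As a cross-check on the table above, the same kind of summary can be sketched in base R with `tapply()` and `sapply()`, without any package. The data frame `toy` and the helper `count_unique` are hypothetical stand-ins made up for illustration; they are not the original `data3`, whose columns are only assumed to look similar.

```r
# Hypothetical toy data standing in for data3 (not the original data set)
toy <- data.frame(
  ECO_NAME = c("Alps", "Alps", "Alps", "Tundra", "Tundra"),
  order    = c("Diptera", "Diptera", "Coleoptera", "Diptera", "Diptera"),
  family   = c("Culicidae", "Chironomidae", "Carabidae", "Culicidae", "Culicidae"),
  genus    = c("Aedes", "Chironomus", "Carabus", "Aedes", "Culex"),
  stringsAsFactors = TRUE
)

# Helper (hypothetical name): number of distinct values in a vector
count_unique <- function(x) length(unique(x))

# For each taxonomy column, count the distinct values within each ecoregion.
# tapply() returns one count per ecoregion; sapply() binds these into a
# matrix with ecoregions as rows and taxonomy levels as columns.
toy_result <- sapply(toy[, c("order", "family", "genus")],
                     function(col) tapply(col, toy$ECO_NAME, count_unique))
print(toy_result)
```

With real data, replacing `toy` by `data3` and adjusting the column names should give a table that can be compared against `dfc` and `result`.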