28 Mar 2012

Applying Same Changes to Multiple Dataframes

How to apply the same changes to several dataframes and
save them to CSV:

# a dataframe
a <- data.frame(x = 1:3, y = 4:6)

# make a list of several dataframes, then apply function (change column names, e.g.):
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {names(x) <- c("a", "b") ; return(x)})

# save dfs to csv with similar lapply-call:
n <- seq_along(my.list)
lapply(n, function(ni) {
               write.table(my.list[[ni]], file = paste(ni, ".csv", sep = ""),
               sep = ";", row.names = FALSE)})
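
A variant of the same idea, as a minimal sketch: if the list is named, the names can double as file names, and Map() walks the data frames and file paths together. The names df1 and df2 and the use of tempdir() as output folder are assumptions made for illustration only.

```r
# hypothetical list names "df1" and "df2"; files go to a temporary directory
a <- data.frame(x = 1:3, y = 4:6)
my.list <- list(df1 = a, df2 = a)

# lapply() keeps the list names while renaming the columns
my.list <- lapply(my.list, function(x) {names(x) <- c("a", "b"); x})

out.dir <- tempdir()
files <- file.path(out.dir, paste0(names(my.list), ".csv"))

# Map() pairs each data frame with its file path;
# invisible() suppresses printing of the returned list of NULLs
invisible(Map(function(df, f) write.table(df, file = f, sep = ";", row.names = FALSE),
              my.list, files))
```

This avoids indexing by position altogether, at the cost of having to name the list elements first.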


I'll extend this to a script that reads several files from a directory, applies the same changes to each file and finally saves the files back to the directory (as HERE):

# clean up
rm(list = ls())

# create some files in the working directory:
a <- data.frame(x = 1:3, y = 4:6)
b <- data.frame(x = 10:13, y = 14:17)
write.csv(a, "file1.csv", row.names = FALSE)
write.csv(b, "file2.csv", row.names = FALSE)

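# now read all files to list (the pattern is a regular expression;
# "^file" skips the semicolon-separated 1.csv and 2.csv written above):
mycsv <- dir(pattern = "^file.*\\.csv$")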

n <- length(mycsv)
mylist <- vector("list", n)

for(i in seq_len(n)) mylist[[i]] <- read.csv(mycsv[i])

# now change something in all dfs in list:
mylist <- lapply(mylist, function(x) {names(x) <- c("a", "b") ; return(x)})

# then save back dfs:
for(i in seq_len(n))
   write.csv(mylist[[i]], file = paste("file", i, ".csv", sep = ""),
   row.names = FALSE)
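
The whole read, change and save cycle can also be written without counters, roughly as sketched here. The subfolder name "csv-demo" inside tempdir() is an assumption, chosen so that no other CSV files get picked up; each file is written back under the path it was read from.

```r
# assumption: work in a fresh subfolder of tempdir() so no other CSVs interfere
work.dir <- file.path(tempdir(), "csv-demo")
dir.create(work.dir, showWarnings = FALSE)

# two example files
write.csv(data.frame(x = 1:3, y = 4:6),
          file.path(work.dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(x = 10:13, y = 14:17),
          file.path(work.dir, "file2.csv"), row.names = FALSE)

# full paths of all files whose names end in ".csv"
paths <- list.files(work.dir, pattern = "\\.csv$", full.names = TRUE)

# read every file, rename the columns, then write each one back to its own path
dfs <- lapply(paths, read.csv)
dfs <- lapply(dfs, function(x) {names(x) <- c("a", "b"); x})
invisible(Map(function(df, p) write.csv(df, p, row.names = FALSE), dfs, paths))
```

Because the paths come from list.files(), the output names cannot drift away from the input names, which can happen when they are rebuilt with paste().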


  1. The second example is much more complex than the same code without lapply (not to mention the funny construction of the file path). You can write it more simply as:
    for (ni in seq_along(my.list))
      write.table(my.list[[ni]], file = paste(ni, ".csv", sep = ""),
                  sep = ";", row.names = FALSE)

    1. You're right - the c(1:length(my.list))[ni] is stupid! (I've changed that)
      When it comes to lapply, I suppose it will be faster than a for loop (?).

    2. Even if lapply() is faster, you will never notice any difference when the body of the loop is writing a file.
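
One way to check the speed question on your own machine, as a rough sketch only: time both variants with system.time(). The 50 small data frames, the t1.csv ... t50.csv file names and the helper write.one() are made up for this illustration, and the numbers will vary with disk and system.

```r
# hypothetical setup: 50 small data frames written to a temporary directory
dfs <- replicate(50, data.frame(x = 1:3, y = 4:6), simplify = FALSE)
out.dir <- tempdir()

# helper (name is an assumption): write the i-th data frame to t<i>.csv
write.one <- function(i) {
  write.csv(dfs[[i]], file.path(out.dir, paste0("t", i, ".csv")), row.names = FALSE)
}

# same body, once driven by lapply() and once by a for loop
t.lapply <- system.time(invisible(lapply(seq_along(dfs), write.one)))
t.for    <- system.time(for (i in seq_along(dfs)) write.one(i))

# elapsed seconds for both variants
print(c(lapply = t.lapply[["elapsed"]], loop = t.for[["elapsed"]]))
```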