
I am trying to write a loop in R that selects the same columns, by name, from multiple data frames. The code below seems to produce the desired output, but df1 and df2 are not reassigned. How can I reassign df1 and df2 to the output of lapply?

 library(dplyr)  # provides %>% and select()

 col1 <- c(1,2,3,4)
 col2 <- c("A","B","C","D")
 col3 <- c(4,15,"BLANK","ZZ")

 df1 <- data.frame(col1,col2, col3)

 col1 <- c(500,546,47,87)
 col2 <- c("E","L","J","U")
 col3 <- c(6,10,"F","R")

 df2 <- data.frame(col1,col2, col3)

 df_list <- list(df1,df2)

 lapply(df_list,function(x) {x<- x %>% select("col1","col2")} )
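The reason the originals stay unchanged can be seen in a minimal base-R sketch (it uses `[` instead of dplyr's select() so no packages are needed; the name `res` is hypothetical). Assigning to `x` inside the function only rebinds a local copy, and lapply() returns a new list that has to be captured.

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4),
                  col2 = c("A", "B", "C", "D"),
                  col3 = c(4, 15, "BLANK", "ZZ"))
df2 <- data.frame(col1 = c(500, 546, 47, 87),
                  col2 = c("E", "L", "J", "U"),
                  col3 = c(6, 10, "F", "R"))
df_list <- list(df1, df2)

# `x <- ...` inside the function would only rebind the local argument `x`;
# lapply() builds and returns a NEW list and never modifies df1, df2 or df_list.
res <- lapply(df_list, function(x) x[, c("col1", "col2")])

ncol(df1)       # still 3: the original data frame is unchanged
ncol(res[[1]])  # 2: the subset exists only in the returned list
```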

1 Answer


We can use a named list and then call list2env to copy the elements into the global environment as objects (although it is usually better to keep them in a list):

# subset each data frame, name the results, then write them back as df1 and df2
list2env(setNames(lapply(df_list, `[`, c("col1", "col2")),
         c("df1", "df2")), .GlobalEnv)
df1
#  col1 col2
#1    1    A
#2    2    B
#3    3    C
#4    4    D
df2
#  col1 col2
#1  500    E
#2  546    L
#3   47    J
#4   87    U
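An equivalent base-R alternative to list2env() is an explicit loop with get() and assign(). This is a different technique from the one in the answer, shown only as a sketch; the object names are taken from the question, and the same caveat applies (overwriting global objects by name is harder to follow than keeping a list).

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4),
                  col2 = c("A", "B", "C", "D"),
                  col3 = c(4, 15, "BLANK", "ZZ"))
df2 <- data.frame(col1 = c(500, 546, 47, 87),
                  col2 = c("E", "L", "J", "U"),
                  col3 = c(6, 10, "F", "R"))

for (nm in c("df1", "df2")) {
  # get() fetches the object by its name; assign() stores the subset
  # back under the same name in the global environment
  assign(nm, get(nm)[, c("col1", "col2")], envir = .GlobalEnv)
}

names(df1)  # "col1" "col2"
```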

Instead of creating 'df_list' with list(df1, df2), we can get a named list directly from mget, which names each element after the object it fetched:

df_list <- mget(ls(pattern= "^df\\d+$"))

Then it is even easier, because the names are already set:

list2env(lapply(df_list, `[`, c("col1", "col2")), .GlobalEnv)
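Following the advice to keep the results in a list, a minimal sketch is to stop after lapply() and work with the named list directly (the name `sub_list` is hypothetical). It assumes the data frames are the only objects in the current environment whose names match `^df\d+$`.

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4),
                  col2 = c("A", "B", "C", "D"),
                  col3 = c(4, 15, "BLANK", "ZZ"))
df2 <- data.frame(col1 = c(500, 546, 47, 87),
                  col2 = c("E", "L", "J", "U"),
                  col3 = c(6, 10, "F", "R"))

# mget() returns a list whose names are the matched object names
df_list <- mget(ls(pattern = "^df\\d+$"))

# keep the subsets together instead of writing them back as globals
sub_list <- lapply(df_list, `[`, c("col1", "col2"))

names(sub_list)  # "df1" "df2"
sub_list$df2     # access one data frame by name
```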

5 Comments

Wow, that is fantastic! Very efficient use of mget.
When I apply this solution in AWS, I get this error. Any thoughts as to why? Error in `[.data.table`(X[[i]], ...) : When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
@MattGossett The code does not do any join, though. Could it be that you have data.table objects, while the input shown here is a data.frame?
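If so, you may need lapply(df_list, function(x) x[, .(col1, col2)]), because `[` on a data.table treats a character vector passed as its first argument as i (rows to join on) rather than as column names.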
Thank you! I ended up using this to remove a column, and it worked: lapply(df_list, function(x) subset(x, select=-c(col1)))
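For plain data frames, the subset() approach from the last comment can be sketched as follows (the name `no_col1` is hypothetical). data.table also provides a subset() method, so the same call should work on data.table objects, but this sketch only exercises data frames.

```r
df1 <- data.frame(col1 = c(1, 2, 3, 4),
                  col2 = c("A", "B", "C", "D"),
                  col3 = c(4, 15, "BLANK", "ZZ"))
df2 <- data.frame(col1 = c(500, 546, 47, 87),
                  col2 = c("E", "L", "J", "U"),
                  col3 = c(6, 10, "F", "R"))
df_list <- list(df1 = df1, df2 = df2)

# select = -col1 drops col1; select = c(col1, col2) would keep only those two
no_col1 <- lapply(df_list, function(x) subset(x, select = -col1))

names(no_col1$df1)  # "col2" "col3"
```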
