Using vector for recoding variables in a dataframe

Question

In a recent project, I have quite a big data frame, and I'd like to recode certain variables using a vector that I defined earlier.

I know there are many other ways to recode the data, but I was wondering if I could use the vector because it seems like an elegant solution.

df <- data.frame(
  A = c(1,2,2,1),
  B = c(1,1,1,2),
  C = c(2,2,1,2)
)


vector <- c(
  "A",
  "B"
)

Consider this example. Here I have created a vector that contains the names of two columns in the data frame. Can I now use this vector to recode the data frame? For example, I'd like to change every 1 to 0 in columns 'A' and 'B'.

I tried this:

df[df[,vector]==1] <- 0

Yet this code only works when I define the vector like this:

vector <- c(
  "A",
  "B",
  "C"
)

That is, it only works when the vector includes all the variables in the data frame.

If I use the same code but the vector includes only 'A' and 'B', I get the following error:

Error in `[<-.data.frame`(`*tmp*`, df[, vector] == 2, value = 1) : 
  unsupported matrix index in replacement

Do you have an idea how this might work?

Kind regards

df[, vector] <- replace(df[, vector], df[, vector] == 1, 0)
– Maël, Commented Feb 27, 2023 at 13:15
That worked, thanks! Do you think it is also possible to use a vector to change the class of those columns? Like so: ds[,varnames] <- as.numeric(ds[,varnames]). That didn't work for me, though.
– Linus, Commented Feb 27, 2023 at 14:03
Never mind, I figured it out: ds[,varnames] <- sapply(ds[,varnames],as.numeric)
– Linus, Commented Feb 27, 2023 at 14:20
Have a look at "Replace all NA with FALSE in selected columns in R".
– GKi, Commented Feb 28, 2023 at 8:18
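
As a side note on the class conversion discussed in the comments: as.numeric() expects an atomic vector, while a multi-column data frame is a list of columns, which is why the direct call fails. A minimal sketch of a column-wise alternative follows, using lapply() instead of sapply() so the result stays a list of columns. The data frame `ds` and its contents are made up for illustration; only the name `varnames` comes from the comments.

```r
# Hypothetical data: two columns stored as character, one left alone
ds <- data.frame(
  x = c("1", "2"),
  y = c("3.5", "4"),
  z = c("a", "b"),
  stringsAsFactors = FALSE
)
varnames <- c("x", "y")

# Convert each selected column separately and write the columns back
ds[varnames] <- lapply(ds[varnames], as.numeric)

str(ds)
```

lapply() always returns one element per column, whereas sapply() may simplify its result to a matrix or a plain vector depending on the input.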

langtang · Accepted Answer · 2023-02-27 13:32:21Z

You can use mutate(across()) from dplyr.

mutate(df,across(all_of(vector),\(v) replace(v,v==1,0)))
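
For reference, a self-contained sketch of the same call, assuming dplyr is installed and R is version 4.1 or later (needed for the `\(v)` lambda shorthand). The name `cols` is used here in place of `vector`, and `recoded` is just an illustrative name. Note that mutate() returns a new data frame, so the result has to be assigned.

```r
library(dplyr)

df <- data.frame(
  A = c(1, 2, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(2, 2, 1, 2)
)
cols <- c("A", "B")  # plays the role of `vector` in the question

# mutate() does not modify df in place, so keep the returned data frame
recoded <- mutate(df, across(all_of(cols), \(v) replace(v, v == 1, 0)))

recoded
```

Only the columns named in `cols` are touched; column C and the original `df` stay as they were.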


GKi · Accepted Answer · 2023-02-27 15:52:04Z

A base R way is to first subset df with vector and then index that subset with the logical matrix df[vector]==1.

df[,vector][df[,vector]==1] <- 0
#df[vector][df[vector]==1] <- 0 #Alternative

df
#  A B C
#1 0 0 2
#2 2 0 2
#3 2 0 1
#4 0 2 2
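
Why the attempt from the question fails while this one works can be seen from the dimensions of the logical mask. The sketch below rebuilds the example data (with `cols` standing in for `vector`, and `mask` and `failed` as illustrative names) and compares the two forms.

```r
df <- data.frame(
  A = c(1, 2, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(2, 2, 1, 2)
)
cols <- c("A", "B")  # plays the role of `vector` in the question

# Logical matrix with one column per selected variable
mask <- df[cols] == 1
dim(mask)  # 4 rows, 2 columns
dim(df)    # 4 rows, 3 columns

# Indexing the whole data frame with a mask of a different shape errors
failed <- inherits(try(df[mask] <- 0, silent = TRUE), "try-error")

# Selecting the same columns first makes the mask fit the target
df[cols][mask] <- 0
df
```

When the vector names every column, the mask has the same shape as df, which is the one case in which a logical matrix index is accepted in the replacement.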

Another way could be to use a for loop.

for(i in vector) df[[i]][df[[i]]==1] <- 0
#for(i in vector) df[,i][df[,i]==1] <- 0 #Variant

Benchmark

bench::mark(check=FALSE,
langtang = local({df <- dplyr::mutate(df,dplyr::across(all_of(vector),\(v) replace(v,v==1,0)))}),
"Maël" = local({df[, vector] <- replace(df[, vector], df[, vector] == 1, 0)}),
GKi = local({df[,vector][df[,vector]==1] <- 0}),
GKi2 = local(for(i in vector) df[,i][df[,i]==1] <- 0),
GKi3 = local(for(i in vector) df[[i]][df[[i]]==1] <- 0)
)
#  expression      min median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴ result
#  <bch:expr> <bch:tm> <bch:>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list>
#1 langtang     2.66ms    3ms    299.   7.89KB    8.37   143     4   478ms <NULL>
#2 Maël       219.56µs  241µs   4017.     280B   12.3   1955     6   487ms <NULL>
#3 GKi        222.48µs  243µs   4013.     280B   12.3   1951     6   486ms <NULL>
#4 GKi2       106.96µs  116µs   8452.     280B   12.3   4119     6   487ms <NULL>
#5 GKi3        60.75µs   65µs  15217.     280B   14.4   7398     7   486ms <NULL>

On this small example, the for loop with [[ (GKi3) is roughly 3 to 4 times faster than the vectorised base variants (Maël, GKi) and about 50 times faster than the dplyr variant. All base variants allocate less memory (280B) than the dplyr variant (7.89KB).
