In a recent project, I have quite a big data frame, and I'd like to recode certain variables using a vector that I defined earlier.

I know there are many other ways to recode the data, but I was wondering if I could use the vector because it seems like an elegant solution.

df <- data.frame(
  A = c(1,2,2,1),
  B = c(1,1,1,2),
  C = c(2,2,1,2)
)


vector <- c(
  "A",
  "B"
)

Consider this example. Here I have created a vector that contains two column names from the data set. Can I now use this vector to recode the data frame? E.g. I'd like to change every 1 to 0 in the columns 'A' and 'B'.

I tried this:

df[df[,vector]==1] <- 0

Yet this code only works when I define the vector like this:

vector <- c(
  "A",
  "B",
  "C"
)

That is, it only works when the vector includes all the variables in the data frame.

If I use the same code, but the vector only includes 'A' and 'B', I get the following error:

Error in `[<-.data.frame`(`*tmp*`, df[, vector] == 2, value = 1) : 
  unsupported matrix index in replacement

Do you have an idea how this might work?

Kind regards

Comments

  • df[, vector] <- replace(df[, vector], df[, vector] == 1, 0) Commented Feb 27, 2023 at 13:15
  • That worked, thanks! Do you think it is also possible to use a vector to change the class of those columns? Like so: ds[,varnames] <- as.numeric(ds[,varnames]). That didn't work for me, though. Commented Feb 27, 2023 at 14:03
  • Never mind, I figured it out: ds[,varnames] <- sapply(ds[,varnames], as.numeric) Commented Feb 27, 2023 at 14:20
  • Have a look at "Replace all NA with FALSE in selected columns in R". Commented Feb 28, 2023 at 8:18
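
The class conversion discussed in these comments can be sketched as follows. as.numeric() expects an atomic vector, while ds[, varnames] is a data frame (a list of columns), so the conversion has to be applied column by column. The names ds and varnames come from the comment; the sample data is made up for illustration, and lapply() is used here as an assumed alternative to sapply() because it keeps each column as its own vector instead of building an intermediate matrix.

```r
# Hypothetical sample data: columns A and B hold numbers stored as text.
ds <- data.frame(
  A = c("1", "2"),
  B = c("3", "4"),
  C = c("x", "y"),
  stringsAsFactors = FALSE
)
varnames <- c("A", "B")

# Convert only the selected columns, one column at a time.
ds[varnames] <- lapply(ds[varnames], as.numeric)

# Inspect the resulting column classes.
sapply(ds, class)
```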

2 Answers

You can use mutate(across()) from dplyr.

library(dplyr)

# Returns a modified copy; assign the result back to df to keep the change.
mutate(df, across(all_of(vector), \(v) replace(v, v == 1, 0)))

A base R approach is to subset df with vector first and then assign to the elements of that subset where df[vector] == 1.

df[,vector][df[,vector]==1] <- 0
#df[vector][df[vector]==1] <- 0 #Alternative

df
#  A B C
#1 0 0 2
#2 2 0 2
#3 2 0 1
#4 0 2 2
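
As a rough illustration of why the attempt in the question fails while the version above works: a logical matrix used directly as an index on a data frame has to have the same dimensions as the data frame. The sketch below compares the dimensions and also builds a full-size mask as one possible workaround. The names mask and full_mask are made up for this example.

```r
df <- data.frame(
  A = c(1, 2, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(2, 2, 1, 2)
)
vector <- c("A", "B")

# The mask only covers the two selected columns ...
mask <- df[, vector] == 1
dim(mask)
# ... while the data frame has three, so df[mask] <- 0 is rejected.
dim(df)

# Capture the error message instead of stopping the script.
res <- tryCatch({
  df[mask] <- 0
  "ok"
}, error = function(e) conditionMessage(e))
res

# Workaround sketch: a mask with the shape of df that is FALSE
# everywhere outside the selected columns.
full_mask <- df == 1
full_mask[, !(names(df) %in% vector)] <- FALSE
df[full_mask] <- 0
df
```

Subsetting first, as in df[, vector][df[, vector] == 1] <- 0, avoids the mismatch because the mask and the subset it indexes have the same shape.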

Another way could be to use a for loop.

for(i in vector) df[[i]][df[[i]]==1] <- 0
#for(i in vector) df[,i][df[,i]==1] <- 0 #Variant
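
If the loop is needed in several places, it could be wrapped in a small helper. This is only a sketch: recode_cols and its arguments are hypothetical names, not part of base R, and which() is added on the assumption that the columns may contain NA values, which should be left untouched.

```r
# Hypothetical helper: replace `from` with `to` in the given columns.
recode_cols <- function(data, cols, from, to) {
  for (col in cols) {
    # which() drops NA comparisons so missing values stay as they are.
    hits <- which(data[[col]] == from)
    data[[col]][hits] <- to
  }
  data
}

df <- data.frame(
  A = c(1, 2, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(2, 2, 1, 2)
)
vector <- c("A", "B")

df <- recode_cols(df, vector, from = 1, to = 0)
df
```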

Benchmark

bench::mark(check=FALSE,
langtang = local({df <- dplyr::mutate(df,dplyr::across(all_of(vector),\(v) replace(v,v==1,0)))}),
"Maël" = local({df[, vector] <- replace(df[, vector], df[, vector] == 1, 0)}),
GKi = local({df[,vector][df[,vector]==1] <- 0}),
GKi2 = local(for(i in vector) df[,i][df[,i]==1] <- 0),
GKi3 = local(for(i in vector) df[[i]][df[[i]]==1] <- 0)
)
#  expression      min median itr/s…¹ mem_al…² gc/se…³ n_itr  n_gc total…⁴ result
#  <bch:expr> <bch:tm> <bch:>   <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list>
#1 langtang     2.66ms    3ms    299.   7.89KB    8.37   143     4   478ms <NULL>
#2 Maël       219.56µs  241µs   4017.     280B   12.3   1955     6   487ms <NULL>
#3 GKi        222.48µs  243µs   4013.     280B   12.3   1951     6   486ms <NULL>
#4 GKi2       106.96µs  116µs   8452.     280B   12.3   4119     6   487ms <NULL>
#5 GKi3        60.75µs   65µs  15217.     280B   14.4   7398     7   486ms <NULL>

On this small four-row data frame, the for loop using [[ is almost 4 times faster than the vectorized base variants and about 50 times faster than the dplyr variant. All base variants allocate less memory than the dplyr variant (280B versus 7.89KB).
