I am transitioning from data.frame to data.table in R for better performance. A large part of converting my code is moving the custom functions I used to run with apply on a data.frame over to data.table.

Say I have a simple data table, dt1.

x y z
1 9 j
4 1 n
7 1 n

I am trying to calculate a new column in dt1 based on the values of x, y, and z. I tried two ways. Both give the correct result, but the faster one emits a warning, so I want to make sure the warning is nothing serious before I use the faster version when converting my existing code.

(1) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}]

(2) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}, by = 1:nrow(dt1)]

Version 1 runs faster than version 2 but emits the warning "the condition has length > 1 and only the first element will be used", although the result looks correct. Version 2 is slightly slower but does not give that warning. I want to make sure version 1 does not give erratic results once I start writing more complicated functions.

Please treat this as a generic question: I want to run a user-defined function that reads several column values in a given row and calculates the new column value for that row.

Thanks for your help.

1 Answer

If 'x', 'y', and 'z' are the columns of 'dt1', either use the vectorized ifelse:

dt1[, a:=ifelse(x<1 & y >3 & z=='n', 6, 7)] 

Or initialize 'a' to 7, then assign 6 to the rows selected by the logical index:

dt1[, a := 7][x<1 & y >3 & z=='n', a:=6][]
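
The warning in the question comes from if, which expects a single TRUE or FALSE, whereas inside dt1[...] each column arrives as a whole vector. Here is a minimal base-R sketch of the difference. The vector x is made-up illustration data, and as far as I know R 4.2.0 and later turn this warning into an error, so the sketch catches both:

```r
# Illustration data (hypothetical, not the asker's table)
x <- c(0L, 1L, 4L)
cond <- x < 1                      # logical vector: TRUE FALSE FALSE

# if() wants a length-one condition; a longer one triggers a warning
# (older R, only the first element is used) or an error (newer R).
res_if <- tryCatch(
  if (cond) 6 else 7,
  warning = function(w) "warning",
  error   = function(e) "error"
)

# ifelse() evaluates the condition element by element instead.
res_ifelse <- ifelse(cond, 6, 7)   # 6 7 7
```

So version (1) only looks correct when the first row happens to give the value that every other row should get as well. Version (2) avoids the warning because each by-group holds a single row, at the cost of one evaluation per row.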

Using a function

# Vectorized helper: returns 6 where all three conditions hold, 7 otherwise
getnewvariable <- function(v1, v2, v3) {
  ifelse(v1 < 1 & v2 > 3 & v3 == 'n', 6, 7)
}

dt1[, a := getnewvariable(x, y, z)][]
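
If the installed data.table is recent enough to provide fifelse (I believe it was added in 1.12.4), the same helper can be written with it. It is stricter about the types of its arguments and usually faster than ifelse. This is a sketch only: getnewvariable2 is a made-up name, and the table is rebuilt here with the values from the data section so the snippet runs on its own:

```r
library(data.table)

# Hypothetical variant of the helper, using data.table::fifelse
getnewvariable2 <- function(v1, v2, v3) {
  fifelse(v1 < 1 & v2 > 3 & v3 == "n", 6, 7)
}

dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# The columns are passed as whole vectors, so no by = is needed
dt1[, a := getnewvariable2(x, y, z)]
```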

data

library(data.table)

df1 <- structure(list(x = c(0L, 1L, 4L, 7L, -2L), y = c(4L, 9L, 1L, 
1L, 5L), z = c("n", "j", "n", "n", "n")), .Names = c("x", "y", 
"z"), class = "data.frame", row.names = c(NA, -5L))

dt1 <- as.data.table(df1)
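
With this data, the ifelse approach and the two-step approach can be checked against each other. In this small sketch, copy keeps the two results independent because := modifies a table by reference. The names dt_a and dt_b are only for illustration:

```r
library(data.table)

dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# Approach 1: vectorized ifelse
dt_a <- copy(dt1)[, a := ifelse(x < 1 & y > 3 & z == "n", 6, 7)]

# Approach 2: default value first, then overwrite the matching rows
dt_b <- copy(dt1)[, a := 7][x < 1 & y > 3 & z == "n", a := 6]

# Rows 1 and 5 satisfy all three conditions, so both give 6 7 7 7 6
all.equal(dt_a$a, dt_b$a)
```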

4 Comments

Thanks akrun. One more addendum to my question: if I want to put all of my if/then logic in a function so I can reuse it in different places in the code, what would the syntax look like? I tried something that works, but I expected better performance, perhaps from making better use of data.table.
@user2956863 Could you update your post with the function you tried?
@user2956863 Updated with a function.
Works like a gem, and you introduced me to vectorized statements. Thanks a lot akrun.
