
I am trying to call a user-defined function to create a new column that depends on the values of other columns of my data.table. In simple cases I do not encounter any error, but when the function uses a conditional statement or a loop, it looks as if it receives the entire column as its argument.

Learning from other cases reported on Stack Overflow (e.g. "R data.table user defined function"), I understand that for if statements this problem can be overcome with the ifelse function. However, I can't find a solution for loops.
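As a minimal sketch of the ifelse workaround mentioned above (the names sign_if and sign_ifelse are hypothetical and only serve this illustration), the vectorised version handles the whole column at once, whereas the plain if version does not:

```r
library(data.table)

dt <- data.table(a = c(-1, 2))

# Hypothetical helper: `if` expects a single TRUE/FALSE, not a whole column
sign_if <- function(a) {
  if (a > 0) "pos" else "neg"
}

# Vectorised counterpart: ifelse() evaluates the test element by element
sign_ifelse <- function(a) {
  ifelse(a > 0, "pos", "neg")
}

# dt[, b := sign_if(a)]    # errors in recent R versions; older versions
#                          # warned and used only the first element
dt[, b := sign_ifelse(a)]  # works: one result per row
```
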

Below is the code I want to run. It returns the following error message: Error in seq.default(1, a, 1) : 'to' must be of length 1

library(data.table)

test <- data.table(a = c(1, 2))

f <- function(a) {
  out <- 0
  for (i in seq(1,a,1)){
    out <- out +1
  }
  return(out)
}

test[,b:=f(a)]

Obviously, f(x) = x here, but I chose this function for the sake of simplicity. Also, note that replacing seq(1,a,1) with 1:a gives the following warning instead: In 1:a : numerical expression has 2 elements: only the first used.
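Both messages are consistent with the function being handed the whole column in one call. A small sketch to check this, assuming a hypothetical probe function n_received that simply reports how many values it was given:

```r
library(data.table)

test <- data.table(a = c(1, 2))

# Hypothetical probe: returns the number of values it was called with
n_received <- function(a) length(a)

# Without grouping, j is evaluated once and sees the full column
test[, n_all := n_received(a)]

# Grouping by row number makes j run once per row
test[, n_row := n_received(a), by = seq_len(nrow(test))]

print(test)
```

Here n_all should be 2 on every row and n_row should be 1, which is why seq(1, a, 1) complains about the length of 'to' in the ungrouped call.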


Below is a more detailed explanation of the desired behavior.

test <-data.table(a=c(1,2,3),b=c(4,5,6))
f <- function(a,b){
  out <-0
  for (i in seq(1,a,1)){
    out <- out + b^(i) 
  }
  return(out)
}

I would like test[, c := f(a, b)] to give:

test
# a b c
# 1 4 4
# 2 5 30    # 5 + 5^2
# 3 6 258   # 6 + 6^2 + 6^3

Is there a way to get the desired behaviour?

  • Yes, this is where the problem stems from. However, for dt <- data.table(a=c(1,2)), calling the function g <- function(a){return(a)} this way, dt[,b:=g(a)], leads to the desired result. The function g seems to take only one value (that of the row), and not the entire column. Commented Apr 21, 2020 at 13:36
  • Thanks again for your answer. The function I want to use is more complicated than this one, and it needs to use the elements of one column as the number of loop iterations. I can't think of any workaround. For instance, with the input column 1,2,3 I would like the output column 1,2+2^2,3+3^2+3^3 from calling the function f <- function(a){ out <- 0; for (i in 1:a){ out <- out + a^i }; return(out) } Commented Apr 21, 2020 at 13:44
  • No, because this way every row will get the same result. I will update the question with a more detailed explanation of the desired result. Thanks! Commented Apr 21, 2020 at 13:57

1 Answer


Two ways to solve the problem (thanks @chinsoon12):

test[,c:=mapply(f, test[,a],test[,b])]

test[,c:=f(a,b),1L:nrow(test)]
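As a self-contained sketch, both approaches can be run against the example from the question. The column names c_mapply and c_by are made up here so that the two results can sit side by side:

```r
library(data.table)

# f as defined in the question
f <- function(a, b) {
  out <- 0
  for (i in seq(1, a, 1)) {
    out <- out + b^i
  }
  return(out)
}

test <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))

# mapply() calls f once per pair (a[i], b[i])
test[, c_mapply := mapply(f, a, b)]

# Grouping by row number makes j, and therefore f, run once per row
test[, c_by := f(a, b), by = seq_len(nrow(test))]

print(test)
```

Both new columns should hold 4, 30 and 258, matching the desired output in the question. Inside j the columns can be referred to directly as a and b, so mapply(f, a, b) should be equivalent to mapply(f, test[,a], test[,b]).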

Speed-wise, the two solutions are equivalent (f is the two-argument function from the question):

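library(microbenchmark)
library(ggplot2)  # provides the autoplot() generic used below

a <- 1:500
b <- 500:1

test_1 <- data.table(a, b)
test_2 <- data.table(a, b)

bench <- microbenchmark(
  v_1 = test_1[, c := mapply(f, test_1[, a], test_1[, b])],
  v_2 = test_2[, c := f(a, b), 1L:nrow(test_2)],
  times = 100L
)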

summary(bench)
#  expr      min       lq     mean   median       uq      max neval cld
#1  v_1 91.83598 95.63639 97.82780 96.94672 98.51073 113.2232   100   a
#2  v_2 91.72392 95.45878 98.92037 96.53573 98.71301 139.9906   100   a

autoplot(bench)

Benchmark plot


1 Comment

Another option is test[, v := f(a, b), 1L:nrow(test)] or, in this case, the loop-free test[, v := sum(cumprod(rep(b, a))), 1L:nrow(test)]
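A quick check that the cumprod expression agrees with the loop on the question's example data. This is only a sketch; f_cumprod is a hypothetical name for the expression from the comment:

```r
# f as defined in the question
f <- function(a, b) {
  out <- 0
  for (i in seq(1, a, 1)) {
    out <- out + b^i
  }
  return(out)
}

# Hypothetical wrapper: rep(b, a) repeats b a times, cumprod() turns that
# into b, b^2, ..., b^a, and sum() adds the powers up
f_cumprod <- function(a, b) sum(cumprod(rep(b, a)))

a <- c(1, 2, 3)
b <- c(4, 5, 6)

loop_result    <- mapply(f, a, b)
cumprod_result <- mapply(f_cumprod, a, b)

print(all.equal(loop_result, cumprod_result))
```
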
