
I'm trying to use function arguments to subset a keyed data.table (and take the mean over that subset). I pass the function a single value for two of the keys and several values for the third. This seems to confuse R, yet the same operation works exactly as expected outside of a function environment.

Here's an example that basically gets what I'm trying to do; it returns a solution that is incorrect, while my own code produces an error (text pasted below):

library(data.table)

set.seed(12345)
dt<-data.table(yr=rep(2000:2005,each=20),
               id=paste0(rep(rep(1:10,each=2),6)),
               deg=paste0(rep(1:2,60)),
               var=rnorm(120),
               key=c("yr","id","deg"))

fcn <- function(yr,ids,deg){
  dt[.(yr,ids,deg),mean(var)]
}

fcn(2004,paste0(1:3),"1")

This is giving an answer, but it's totally wrong (more on that in a second). If I do this by hand, there's no problem:

> fcn(2004,paste0(1:3),"1")
[1] 0.1262586
> dt[yr==2004&id %in% paste0(1:3)&deg=="1",mean(var)]
[1] 0.4374115
> dt[.(2004,paste0(1:3),"1"),mean(var)]
[1] 0.4374115

To crack what's going on, I changed the fcn code to:

fcn <- function(yr,ids,deg){
  dt[.(yr,ids,deg),]
}

Which yields:

> fcn(2004,paste0(1:3),"1")
       yr id deg        var
  1: 2000  1   1  0.5855288
  2: 2000  2   2 -0.4534972
  3: 2000  3   1  0.6058875
  4: 2000  1   2  0.7094660
  5: 2000  2   1 -0.1093033
 ---                       
116: 2005  2   2 -1.3247553
117: 2005  3   1  0.1410843
118: 2005  1   2 -1.1562233
119: 2005  2   1  0.4224185
120: 2005  3   2 -0.5360480

Basically, fcn has done no subsetting! Why is this happening? Really frustrated.

If I pass only one id instead of three, dt subsets on the middle key only. Weird:

> fcn(2004,"1","1")
       yr id deg        var
  1: 2000  1   1  0.5855288
  2: 2000  1   2  0.7094660
  3: 2000  1   1  0.5855288
  4: 2000  1   2  0.7094660
  5: 2000  1   1  0.5855288
 ---                       
116: 2005  1   2 -1.1562233
117: 2005  1   1  0.2239254
118: 2005  1   2 -1.1562233
119: 2005  1   1  0.2239254
120: 2005  1   2 -1.1562233

But if I pass only the middle keys to the function, it works fine:

fcn <- function(ids){
  dt[.(2004,ids,"1")]
}
> fcn(paste0(1:3))
     yr id deg        var
1: 2004  1   1  0.6453831
2: 2004  2   1 -0.3043691
3: 2004  3   1  0.9712207

Final edit: problem solved, but it would still be nice to know what exactly was going wrong:

Rename the arguments:

fcn <- function(yyr,ids,ddeg){
  dt[.(yyr,ids,ddeg),mean(var)]
}

Re-using the column names as argument names seems to have caused the issue, but I still don't fully understand what went wrong.

  • Chalk this up as a case of needing to write it down to troubleshoot properly. Commented Apr 28, 2015 at 21:42

1 Answer


The problem is that yr and deg are column names of dt, so inside the i-expression .(yr,ids,deg) they refer to the columns rather than to your function arguments. You can either rename the function arguments, or construct the join data.table beforehand and rely on the fact that when i is a single name, data.table always evaluates it in the calling environment:

fcn <- function(yr,ids,deg){
  tmp = data.table(yr, ids, deg)
  dt[tmp, mean(var)]
}

fcn(2004, paste0(1:3), "1")
#[1] 0.4374115

See FAQ 2.12-2.13.
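
As a rough check of the two fixes discussed here, the sketch below runs both against a small table and compares them with a plain vector-scan subset. The toy data and the names fcn_renamed, fcn_tmp, expected, res_renamed and res_tmp are made up for illustration and are not from the question; only the two function bodies follow the code shown above.

```r
library(data.table)

# Hypothetical toy table with deterministic values, keyed as in the question
dt <- data.table(yr  = rep(2000:2001, each = 4),
                 id  = rep(c("1", "2"), times = 4),
                 deg = rep(rep(c("1", "2"), each = 2), times = 2),
                 var = 1:8,
                 key = c("yr", "id", "deg"))

# Fix 1 (from the question): argument names that are not column names of dt
fcn_renamed <- function(yyr, ids, ddeg) {
  dt[.(yyr, ids, ddeg), mean(var)]
}

# Fix 2 (from the answer): build the lookup table first, then pass it as a
# single name, which is evaluated in the calling scope rather than inside dt
fcn_tmp <- function(yr, ids, deg) {
  tmp <- data.table(yr, ids, deg)
  dt[tmp, mean(var)]
}

# Reference value from an ordinary logical subset
expected    <- dt[yr == 2001 & id %in% c("1", "2") & deg == "1", mean(var)]
res_renamed <- fcn_renamed(2001, c("1", "2"), "1")
res_tmp     <- fcn_tmp(2001, c("1", "2"), "1")

print(c(expected = expected, renamed = res_renamed, tmp = res_tmp))
```

On this toy table all three values should agree (the two matching rows hold var = 5 and var = 6). Renaming is the smaller change, while the tmp approach keeps the function signature readable at the cost of one extra object.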


5 Comments

So this is basically an environment problem: I need to access the variables yr and deg, but [.data.table first looks within the scope of dt and stops there when it finds those columns (never moving to the function environment where the values I pass to yr and deg are stored). Am I missing anything?
Ahhh, so the reason fcn(2004,"1","1") subset on the middle key only is that only the middle argument had a name distinct from its column (ids vs. id). Sneaky.
What's still curious to me is that I can apparently use the key name as a wildcard for that key--I thought I remembered reading in the vignettes (here) that I should use unique(key) as that?
@MichaelChirico try dt[.(yr), allow.cartesian = TRUE] vs dt[.(unique(yr))] to see the difference
Neat. That's where the warnings were coming from--a much uglier version of dt[.(c(2004,2004))].
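
The difference raised in the last two comments can be seen on a small table. This is only a sketch: the toy data and the names every_row and once_each are invented for illustration, and the row counts depend on that toy data.

```r
library(data.table)

# Hypothetical toy table: 2 years with 4 rows each, keyed as in the question
dt <- data.table(yr  = rep(2000:2001, each = 4),
                 id  = rep(c("1", "2"), times = 4),
                 deg = rep(rep(c("1", "2"), each = 2), times = 2),
                 var = 1:8,
                 key = c("yr", "id", "deg"))

# Inside [.data.table, yr is the whole column: 8 lookup values, and each one
# matches all 4 rows of its year, so the rows of dt come back repeatedly
every_row <- dt[.(yr), allow.cartesian = TRUE]

# One lookup value per distinct year: every row of dt is matched exactly once
once_each <- dt[.(unique(yr))]

print(c(every_row = nrow(every_row), once_each = nrow(once_each)))
```

Here the first join should return 32 rows (8 lookup values times 4 matches each) and the second 8, which is why using the bare key column as a "wildcard" only appears to work.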
