I'm trying to use function arguments to subset a keyed data.table (and take the mean over that subset). Basically, I pass the function single values for two of the keys and several values for the third; this seems to confuse R, even though the same operation works exactly as expected outside a function.
Here's an example that captures what I'm trying to do. It returns an incorrect answer, whereas my own code produces an error (text pasted below):
library(data.table)
set.seed(12345)
dt<-data.table(yr=rep(2000:2005,each=20),
id=paste0(rep(rep(1:10,each=2),6)),
deg=paste0(rep(1:2,60)),
var=rnorm(120),
key=c("yr","id","deg"))
fcn <- function(yr,ids,deg){
dt[.(yr,ids,deg),mean(var)]
}
fcn(2004,paste0(1:3),"1")
This is giving an answer, but it's totally wrong (more on that in a second). If I do this by hand, there's no problem:
> fcn(2004,paste0(1:3),"1")
[1] 0.1262586
> dt[yr==2004&id %in% paste0(1:3)&deg=="1",mean(var)]
[1] 0.4374115
> dt[.(2004,paste0(1:3),"1"),mean(var)]
[1] 0.4374115
To figure out what's going on, I changed fcn to return the subset itself instead of the mean:
fcn <- function(yr,ids,deg){
dt[.(yr,ids,deg),]
}
Which yields:
> fcn(2004,paste0(1:3),"1")
yr id deg var
1: 2000 1 1 0.5855288
2: 2000 2 2 -0.4534972
3: 2000 3 1 0.6058875
4: 2000 1 2 0.7094660
5: 2000 2 1 -0.1093033
---
116: 2005 2 2 -1.3247553
117: 2005 3 1 0.1410843
118: 2005 1 2 -1.1562233
119: 2005 2 1 0.4224185
120: 2005 3 2 -0.5360480
Basically, fcn has done no subsetting! Why is this happening? Really frustrated.
If I pass only one value for the middle key instead of three, the result is filtered on the middle key only. Weird:
> fcn(2004,"1","1")
yr id deg var
1: 2000 1 1 0.5855288
2: 2000 1 2 0.7094660
3: 2000 1 1 0.5855288
4: 2000 1 2 0.7094660
5: 2000 1 1 0.5855288
---
116: 2005 1 2 -1.1562233
117: 2005 1 1 0.2239254
118: 2005 1 2 -1.1562233
119: 2005 1 1 0.2239254
120: 2005 1 2 -1.1562233
But if the function takes only the middle key's values as an argument, with the other two keys hard-coded, it works fine:
fcn <- function(ids){
dt[.(2004,ids,"1")]
}
> fcn(paste0(1:3))
yr id deg var
1: 2004 1 1 0.6453831
2: 2004 2 1 -0.3043691
3: 2004 3 1 0.9712207
Final edit: problem solved, but it would still be nice to know what exactly was going wrong:
The fix was to rename the arguments so that they no longer match the column names:
fcn <- function(yyr,ids,ddeg){
dt[.(yyr,ids,ddeg),mean(var)]
}
Reusing the column names as argument names seems to have caused the issue, but I still don't fully understand what went wrong.
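For what it's worth, the output above is consistent with the names inside dt[...] being looked up among the table's columns before the function's arguments: yr and deg would then be the full 120-row columns, with ids recycled against them, while ids is found as an argument only because no column has that name. If that is right, an alternative to renaming is to build the lookup table in the function body, outside the brackets, where the arguments are still visible. Below is a minimal sketch under that assumption. The names fcn_lookup and lookup are hypothetical, and it assumes yr and deg are single values.

```r
library(data.table)

set.seed(12345)
dt <- data.table(yr  = rep(2000:2005, each = 20),
                 id  = paste0(rep(rep(1:10, each = 2), 6)),
                 deg = paste0(rep(1:2, 60)),
                 var = rnorm(120),
                 key = c("yr", "id", "deg"))

# Hypothetical helper: same argument names as the columns, but the lookup
# table is built in the function body, outside dt[...], so yr and deg here
# refer to the arguments rather than to dt's columns.
fcn_lookup <- function(yr, ids, deg) {
  # Assumes yr and deg have length 1; they are recycled to length(ids).
  lookup <- data.table(yr = yr, id = ids, deg = deg)
  # Keyed join of dt against the lookup rows, then the mean of the matches.
  dt[lookup, mean(var)]
}

by_join <- fcn_lookup(2004, paste0(1:3), "1")
by_scan <- dt[yr == 2004 & id %in% paste0(1:3) & deg == "1", mean(var)]
print(all.equal(by_join, by_scan))
```

The last line compares the function's result with the by-hand filter from earlier in the question. Renaming the arguments, as in the final edit, is still the simpler fix; this version only matters if the argument names have to stay as they are.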