In the dataset I'm using, I have four assessment levels I'm trying to predict: 1 [Good] to 4 [Bad].
My model seems to be working, using the polr function to fit an ordered logistic regression, though it's giving me this warning message: "In cbind(race, partisanship, sex, age) : number of rows of result is not a multiple of vector length (arg 4)". I think this is because some cells, as far as I can see, got imported as blanks instead of NAs.
Here's what the output looks like:
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> attach(mydata)
> y <- as.factor(assessment)
> x <- cbind(race, partisanship, sex, age)
Warning message:
In cbind(race, partisanship, sex, age) :
number of rows of result is not a multiple of vector length (arg 4)
>
> olr <- polr(y ~ x, mydata)
> summary(olr)
Re-fitting to get Hessian
Call:
polr(formula = y ~ x, data = mydata)
Coefficients:
                 Value Std. Error t value
xrace          0.49485   0.214426  2.3078
xpartisanship -0.00990   0.002942 -3.3654
xsex          -0.21304   0.299763 -0.7107
xage           0.01486   0.006812  2.1819
Intercepts:
      Value Std. Error t value
1|2 -1.4763     0.8253 -1.7887
2|3  1.8049     0.8237  2.1913
3|4  2.4739     0.8290  2.9842
Residual Deviance: 667.1306
AIC: 681.1306
(1401 observations deleted due to missingness)
I tried to combat the problem by adding na.strings = "" and x[x==""] <- NA before I define x. It looks better in the summary output, but I still get the warning.
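For reference, here is a minimal sketch of how blanks can be turned into NAs at import time. The inline CSV text is a made-up stand-in for mydata.csv (the real file's contents are not shown in this post), and the column names are assumed to match the ones used above. It passes na.strings to read.csv itself and adds strip.white so that cells containing only spaces are treated the same way as empty cells.

```r
# Hypothetical stand-in for mydata.csv: race has one empty cell and one
# cell holding only a space; partisanship has a literal NA.
csv_text <- "assessment,race,partisanship,sex,age
1,2,97.4,1,80
2, ,96.7,0,75
3,,NA,1,70
4,3,87.7,0,65"

mydata <- read.csv(
  text = csv_text,
  na.strings = c("", "NA"),  # empty fields and the literal NA both become missing
  strip.white = TRUE         # trim padding first, so " " is treated like ""
)

# Inspect the column types and count the missing values per column.
str(mydata)
colSums(is.na(mydata))
```

With a real file, the same arguments would go into read.csv("~/Desktop/R/mydata.csv", ...). Setting na.strings after the file has already been read has no effect, since it is an argument of the import step.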
It's the race column that for some reason imports missing cells as blanks instead of NAs: when I look at the imported .csv file using View(mydata) in RStudio, I see blanks instead of NAs in the race column, while all the other columns have NAs where I'm missing data. When I look at the console output, though, it shows NAs.
For example, in RStudio, row 7 already shows an NA for partisanship, but row 10 shows a blank for race:
> head(x, 10)
      race partisanship age
 [1,]    2         97.4  80
 [2,]    2         96.7  75
 [3,]    3         95.0  70
 [4,]    3         87.7  65
 [5,]    3         85.2  60
 [6,]    3          4.7  50
 [7,]    3           NA  40
 [8,]    3          9.1  30
 [9,]    3          1.1  80
[10,]   NA         10.2  75
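One way to check what is actually stored, rather than relying on how the viewer renders it, is to count blank cells and NA cells per column. The sketch below uses a small made-up data frame in which race is assumed to have been read as text; that is only a guess about the real data, since a blank cell can only survive in a character or factor column.

```r
# Hypothetical data frame: race was read as text and holds blank cells.
mydata <- data.frame(
  race = c("2", "2", "3", "", " "),
  partisanship = c(97.4, 96.7, NA, 87.7, 85.2),
  stringsAsFactors = FALSE
)

# Count cells that are empty or whitespace-only, per column.
blank_counts <- sapply(mydata, function(col) {
  sum(!is.na(col) & trimws(as.character(col)) == "")
})
blank_counts

# Count cells that R already treats as missing, per column.
na_counts <- colSums(is.na(mydata))
na_counts

# Convert the blanks in race to NA, then restore a numeric type.
mydata$race[trimws(mydata$race) == ""] <- NA
mydata$race <- as.numeric(mydata$race)
```

If blank_counts is nonzero for a column, that column is a candidate for the na.strings treatment at import.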
Does anybody have any ideas on how I can get rid of this warning? And is there a way to import all .csv files with NAs so I know everything's lining up properly?
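A note on the warning itself: cbind emits it when the vectors it is given differ in length, and blank cells do not change the length of a column. One possible cause (an assumption, since the workspace is not shown) is an object such as age in the global environment masking the attached column of the same name. A sketch that sidesteps both attach and cbind is to name the columns directly in the formula and let polr look them up in the data frame. The data below are simulated purely for illustration.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 200

# Simulated stand-in for mydata; every value here is made up.
mydata <- data.frame(
  race = sample(1:3, n, replace = TRUE),
  partisanship = round(runif(n, 0, 100), 1),
  sex = sample(0:1, n, replace = TRUE),
  age = sample(18:90, n, replace = TRUE)
)

# Build an ordered four-level outcome from a latent score.
latent <- 0.02 * mydata$age - 0.01 * mydata$partisanship + rlogis(n)
mydata$assessment <- cut(latent, breaks = c(-Inf, 0, 1, 2, Inf),
                         labels = 1:4, ordered_result = TRUE)

# Introduce a few missing values, as in the real data.
mydata$race[c(10, 25)] <- NA
mydata$partisanship[7] <- NA

# Columns are referenced by name; rows with NAs are dropped by the
# default na.action, so all variables stay aligned row by row.
olr <- polr(assessment ~ race + partisanship + sex + age,
            data = mydata, Hess = TRUE)
summary(olr)
```

Because every variable comes from the same data frame, they always have the same number of rows, and Hess = TRUE avoids the "Re-fitting to get Hessian" step when calling summary.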
EDIT: If it helps, after doing a bit more research, it looks like the missing values that show up as blanks instead of NAs stem from manually editing the data to clean it up before loading it into R. Most of the data I have to import requires a bit of clean-up first, so I don't know how to get around doing this.
Thanks!
na.omit
Like this? x <- cbind(na.omit(race, partisanship, sex, age))
na.omit took away all of my covariates in the output and left only the intercepts.
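On the na.omit attempt: na.omit takes a single object as its first argument, so na.omit(race, partisanship, sex, age) only filters race and does not combine the four vectors. A sketch of the more usual pattern is to apply it to a data frame holding all the model variables, so that whole rows are dropped together. The data frame here is a made-up example with the same column names.

```r
# Hypothetical data with missing values in two different columns.
mydata <- data.frame(
  assessment = c(1, 2, 3, 4, 2),
  race = c(2, NA, 3, 3, 1),
  partisanship = c(97.4, 96.7, NA, 87.7, 85.2),
  sex = c(1, 0, 1, 0, 1),
  age = c(80, 75, 70, 65, 60)
)

# Keep only the model variables, then drop every row with any NA.
vars <- c("assessment", "race", "partisanship", "sex", "age")
complete <- na.omit(mydata[, vars])

nrow(mydata)    # rows before dropping
nrow(complete)  # rows after dropping
```

The resulting data frame can be passed as the data argument of a model function, which keeps the outcome and the covariates aligned.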