How Can I Make Sure All My .CSV Data Gets Imported as NA instead of Blank in R?

Question

In my dataset, I have an assessment variable with four levels that I'm trying to predict: 1 [Good] to 4 [Bad].

My model seems to work: I'm using the polr function to fit an ordered logistic regression. However, cbind gives me the warning "In cbind(race, partisanship, sex, age) : number of rows of result is not a multiple of vector length (arg 4)", which I think happens because some cells that I can see were imported as blanks instead of NAs.

Here's what the output looks like:

> library(MASS)
> mydata <- read.csv("~/Desktop/R/mydata.csv")
> attach(mydata)
> y <- as.factor(assessment)
> x <- cbind(race, partisanship, sex, age)
Warning message:
In cbind(race, partisanship, sex, age) :
  number of rows of result is not a multiple of vector length (arg 4)
> 
> olr <- polr(y ~ x, mydata)
> summary(olr)

Re-fitting to get Hessian

Call:
polr(formula = y ~ x, data = mydata)

Coefficients:
                 Value Std. Error t value
xrace          0.49485   0.214426  2.3078
xpartisanship -0.00990   0.002942 -3.3654
xsex          -0.21304   0.299763 -0.7107
xage           0.01486   0.006812  2.1819

Intercepts:
    Value   Std. Error t value
1|2 -1.4763  0.8253    -1.7887
2|3  1.8049  0.8237     2.1913
3|4  2.4739  0.8290     2.9842

Residual Deviance: 667.1306 
AIC: 681.1306 
(1401 observations deleted due to missingness)

I tried to fix the problem by adding na.strings = "" to read.csv and adding x[x==""] <- NA before I define x. The summary output looks better, but I still get the warning.

For some reason, the race column imports missing cells as blanks instead of NAs: when I look at the data with View(mydata) in RStudio, I see blanks in the race column, while all the other columns show NA where data is missing. When I print the output, however, it shows NAs.
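A likely explanation, offered as an assumption rather than a diagnosis of the actual file: read.csv() only turns empty fields into NA automatically in columns it can convert to logical, integer, numeric, or complex. If a column contains any non-numeric text, it is read as character (or as factor in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE), and its empty cells stay as "". Passing na.strings = c("", "NA") treats both empty cells and the literal text NA as missing. The CSV text below is made up for illustration.

```r
# Made-up CSV contents standing in for mydata.csv. The race column holds
# text labels, so read.csv() keeps it as character; partisanship is numeric.
csv_text <- "race,partisanship,age
white,97.4,80
black,,75
,95.0,70"

# Default na.strings = "NA": the blank numeric cell becomes NA,
# but the blank character cell stays as an empty string
default_read <- read.csv(text = csv_text)
print(default_read)

# Treat empty cells as well as the literal text "NA" as missing
fixed_read <- read.csv(text = csv_text, na.strings = c("", "NA"))
print(fixed_read)
```

The same argument applies to the file in the question, e.g. read.csv("~/Desktop/R/mydata.csv", na.strings = c("", "NA")).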

For example, in RStudio, row 7 already shows an NA for partisanship, but row 10 shows a blank for race (printing x, however, shows NA there):

> head(x, 10)
      race partisanship age
 [1,]    2         97.4  80
 [2,]    2         96.7  75
 [3,]    3         95.0  70
 [4,]    3         87.7  65
 [5,]    3         85.2  60
 [6,]    3          4.7  50
 [7,]    3           NA  40
 [8,]    3          9.1  30
 [9,]    3          1.1  80
[10,]   NA         10.2  75

Does anybody have ideas on how I can get rid of this warning? And is there a way to import .csv files so that every missing cell becomes NA and I know everything lines up properly?

EDIT: In case it helps: after a bit more research, it looks like the blanks (instead of NAs) in some columns stem from manual editing I did to clean up the data before loading it into R. Most of the data I have to import needs some clean-up first, so I don't know how to avoid this.
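If the files can't be cleaned before import, one option is to convert blanks after reading. The sketch below assumes the affected columns were read as character (the default since R 4.0.0; factor columns would need as.character() first), and that cells emptied by hand may contain stray spaces, which na.strings = "" alone may not catch. The helper blanks_to_na and the sample data are hypothetical.

```r
# Hypothetical helper: turn empty or whitespace-only strings in every
# character column of a data frame into NA, leaving other columns alone
blanks_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col[which(trimws(col) == "")] <- NA
    col
  })
  df
}

# Made-up data shaped like a default import: race came in as character,
# with one empty cell and one cell holding a single space
mydata <- data.frame(
  race = c("2", "3", "", " "),
  partisanship = c(97.4, NA, 95.0, 4.7),
  stringsAsFactors = FALSE
)

mydata <- blanks_to_na(mydata)
# With the blanks now NA, race can be converted back to numbers
mydata$race <- as.numeric(mydata$race)
print(mydata)
```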

Thanks!

The error you got in your cbind needs to be addressed. Evidently your four vectors are not the same length, so you're probably ending up with an x that's not what you think it is.
– Wayne, Commented Apr 17, 2014 at 20:46
@Wayne Thanks. Is this because I don't have the race, age, etc. of certain people, which makes the vectors different lengths? If so, what's the best way to fix that? Maybe by wrapping it with na.omit like this? x <- cbind(na.omit(race, partisanship, sex, age))
– Ryan, Commented Apr 17, 2014 at 20:56
@Wayne But when I tried it, na.omit took away all of my covariates in the output and left only the intercepts.
– Ryan, Commented Apr 17, 2014 at 21:03
No, your vectors appear to be of different lengths. It's impossible for me to tell how they got this way. Your suggestion is the opposite of the correct answer and may be how you got into trouble. If you don't know the sixth person's age, you need an NA in age[6] so that age[7] still corresponds to sex[7]. If you remove the NAs, the seventh person's age moves into age[6] and ends up paired with sex[6].
– Wayne, Commented Apr 17, 2014 at 21:05
@Wayne Thanks again! I imported a .csv and see that most columns have 'NA' where there were blanks, but two columns are still blank, which means they're doing what you described. Do you know what makes them blank instead of 'NA'? I tried re-opening the .csv, removing formatting, and re-copying and pasting, but when I import the file again, they're still blank.
– Ryan, Commented Apr 17, 2014 at 21:14

Wayne · Accepted Answer · 2014-04-17 21:10:30Z


It's getting to be a long string of comments, so let me put it into an answer.

It appears, from the cbind warning, that age, sex, partisanship, and race are not the same length. This is a serious error. It means that somewhere in your data, the link between age[n], sex[n], partisanship[n], and race[n] has been broken.

This might be the result of calling na.omit on one or more of the vectors. NAs should be there when you don't know an answer. If you know the age, sex, partisanship, and race of every participant except for the age of participant 12, you need an NA in age[12] so that everything lines up. If you remove that NA, the value in age[13] moves into age[12] and is matched with sex[12], partisanship[12], and race[12] instead of with sex[13], partisanship[13], and race[13], and every later age shifts the same way. If age was originally, say, 42 elements long, it is now only 41 long, so there is no age[42]; R is warning you that it forced cbind to work by wrapping around and reusing age[1] as the 42nd value.
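The misalignment described above can be reproduced with a few made-up values. Here na.omit() shortens age from four elements to three, so every later age shifts up one row, and cbind() recycles the first value into the last row while emitting the same kind of warning seen in the question.

```r
# Made-up data for four participants; participant 2's age is unknown
sex <- c(1, 2, 1, 2)
age <- c(30, NA, 50, 60)

# Keeping the NA preserves the pairing: row 2 is still participant 2
aligned <- cbind(sex, age)
print(aligned)

# Dropping the NA leaves three ages for four rows. cbind() warns that
# "number of rows of result is not a multiple of vector length (arg 2)"
# and recycles age_omitted[1] into row 4.
age_omitted <- na.omit(age)
misaligned <- cbind(sex, age_omitted)
print(misaligned)
```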

Does that make sense?

So you need to figure out how the vectors became different lengths in the first place.
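One way to avoid the problem altogether, sketched under the assumption that the data stay in the data frame read from the CSV instead of being pulled out as separate vectors with attach(): columns of a data frame always have the same length, and complete.cases() drops whole rows, so each remaining row still describes a single participant. The values below are made up.

```r
# Made-up data frame standing in for mydata
mydata <- data.frame(
  race = c(2, 3, NA, 3),
  partisanship = c(97.4, NA, 95.0, 4.7),
  sex = c(1, 2, 1, 2),
  age = c(80, 75, 70, 65)
)

# Every column of a data frame has the same length, which is what keeps
# race[n], partisanship[n], sex[n], and age[n] on the same row
stopifnot(length(unique(lengths(mydata))) == 1)

# Remove whole rows that contain any NA, instead of calling na.omit()
# on individual vectors
complete_rows <- mydata[complete.cases(mydata), ]
print(complete_rows)
```

Calling na.omit(mydata) on the whole data frame also drops entire rows. Similarly, fitting the model through a formula with a data argument, e.g. polr(factor(assessment) ~ race + partisanship + sex + age, data = mydata), should drop incomplete rows under the default na.action without breaking the row pairing.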


Ryan Over a year ago

Thanks again. I added x[x==""] <- NA to my script, which seems to help (it did change the output, which I edited above), though I still get the error.

Ryan Over a year ago

I noticed, though, that after doing this, when I print the dataset, the values that were automatically imported as missing show up as NA, while the blanks that I forced to NA now show up as <NA>.
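The NA versus <NA> difference is most likely just how print() displays missing values in a data frame: character and factor columns show <NA>, numeric columns show NA. Both are genuine missing values, as a quick is.na() check on made-up data shows.

```r
# Made-up data: one character column and one numeric column,
# each with a missing value in row 2
df <- data.frame(
  race = c("2", NA),
  age = c(80, NA),
  stringsAsFactors = FALSE
)

# Printing shows <NA> in the character column and NA in the numeric one
print(df)

# is.na() reports both entries in row 2 as missing
print(is.na(df))
```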
