How to upload CSV files to GitHub repo and use them as data for my R scripts

Question

I'm currently doing a project that uses R to process some large csv files that are saved in my local directory linked to my repo.

So far, I managed to create the R project and commit and push R scripts into the repo with no problem.

However, the scripts read the data from the csv files saved in my local directory, so the code takes the form

df <- read.csv("mylocaldirectorylink")

However, this is not helpful if my partner and I, working on the same project, have to change that path to our own local directory every time we pull from the repo. So I was thinking that maybe we could upload the csv files to the GitHub repo and let the R script refer directly to the csv files online.

So my questions are:

Why can't I upload csv files onto GitHub? They keep saying that my file is too large.
If I can upload the csv files, how do I read the data from these csv files?

GitHub isn't a file-sharing service. If you're looking to share the data for analysis, why not use Google Drive, which has a package to facilitate access?
– Jake Kaupp, Commented Oct 21, 2017 at 17:37
(a) df is a bad variable name (b) if you're getting that error then your CSV is YUGE and you should consider migrating to RDS files with xz compression. That will get you around the limits. It's a bad idea to refer to GH URLs for data, but if the repo is cloned you can use the rprojroot pkg to ensure you are both using the local copies. If you're stuck w/CSV (ugh), use Amazon S3, Google Drive, Dropbox or some other similar service (as Jake suggested).
– hrbrmstr, Commented Oct 22, 2017 at 1:55
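
As a rough sketch of the RDS suggestion in the comment above: base R's saveRDS() accepts compress = "xz", which often produces a much smaller file than the raw CSV. The data frame and file names below are made up for illustration, and how much space is saved depends entirely on the data.

```r
# Hypothetical data standing in for a large CSV.
measurements <- data.frame(
  id    = 1:1000,
  value = rep(c(1.5, 2.5, 3.5, 4.5), times = 250)
)

csv_path <- file.path(tempdir(), "measurements.csv")
rds_path <- file.path(tempdir(), "measurements.rds")

# Write the same data in both formats.
write.csv(measurements, csv_path, row.names = FALSE)
saveRDS(measurements, rds_path, compress = "xz")

# Compare the on-disk sizes in bytes.
file.size(csv_path)
file.size(rds_path)

# Read the compressed file back; column types are stored in the file,
# so nothing has to be guessed the way it is when parsing a CSV.
restored <- readRDS(rds_path)
```

An RDS file is binary, so Git cannot show meaningful diffs for it; that trade-off may or may not matter for data that rarely changes.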

Alex Knorre · Accepted Answer · 2017-10-21 17:54:09Z

2

Firstly, it's generally a bad idea to store data on GitHub, especially if it's large. If you want to keep it somewhere on the Internet, you can use, say, Dataverse and then access your data by URL (through the API), or Google Drive, as Jake Kaupp suggested.
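
To illustrate the "access by URL" idea in general terms: read.csv() accepts a complete URL in place of a file name, so a hosted file can be read directly. This is only a sketch. The helper name read_remote_csv and the example.org address are placeholders, not a real dataset or a Dataverse API call.

```r
# Hypothetical helper: read a CSV from any location base R can open,
# including http(s):// and file:// URLs.
read_remote_csv <- function(csv_url) {
  read.csv(csv_url, stringsAsFactors = FALSE)
}

# Placeholder URL; replace it with the real download link for your data.
# survey <- read_remote_csv("https://example.org/path/to/data.csv")
```

Every run downloads the file again, so for large data it is usually quicker to download once and read the local copy.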

Now back to your question. If your data doesn't change, I would simply use relative paths to the CSV instead of absolute ones. In other words, instead of

df<-read.csv("C:/folder/subfolder/data.csv")

I would use

df <- read.csv("../data.csv")

If you are working with an R project, the initial working directory is the project's folder. You can check it with getwd(). This working directory follows the project when you move it. Just agree with your colleague that the data file lives in the folder that contains the R project folder, which is where "../data.csv" points.
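
A minimal sketch of that check, assuming the script is run with the project folder as the working directory and that the file is called data.csv as in the example above (the variable names are illustrative):

```r
# Show where R currently resolves relative paths from.
getwd()

# "../data.csv" means: go up one level from the working directory,
# then look for data.csv there.
data_path <- file.path("..", "data.csv")

# Fail early with a readable message if the file is not where expected.
if (file.exists(data_path)) {
  project_data <- read.csv(data_path)
} else {
  message("No file found at ", normalizePath(data_path, mustWork = FALSE))
}
```

If the message appears on one machine but not the other, the two working directories differ, and comparing the getwd() output usually shows why.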

edited Oct 21, 2017 at 17:54

answered Oct 21, 2017 at 17:48


1 Comment

hrbrmstr Over a year ago

(FYI GitHub is on the internet and is used for data storage/sharing all. the. time. + rprojroot is a more generalizable recommendation)
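
For reference, the rprojroot approach mentioned in this comment might look roughly like the following. It assumes the rprojroot package is installed and that the repo contains an RStudio .Rproj file; the helper name project_file and the data/data.csv layout are hypothetical.

```r
# Assumes rprojroot is installed: install.packages("rprojroot")

# Hypothetical helper: build a path relative to the folder that holds the
# .Rproj file, searching upwards from `start`.
project_file <- function(..., start = ".") {
  rprojroot::find_root_file(
    ...,
    criterion = rprojroot::is_rstudio_project,
    path = start
  )
}

# Example usage inside a cloned project (hypothetical layout):
# project_data <- read.csv(project_file("data", "data.csv"))
```

Because the path is resolved from the project root rather than from the current working directory, the same line works from any subfolder of the repo on either collaborator's machine.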

mufassir · Accepted Answer · 2021-08-23 10:31:04Z

0

This is for a Python script.

You can make Git track csv files by removing any rule in your .gitignore file that excludes them.

OR

You can add csv files in your github repo, which can be used by others.

I did so by following these steps:

Check out the branch on github.com.
Go to the folder where you want to keep the csv files.
There you will see an "Add file" option in the top-right area.

There you can upload the csv files and commit the changes to the same branch or to a new branch.
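
Since the question mentions a "file is too large" error, it may help to check file sizes before uploading. The sketch below is in R to match the question. The limits are assumptions taken from GitHub's documentation at the time of writing (about 25 MiB for browser uploads, 100 MiB per file when pushing) and may change; the function name csv_size_report is hypothetical.

```r
# Assumed limits, per GitHub's documentation at the time of writing.
browser_limit_mib <- 25
push_limit_mib <- 100

# Hypothetical helper: list every CSV under `dir` with its size in MiB
# and flag the files that exceed the assumed limits.
csv_size_report <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  size_mib <- file.size(files) / 1024^2
  data.frame(
    file = files,
    size_mib = round(size_mib, 2),
    too_big_for_browser = size_mib > browser_limit_mib,
    too_big_for_push = size_mib > push_limit_mib,
    stringsAsFactors = FALSE
  )
}

# Example usage from the project folder:
# csv_size_report(".")
```

Files flagged in the last column would be rejected on push, which is the situation where the compressed formats or external storage suggested in the comments become relevant.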

edited Aug 23, 2021 at 10:31

answered Aug 18, 2021 at 14:43

