
I'm currently working on a project that uses R to process some large CSV files saved in the local directory linked to my repo.

So far, I have created the R project and committed and pushed R scripts to the repo with no problem.

However, the scripts read the data from the CSV files saved in my local directory, so the code takes the form

df <- read.csv("mylocaldirectorylink") 

However, this is not helpful, because my partner and I, who are working on the same project, would each have to change that path to our own local directory every time we pull from the repo. So I was thinking that maybe we could upload the CSV files to the GitHub repo and let the R script refer directly to the CSV files online.

So my questions are:

  • Why can't I upload CSV files to GitHub? It keeps saying that my file is too large.
  • If I can upload the CSV files, how do I read the data from them?
    GitHub isn't a file sharing service. If you're looking to share the data for analysis, why not use Google Drive, which has a package to facilitate access? Commented Oct 21, 2017 at 17:37
  • (a) df is a bad variable name. (b) If you're getting that error then your CSV is YUGE and you should consider migrating to RDS files with xz compression. That may get you under the limits. It's a bad idea to refer to GH URLs for data, but if the repo is cloned you can use the rprojroot pkg to ensure you are both using the local copies. If you're stuck w/CSV (ugh), use Amazon S3, Google Drive, Dropbox or some other, similar service (as Jake suggested). Commented Oct 22, 2017 at 1:55
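
The RDS suggestion in the comment above could look roughly like this. It is only a sketch: the file names and the small demo data frame are made up for illustration, and the real files would be the project's own CSVs. It uses base R's saveRDS() with compress = "xz" and reads the result back with readRDS().

```r
# Hypothetical paths in a temporary folder, used only for this demo
csv_path <- file.path(tempdir(), "big.csv")
rds_path <- file.path(tempdir(), "big.rds")

# Stand-in for a large CSV (made-up data)
demo <- data.frame(id = 1:1000,
                   group = rep(c("a", "b"), 500),
                   stringsAsFactors = FALSE)
write.csv(demo, csv_path, row.names = FALSE)

# Read the CSV once, then store it as an xz-compressed RDS file
records <- read.csv(csv_path, stringsAsFactors = FALSE)
saveRDS(records, rds_path, compress = "xz")

# Later sessions read the RDS file instead of the CSV
restored <- readRDS(rds_path)

# Compare the two file sizes in bytes
print(c(csv = file.size(csv_path), rds = file.size(rds_path)))
```

How much space this saves depends on the data, so it is worth checking the sizes on the real files before relying on it to fit under any upload limit.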

2 Answers


Firstly, it's generally a bad idea to store data on GitHub, especially if it's large. If you want to keep it somewhere on the Internet, you can use, say, Dataverse and then access your data via a URL (through the API), or Google Drive, as Jake Kaupp suggested.

Now back to your question. If your data doesn't change, I would simply use relative paths to the CSV instead of absolute ones. In other words, instead of

df <- read.csv("C:/folder/subfolder/data.csv")

I would use

df <- read.csv("../data.csv")

If you are working with an R project, the initial working directory is the project's folder. You can check it with getwd(). Because the working directory follows the project wherever it is moved or cloned, a relative path keeps working on every machine. Just agree with your colleague that the data file should sit in the folder that contains the R project folder, which is where "../data.csv" points.
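
Here is a small self-contained sketch of that layout, in case it helps. The folder names (relative-path-demo, my-project) and the demo data are invented, and setwd() only stands in for what opening the .Rproj file would normally do.

```r
# Hypothetical layout: <parent>/data.csv sits next to <parent>/my-project/
parent  <- file.path(tempdir(), "relative-path-demo")
project <- file.path(parent, "my-project")
dir.create(project, recursive = TRUE, showWarnings = FALSE)

# Made-up data standing in for the shared CSV
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(parent, "data.csv"), row.names = FALSE)

# Opening the R project would set the working directory to the project
# folder; setwd() imitates that here and returns the previous directory
old_wd <- setwd(project)
print(getwd())

# The relative path is resolved against the working directory,
# so the same line works on any machine with the same layout
survey <- read.csv(file.path("..", "data.csv"))

# Restore the original working directory
setwd(old_wd)
```

file.path("..", "data.csv") is equivalent to "../data.csv"; it just avoids typing the separator by hand.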


1 Comment

(FYI GitHub is on the internet and is used for data storage/sharing all. the. time. + rprojroot is a more generalizable recommendation)

This is for a Python script.

You can have Git track CSV files by editing your .gitignore file so that it does not exclude them.

     **OR**

You can add CSV files to your GitHub repo, where others can use them.

I did so with the following steps:

  1. Open the branch on github.com.
  2. Go to the folder where you want to keep the CSV files.
  3. Here, you will see an "Add file" option in the top right area.
  4. Here you can upload the CSV files and commit the changes to the same branch or to a new branch.
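
Once a CSV has been uploaded with the steps above, reading it from R could be sketched as follows. This assumes the repository is public and uses GitHub's raw file URL pattern; the owner, repo, branch and path values are placeholders, and raw_csv_url() is a helper written for this example, not part of any package.

```r
# Hypothetical helper: builds the raw.githubusercontent.com URL for a file
raw_csv_url <- function(owner, repo, branch, path) {
  paste("https://raw.githubusercontent.com", owner, repo, branch, path,
        sep = "/")
}

# Placeholder values; substitute the real repository details
csv_url <- raw_csv_url("some-user", "some-repo", "main", "data/data.csv")
print(csv_url)

# Reading needs network access and a public repository,
# so the call is left commented out in this sketch:
# shared_data <- read.csv(csv_url)
```

As noted in the comments on the question, pointing scripts at GitHub URLs for data is not always a good idea, so a cloned local copy with relative paths may still be the safer choice.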
