I store my data in a PostgreSQL server. I want to load a table which has 15 million rows into a data.frame or data.table.
I use RPostgreSQL to load the data:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, ...)  # connection details omitted

# Select all rows from the table
system.time(
  df <- dbGetQuery(con, "SELECT * FROM 15mil_rows_table")
)
It took 20 minutes to load the data from the database into df. I am using a Google Cloud server with 60 GB of RAM and a 16-core CPU.
What should I do to reduce the load time?
Comment: You can use src_postgres from dplyr; you can then use dplyr functions for the aggregation, and it will push many if not all of those operations back onto the database itself, so you won't need to read all the records into R. ref: cran.rstudio.com/web/packages/dplyr/vignettes/databases.html

Reply: Thanks, I will try dplyr later. I still want to find the answer to my question, though, because I want to load all the data into R to aggregate and plot it.
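To make the comment's suggestion concrete, here is a minimal sketch of the src_postgres approach from that vignette. The connection arguments and the column names (some_col, value) are placeholders for illustration, and note that in current dplyr releases src_postgres has been superseded by a plain DBI connection plus the dbplyr backend:

library(dplyr)

# Placeholder connection details; adjust to your server
my_db <- src_postgres(dbname = "mydb", host = "localhost",
                      user = "user", password = "pass")

# A lazy reference to the table; no rows are pulled into R yet
tbl_15mil <- tbl(my_db, "15mil_rows_table")

# group_by/summarise are translated to SQL and executed on the
# database; only the aggregated result is collected into R
agg <- tbl_15mil %>%
  group_by(some_col) %>%
  summarise(n = n(), avg_val = mean(value)) %>%
  collect()

This only helps when the final result is small; if you genuinely need all 15 million rows in R for plotting, the full transfer itself remains the bottleneck.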