This is a beginner's question, but how do you save a 2D numpy array to a file in (compressed) R format using rpy2? To be clear, I want to save it from Python with rpy2 and later read it in R. I would like to avoid CSV because the amount of data will be large.
4 Answers
Looks like you want the save command. I would use the pandas R interface and do something like the following.
import numpy as np
from rpy2.robjects import r
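# note: pandas.rpy was removed from later pandas releases; rpy2's own pandas2ri covers this conversion there
import pandas.rpy.common as com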
from pandas import DataFrame
a = np.array([range(5), range(5)])
df = DataFrame(a)
df = com.convert_to_r_dataframe(df)
r.assign("foo", df)
r("save(foo, file='here.gzip', compress=TRUE)")
There may be a more elegant way, though; I'm open to better suggestions. In R, the file saved above would then be loaded like this:
> load("here.gzip")
> foo
X0 X1 X2 X3 X4
0 0 1 2 3 4
1 0 1 2 3 4
You can bypass pandas and use numpy2ri from rpy2, with something like:
import numpy as np
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
a = np.array([[i*2147483647**2 for i in range(5)], range(5)], dtype="uint64")
a = np.array(a, dtype="float64") # <- convert to double precision numeric since R doesn't have unsigned ints
ro = numpy2ri(a)
r.assign("bar", ro)
r("save(bar, file='another.gzip', compress=TRUE)")
In R then:
> load("another.gzip")
> bar
[,1] [,2] [,3] [,4] [,5]
[1,] 0 4.611686e+18 9.223372e+18 1.383506e+19 1.844674e+19
[2,] 0 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00
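The conversion to float64 in the snippet above is not free: doubles hold integers exactly only up to 2**53, so very large uint64 values get rounded before R ever sees them. Here is a minimal numpy-only sketch of that effect (the variable names are made up for illustration, and it does not touch rpy2 at all):

```python
import numpy as np

# Three unsigned integers: one exactly representable as a double, two that are not.
big = np.array([2**53, 2**53 + 1, 2147483647**2], dtype="uint64")

# Same conversion as in the answer: uint64 -> double precision.
as_double = big.astype("float64")

# Compare as Python ints to see which values survived the conversion unchanged.
survived = [int(d) == int(b) for d, b in zip(as_double, big)]
print(survived)
```

If exact integer values matter, it may be safer to check this before saving rather than after loading in R.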
Here's an example without pandas that adds column and row names:
import warnings
import numpy as np
from rpy2.robjects import rinterface, r, IntVector, FloatVector, StrVector
# older (<2.1) versions of rpy2 have globalEnv instead of globalenv
# let's patch that so the rest of the code works either way
if not hasattr(rinterface, 'globalenv'):
    warnings.warn('Old version of rpy2 detected')
    rinterface.globalenv = rinterface.globalEnv
var_name = 'r_var'
vals = np.arange(20,dtype='float').reshape(4,5)
# transpose because R is column major while numpy defaults to row major
r_vals = FloatVector(vals.T.ravel())
# make it a matrix
rinterface.globalenv[var_name]=r['matrix'](r_vals,nrow=vals.shape[0])
# give it some row and column names
r("rownames(%s) <- c%s"%(var_name,tuple('ABCDEF'[i] for i in range(vals.shape[0]))))
r("colnames(%s) <- c%s"%(var_name,tuple(range(vals.shape[1]))))
#save it to file
r.save(var_name,file='r_from_py.rdata')
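The transpose-then-ravel step is the part that is easy to get wrong, so here is a small numpy-only sketch of why it works. It assumes nothing about rpy2; the reshape with order="F" merely stands in for how R's matrix() fills its data column by column.

```python
import numpy as np

vals = np.arange(20, dtype="float").reshape(4, 5)

# Flattening the transpose walks the original array column by column...
flat_via_transpose = vals.T.ravel()
# ...which matches asking numpy directly for Fortran (column-major) order.
flat_fortran = vals.ravel(order="F")

# Stand-in for R's matrix(data, nrow=4): fill a 4x5 array column by column.
rebuilt = flat_via_transpose.reshape(vals.shape, order="F")

print(np.array_equal(flat_via_transpose, flat_fortran))
print(np.array_equal(rebuilt, vals))
```

Either spelling of the flattening should do; `ravel(order="F")` just states the intent more directly.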
An alternative to rpy2 is to write a mat-file and load this mat-file from R.
In Python:
import os
os.chdir("/home/user/proj") #specify a path to save to
import numpy as np
import scipy.io
x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)
scipy.io.savemat('test.mat', dict(x=x, y=y))
example copied from: "Converting" Numpy arrays to Matlab and vice versa
In R:
library(R.matlab)
object_list = readMat("/home/user/proj/test.mat")
I'm a beginner in python.
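Before handing the mat-file over to R, it can be worth reading it back in Python to confirm what was written. The sketch below reuses the x and y arrays from the example and writes to a temporary directory (an assumption made here so it runs anywhere; the original saves into /home/user/proj).

```python
import os
import tempfile

import numpy as np
import scipy.io

x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)

with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, "test.mat")
    scipy.io.savemat(mat_path, dict(x=x, y=y))
    # Read the file back while the temporary directory still exists.
    loaded = scipy.io.loadmat(mat_path)

# MATLAB files store arrays as at least 2-D, so the 1-D inputs come back 2-D.
x_back = loaded["x"]
y_back = loaded["y"]
print(x_back.shape, y_back.shape)
```

The same shape change is likely to show up on the R side, so expect matrices rather than plain vectors after readMat.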
Suppose you have a pandas dataframe called data. The following code helped me store its values as a matrix in an R data file and then load it into R (RStudio).
Save the data from Python:
# Take only the values of the dataframe
B=data.values
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
nr,nc = B.shape
Br = ro.r.matrix(B, nrow=nr, ncol=nc)
ro.r.assign("B", Br)
ro.r("save(B, file='here.Rdata')")
Then go to R and run:
load("D:/.../here.Rdata")
This has done the job for me!