
This is a beginner's question, but how do you save a 2D NumPy array to a file in (compressed) R format using rpy2? To be clear, I want to save it from Python with rpy2 and then later read it in using R. I would like to avoid CSV because the amount of data will be large.

4 Answers

7

Looks like you want the save command. I would use the pandas R interface and do something like the following.

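import numpy as np
from rpy2.robjects import r
# pandas.rpy shipped with older pandas releases and was later removed
import pandas.rpy.common as com
from pandas import DataFrame

a = np.array([range(5), range(5)])   # 2 x 5 integer array
df = DataFrame(a)
df = com.convert_to_r_dataframe(df)  # pandas DataFrame -> R data.frame
r.assign("foo", df)                  # bind it to the name foo in R
# compress=TRUE means gzip compression; the file extension is only a label
r("save(foo, file='here.gzip', compress=TRUE)")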

There may be a more elegant way, though, and I'm open to better suggestions. In R, the saved file would then be used like this:

> load("here.gzip")
> foo
  X0 X1 X2 X3 X4
0  0  1  2  3  4
1  0  1  2  3  4
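
Since compress=TRUE in R's save means gzip compression, the file name (here.gzip) is only a label. As a rough sketch, this is one way to check from the Python side that a file really is gzip-compressed, using only the standard library. The helper names is_gzip_file and demo are made up for illustration, and the demo writes a stand-in file with Python's gzip module instead of calling R.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def demo():
    """Write one gzip file and one plain file, then check both."""
    tmpdir = tempfile.mkdtemp()
    gz_path = os.path.join(tmpdir, "stand_in.gzip")  # hypothetical stand-in for here.gzip
    raw_path = os.path.join(tmpdir, "plain.bin")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(b"not real R data")
    with open(raw_path, "wb") as fh:
        fh.write(b"not real R data")
    return is_gzip_file(gz_path), is_gzip_file(raw_path)
```

Calling is_gzip_file('here.gzip') after running the save command should then tell you whether compression was applied.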

You can also bypass pandas and use numpy2ri from rpy2, with something like:

import numpy as np
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri

a = np.array([[i*2147483647**2 for i in range(5)], range(5)], dtype="uint64")
# convert to double-precision numeric, since R doesn't have unsigned ints
a = np.array(a, dtype="float64")
ro = numpy2ri(a)
r.assign("bar", ro)
r("save(bar, file='another.gzip', compress=TRUE)")

In R then:

> load("another.gzip")
> bar
     [,1]         [,2]         [,3]         [,4]         [,5]
[1,]    0 4.611686e+18 9.223372e+18 1.383506e+19 1.844674e+19
[2,]    0 1.000000e+00 2.000000e+00 3.000000e+00 4.000000e+00
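
One caveat with the float64 conversion: a double only represents integers exactly up to 2**53, so very large uint64 values get rounded. Here is a small sketch, in plain Python, of how you might check in advance what R can hold. The function name r_storage_for and its three labels are my own invention, not part of rpy2.

```python
R_INT_MIN = -(2**31) + 1  # -2**31 itself is reserved for NA_integer_ in R
R_INT_MAX = 2**31 - 1


def r_storage_for(values):
    """Classify a sequence of Python ints by how R could store them.

    Returns "integer" if every value fits R's 32-bit integer type,
    "double" if every value survives a round trip through a 64-bit float,
    and "lossy" if at least one value would be rounded.
    """
    values = list(values)
    if all(R_INT_MIN <= v <= R_INT_MAX for v in values):
        return "integer"
    if all(int(float(v)) == v for v in values):
        return "double"
    return "lossy"
```

For the values used above, r_storage_for([i * 2147483647**2 for i in range(5)]) comes back as "lossy", so the numbers R prints are close to, but not exactly, the original unsigned integers.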

9 Comments

Thanks but installing pandas under ubuntu 11.10 fails with error: Setup script exited with pandas requires NumPy >= 1.6 due to datetime64 dependency
I'm not sure how to do it without pandas. Can you upgrade your numpy? I usually use virtualenv and pip which will install the latest stable numpy and pandas for you.
Upgrading numpy will be a pain and also make the script less portable sadly. I feel rpy2 should be able to call save too if I can just get the right syntax for it.
added a pure rpy2 example; resulting R objects are a little different, this is probably what you want.
Thanks! I have upvoted. I now get the annoying ("Cannot convert numpy array of unsigned values -- R does not have unsigned integers.") which I suppose is the next thing to worry about :)
2

Here's an example without pandas that adds column and row names

import warnings

import numpy as np
from rpy2.robjects import rinterface, r, IntVector, FloatVector, StrVector

# older (<2.1) versions of rpy2 have globalEnv instead of globalenv
# let's fix it a little
if not hasattr(rinterface, 'globalenv'):
    warnings.warn('Old version of rpy2 detected')
    rinterface.globalenv = rinterface.globalEnv

var_name = 'r_var'
vals = np.arange(20,dtype='float').reshape(4,5)

# transpose because R is column-major while NumPy is row-major
r_vals = FloatVector(vals.T.ravel())
# make it a matrix
rinterface.globalenv[var_name]=r['matrix'](r_vals,nrow=vals.shape[0])
# give it some row and column names
r("rownames(%s) <- c%s"%(var_name,tuple('ABCDEF'[i] for i in range(vals.shape[0]))))
r("colnames(%s) <- c%s"%(var_name,tuple(range(vals.shape[1]))))

#save it to file
r.save(var_name,file='r_from_py.rdata')
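
To make the transpose step concrete, here is a minimal pure-Python sketch of what column-major order means. The two helper names are hypothetical and only for illustration; column_major_flatten does to nested lists what vals.T.ravel() does to the array above, and rebuild_rows mimics how R's matrix(flat, nrow=...) fills column by column.

```python
def column_major_flatten(rows):
    """Flatten a list of equal-length rows column by column (R's fill order)."""
    return [value for column in zip(*rows) for value in column]


def rebuild_rows(flat, nrow):
    """Rebuild the rows the way R's matrix(flat, nrow=nrow) fills a matrix."""
    return [list(flat[r::nrow]) for r in range(nrow)]
```

If you flatten without the transpose, R still builds a matrix of the right shape, but the values land in the wrong cells. In NumPy, vals.ravel(order='F') should give the same ordering as vals.T.ravel().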

3 Comments

Thanks. Is FloatVector changing the type from unsigned int as well as transposing (see my comment to the first answer)?
@Raphael FloatVector creates a float but I also tested a version of the above with IntVector (with dtype='int') and had no errors.
In my case the data looks like [(5, 'text', 4) (3, 'more text', 2)...] so FloatVector gives me an error.
2

An alternative to rpy2 is to write a mat-file and load this mat-file from R.

In Python:

import os
import numpy as np
import scipy.io

os.chdir("/home/user/proj")  # specify a path to save to
x = np.linspace(0, 2 * np.pi, 100)
y = np.cos(x)
scipy.io.savemat('test.mat', dict(x=x, y=y))

Example copied from: "Converting" Numpy arrays to Matlab and vice versa

In R:

library(R.matlab)
object_list = readMat("/home/user/proj/test.mat")

I'm a beginner in Python.


2

Suppose you have a dataframe called data. The following code helped me store its values as a matrix in an R data file and then load that file in R (RStudio).

Save the data from Python:

# Take only the values of the dataframe
B=data.values

import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()

nr,nc = B.shape
Br = ro.r.matrix(B, nrow=nr, ncol=nc)

ro.r.assign("B", Br)
ro.r("save(B, file='here.Rdata')")

Then go to R and run:

load("D:/.../here.Rdata")

This has done the job for me!

