My code is shown below:

from sklearn.datasets import load_svmlight_files
import numpy as np

perm1 = np.random.permutation(25000)
perm2 = np.random.permutation(25000)

X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))

# randomly shuffle the rows, densify, then keep the first 2000 features
X_train = X_tr[perm1, :].toarray()[:, 0:2000]
y_train = y_tr[perm1] > 5  # turn into a binary problem

The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error.

Code:

X_test = X_te[perm2,:].toarray()[:,0:2000]

Error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-7-31f5e4f6b00c> in <module>()
----> 1 X_test = X_test.toarray()

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self, order, out)
    788     def toarray(self, order=None, out=None):
    789         """See the docstring for `spmatrix.toarray`."""
--> 790         return self.tocoo(copy=False).toarray(order=order, out=out)
    791 
    792     ##############################################################

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\coo.pyc in toarray(self, order, out)
    237     def toarray(self, order=None, out=None):
    238         """See the docstring for `spmatrix.toarray`."""
--> 239         B = self._process_toarray_args(order, out)
    240         fortran = int(B.flags.f_contiguous)
    241         if not fortran and not B.flags.c_contiguous:

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\base.pyc in _process_toarray_args(self, order, out)
    697             return out
    698         else:
--> 699             return np.zeros(self.shape, dtype=self.dtype, order=order)
    700 
    701 

MemoryError: 

I'm new to Python, and I don't know whether I need to fix the memory error manually.

Other parts of my code raise the same error (for example, training with kNN or an ANN).

How can I fix this?
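
For context, here is a rough estimate of what `toarray()` has to allocate. A dense float64 array needs rows × columns × 8 bytes, no matter how many entries are zero, and in the code above the full matrix is densified before the first 2000 columns are selected. The feature count below is a hypothetical placeholder, since the real shape of the loaded matrices is not given in the question; `dense_bytes` is an illustrative helper, not a library function.

```python
import numpy as np


def dense_bytes(n_rows, n_cols, dtype=np.float64):
    """Bytes needed for a dense array of the given shape and dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


n_rows = 25000       # matches the permutation size in the question
n_features = 90000   # ASSUMPTION: hypothetical; the real column count is not stated

full = dense_bytes(n_rows, n_features)  # allocated by X[perm, :].toarray()
sliced = dense_bytes(n_rows, 2000)      # what is actually kept after [:, 0:2000]

print("full dense matrix:  %.1f GiB" % (full / 2.0 ** 30))
print("first 2000 columns: %.2f GiB" % (sliced / 2.0 ** 30))
```

If the real matrix is anywhere near this wide, the intermediate dense array alone may exceed the 8 GB of RAM mentioned in the comments, even though the final sliced result would fit comfortably.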

  • You probably exhausted your system's available memory. Buy more or allocate more (swap/paging). Commented May 26, 2014 at 23:55
  • I use Windows, and the swap file is now extended to 4 GB. My RAM is 8 GB, and Python is currently using 2.5 GB of it (with only the code up to this point run). Commented May 26, 2014 at 23:58
  • It would be helpful if you replaced the line that loads the svm data with one that sets these variables to random data of the same shape and matrix type, so that others can reproduce the problem by copying and pasting. If you are unable to do this, at least provide the shapes of the arrays. Commented May 27, 2014 at 6:20
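
A stand-in of the kind the last comment asks for might look like the sketch below. It assumes `load_svmlight_files` returned SciPy CSR matrices and NumPy label arrays; the feature count, density, and label range are hypothetical values chosen only for illustration, not taken from the question.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)

n_rows = 25000       # matches the permutation size in the question
n_features = 5000    # ASSUMPTION: hypothetical; the real column count is not stated
density = 0.001      # ASSUMPTION: hypothetical fraction of non-zero entries

# Random sparse feature matrices in CSR format
X_tr = sp.random(n_rows, n_features, density=density, format="csr", random_state=rng)
X_te = sp.random(n_rows, n_features, density=density, format="csr", random_state=rng)

# Random labels in 1..10 (ASSUMPTION), so that `y > 5` gives a binary target
y_tr = rng.randint(1, 11, size=n_rows).astype(np.float64)
y_te = rng.randint(1, 11, size=n_rows).astype(np.float64)

print(X_tr.shape, X_tr.format, X_tr.dtype)
```

Posting `X_tr.shape`, `X_tr.dtype`, and `X_tr.nnz` from the real data would let others set these placeholders to realistic values.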

2 Answers

In cases like these, it's often possible to avoid converting your sparse matrices to dense format.

For example, you can do the permutation and slice easily with CSR or CSC sparse formats.

You haven't posted the code that follows, but I suspect that can be made to handle sparse inputs as well. If that's true, your memory issues will no longer be a problem.
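
A minimal sketch of the permutation and slice done entirely in sparse format is shown here. The small random matrix is a stand-in for the question's data (its shape and density are assumptions); only the indexing lines mirror the original code.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)

# Stand-in data (ASSUMPTION): 1000 x 5000 CSR matrix with random labels in 1..10
X_tr = sp.random(1000, 5000, density=0.01, format="csr", random_state=rng)
y_tr = rng.randint(1, 11, size=1000)

perm1 = rng.permutation(X_tr.shape[0])

# Shuffle rows and keep the first 2000 columns without calling toarray()
X_train = X_tr[perm1, :][:, :2000]
y_train = y_tr[perm1] > 5  # turn into a binary problem

print(type(X_train).__name__, X_train.shape, X_train.nnz)
```

The result is still a CSR matrix, so its memory use grows with the number of non-zero entries rather than with rows × columns. Many scikit-learn estimators accept CSR input directly, though it is worth checking the documentation of the specific estimator you plan to use.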

3 Comments

Your suggestion is valid as long as I don't really need the dense format. But I need to scale the data, and some machine learning algorithms will need dense input. I'm afraid your suggestion is the only way to solve the memory error, but then I won't be able to use the algorithms I wanted.
@Asqan Whether you need to scale depends on the nature of the data. Sparse data are often histograms, and those should be L2-normalized rather than scaled. L2 normalization preserves sparsity.
I know this is an old question but my first reaction was looking for a way to assign a dtype other than float64. toarray() fills with 0.0 in float64 format. Is there a way to do that?
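
The two points raised in the comments above can be sketched together: L2-normalizing rows while staying sparse, and choosing a smaller dtype before densifying. `toarray()` returns an array with the matrix's own dtype, so casting with `astype` first yields a float32 result. The random matrix is stand-in data with an assumed shape and density; scikit-learn's `sklearn.preprocessing.normalize` performs the same row normalization on CSR input, and the version below uses only SciPy.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)

# Stand-in data (ASSUMPTION): 1000 x 5000 CSR matrix, float64 by default
X = sp.random(1000, 5000, density=0.01, format="csr", random_state=rng)

# L2-normalize each row without densifying
row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
row_norms[row_norms == 0] = 1.0  # leave all-zero rows unchanged
X_l2 = sp.diags(1.0 / row_norms).dot(X).tocsr()

# Slice first, cast to float32, and only then densify
X_small = X_l2[:, :2000].astype(np.float32)
X_dense = X_small.toarray()

print(X_dense.dtype, X_dense.shape, X_dense.nbytes)
```

Slicing before densifying keeps the allocation at rows × 2000 entries, and float32 halves that again compared with float64, at the cost of reduced precision.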
3

Use numpy.asarray() for in-place conversion instead of toarray(), which requires new memory.

1 Comment

Can you explain a bit more?
