
I am currently working on a project where I need to do some processing steps with legacy Matlab code (using the Matlab engine) and the rest in Python (numpy).

I noticed that converting the results from Matlab's matlab.mlarray.double to numpy's numpy.ndarray seems horribly slow.

Here is some example code for creating an ndarray with 1000 elements from another ndarray, a list and an mlarray:

import timeit
setup_range = ("import numpy as np\n"
               "x = range(1000)")
setup_arange = ("import numpy as np\n"
                "x = np.arange(1000)")
setup_matlab = ("import numpy as np\n"
                "import matlab.engine\n"
                "eng = matlab.engine.start_matlab()\n"
                "x = eng.linspace(0., 1000.-1., 1000.)")
print 'From other array'
print timeit.timeit('np.array(x)', setup=setup_arange, number=1000)
print 'From list'
print timeit.timeit('np.array(x)', setup=setup_range, number=1000)
print 'From matlab'
print timeit.timeit('np.array(x)', setup=setup_matlab, number=1000)

This gives the following timings (total seconds for 1000 conversions each):

From other array
0.00150722111994
From list
0.0705359556928
From matlab
7.0873282467

The conversion from the Matlab array takes about 100 times as long as the conversion from a list.

Is there any way to speed up the conversion?

  • RobR's answer is more general; see it for N-dimensional (N > 2) arrays. (Commented Aug 23, 2018 at 12:39)

2 Answers


Moments after posting the question I found the solution.

For one-dimensional arrays, access only the _data property of the Matlab array.

import timeit
# setup_range and setup_matlab are the setup strings defined in the question
print 'From list'
print timeit.timeit('np.array(x)', setup=setup_range, number=1000)
print 'From matlab'
print timeit.timeit('np.array(x)', setup=setup_matlab, number=1000)
print 'From matlab_data'
print timeit.timeit('np.array(x._data)', setup=setup_matlab, number=1000)

This prints:

From list
0.0719847538787
From matlab
7.12802865169
From matlab_data
0.118476275533

For multi-dimensional arrays you need to reshape the array afterwards. In the case of two-dimensional arrays this means calling

np.array(x._data).reshape(x.size[::-1]).T
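Why the reversed shape and the transpose are needed can be checked without a Matlab session. The sketch below assumes, as the reshape above implies, that _data holds the elements in Matlab's column-major order; flat and size are hypothetical stand-ins for x._data and x.size.

```python
import numpy as np

# Stand-ins for x._data and x.size of the 2x3 Matlab matrix [1 2 3; 4 5 6].
# Assumption: the flat buffer is stored column by column (column-major).
flat = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
size = (2, 3)

# Reshape to the reversed shape in numpy's default row-major order,
# then transpose to recover the Matlab layout.
a = np.array(flat).reshape(size[::-1]).T

expected = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
assert a.shape == size
assert np.array_equal(a, expected)
```

Reshaping directly to size without the reversal and transpose would keep the right shape but scramble the element positions.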

6 Comments

And if the data is complex, use the _real and _imag properties (instead of _data).
Or equivalently: np.array(x._data).reshape(x.size, order='F'), which is slightly faster.
With MATLAB R2022a and later, you can and should pass the MATLAB object directly into the NumPy constructor rather than using the undocumented _data attribute. Because the implementation of multidimensional arrays is now orders of magnitude faster (see the R2022a release notes), any workaround is unnecessary. Here's the output I get from the code in the main section of the post after replacing "x._data" by "x":
From other array
0.0007055000000000256
From list
0.09001790000000004
From matlab
0.005489099999998359
Matlab support notes that in R2022a, _data changed from a Python array to a C++ Matlab Data Array object and is much faster. They provide the .noncomplex and .real and .imag calls on this object to retrieve the underlying data in a 1-D format.
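
For the complex case mentioned in the comments above, a minimal sketch might look like the following. It assumes the pre-R2022a layout, in which _real and _imag are flat column-major sequences; FakeComplexMlarray and complex_to_ndarray are hypothetical names used only for illustration, so the example runs without a Matlab session.

```python
import numpy as np

class FakeComplexMlarray(object):
    """Hypothetical stand-in for a complex Matlab array (pre-R2022a layout assumed)."""
    def __init__(self):
        self.size = (2, 2)
        self._real = [1.0, 3.0, 2.0, 4.0]    # real parts, column-major
        self._imag = [0.5, 0.0, -1.0, 2.0]   # imaginary parts, column-major

def complex_to_ndarray(x):
    # Combine the two flat buffers, then restore the Matlab shape.
    flat = np.array(x._real) + 1j * np.array(x._imag)
    return flat.reshape(x.size, order='F')

z = complex_to_ndarray(FakeComplexMlarray())
assert z.shape == (2, 2)
assert z[1, 0] == 3.0 + 0.0j
assert z[0, 1] == 2.0 - 1.0j
```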

Tim's answer is great for 2D arrays, but one way to adapt it to N-dimensional arrays is to use the order parameter of np.reshape():

np_x = np.array(x._data).reshape(x.size, order='F')
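
This can be checked on a 3D example without Matlab. The sketch assumes that _data holds the elements in column-major order; flat and size are hypothetical stand-ins for x._data and x.size.

```python
import numpy as np

# Build a reference 2x3x4 array, then flatten it in column-major (Fortran)
# order to imitate what x._data is assumed to contain.
reference = np.arange(24, dtype=float).reshape(2, 3, 4)
flat = reference.flatten(order='F')
size = reference.shape

# order='F' tells reshape to fill the first axis fastest, matching Matlab.
np_x = np.array(flat).reshape(size, order='F')
assert np.array_equal(np_x, reference)

# Appending .T after an order='F' reshape would reverse the axes.
assert np_x.T.shape == (4, 3, 2)
```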

2 Comments

I think this should be np_x = np.array(x._data).reshape(x.size, order='F').T
@RuslanShaydulin No, because order='F' has been specified explicitly.
