I have a 2-D list whose sublists have different lengths. I need to convert it to a NumPy array in which the missing trailing values of the shorter sublists are filled with -1, and I am looking for an efficient way to do this.

For example, given the 2-D list x:

x = [
    [0,2,3],
    [],
    [4],
    [5,6]]

I want to get a NumPy array that looks like this:

>>> array_x
array([[ 0,  2,  3],
       [-1, -1, -1],
       [ 4, -1, -1],
       [ 5,  6, -1]]) 

The basic way to do it is to create an array of -1s and then loop over the 2D list to fill in the remaining values, like this:

n_rows = len(x)
n_cols = max(len(ele) for ele in x)

new_array = np.ones((n_rows, n_cols)) * -1

for i, row in enumerate(x):
    for j, ele in enumerate(row):
        new_array[i, j] = ele

But is there a more efficient solution?

  • Using new_array = np.empty((n_rows, n_cols)) followed by new_array.fill(-1) makes this 100% faster. Commented May 17, 2013 at 2:41
  • @jamylak, I see. Thanks for the answer, so I guess there is no obvious way to get rid of the for loop. Commented May 17, 2013 at 12:22
  • I don't know of a fast way, though that's not to say one doesn't exist. I tried np.array(tuple(izip_longest(*x, fillvalue=-1)), dtype=np.int).T but that was slow. You could put a bounty on this question to draw more attention and perhaps find a faster way that removes the loop. Commented May 17, 2013 at 12:23
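
For reference, the izip_longest one-liner from the comment above can be sketched for Python 3 roughly as follows. This is only an illustration: izip_longest is called itertools.zip_longest in Python 3, plain int stands in for np.int (which recent NumPy releases no longer provide), and the function name pad_with_zip_longest is made up here. The commenter reported this approach as slow, so treat it as a readability option rather than a speedup.

```python
import numpy as np
from itertools import zip_longest


def pad_with_zip_longest(rows, fill_value=-1):
    # zip_longest(*rows) walks the input column by column, padding the
    # shorter rows with fill_value, so the result has to be transposed back.
    columns = list(zip_longest(*rows, fillvalue=fill_value))
    return np.array(columns, dtype=int).T


x = [[0, 2, 3], [], [4], [5, 6]]
array_x = pad_with_zip_longest(x)
```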

1 Answer

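Two small changes speed up your original solution: allocate the array with np.empty and fill(-1) instead of multiplying np.ones by -1, and find the widest row with max(map(len, x)):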

n_rows = len(x)
n_cols = max(map(len, x))

new_array = np.empty((n_rows, n_cols))
new_array.fill(-1)
for i, row in enumerate(x):
    for j, ele in enumerate(row):
        new_array[i, j] = ele
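
A possible further variation, not part of the timings in this answer, is to drop the inner loop and assign each row as a slice. The sketch below assumes a NumPy version that has np.full (1.8 or later) and a non-empty outer list; the name pad_rows is hypothetical. It also passes dtype=int, since np.empty and np.ones default to floats, while the array in the question holds integers.

```python
import numpy as np


def pad_rows(rows, fill_value=-1):
    # Assumes rows is a non-empty list; max() raises on an empty sequence.
    n_rows = len(rows)
    n_cols = max(map(len, rows))

    # Allocate and fill in one call, with an integer dtype.
    out = np.full((n_rows, n_cols), fill_value, dtype=int)

    # One slice assignment per row replaces the element-by-element inner loop.
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out
```

For the x in the question, pad_rows(x) gives the same values as array_x, as integers rather than floats.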

Timings:

import numpy as np
from timeit import timeit

def f1(x, enumerate=enumerate, max=max, len=len):
    n_rows = len(x)
    n_cols = max(len(ele) for ele in x)

    new_array = np.ones((n_rows, n_cols)) * -1
    for i, row in enumerate(x):
        for j, ele in enumerate(row):
            new_array[i, j] = ele
    return new_array

def f2(x, enumerate=enumerate, max=max, len=len, map=map):
    n_rows = len(x)
    n_cols = max(map(len, x))

    new_array = np.empty((n_rows, n_cols))
    new_array.fill(-1)
    for i, row in enumerate(x):
        for j, ele in enumerate(row):
            new_array[i, j] = ele

    return new_array

setup = '''x = [[0,2,3],
    [],
    [4],
    [5,6]]
from __main__ import f1, f2'''

print(timeit(stmt='f1(x)', setup=setup, number=100000))
print(timeit(stmt='f2(x)', setup=setup, number=100000))

>>> 
2.01299285889
0.966173887253
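
If the goal is to remove the Python-level loops over elements altogether, one option is a boolean mask that marks which cells hold real data. The following is a minimal sketch under a few assumptions: the outer list is non-empty, the values are integers, and np.full is available (NumPy 1.8 or later). The name pad_with_mask is made up for this example. It has not been timed against f1 and f2 here; for an input as small as the four-row x the setup overhead may well dominate, so measure on realistic data before switching.

```python
import numpy as np


def pad_with_mask(rows, fill_value=-1):
    # Assumes rows is a non-empty list of integer sequences.
    lengths = np.array([len(row) for row in rows])
    n_cols = lengths.max()

    out = np.full((len(rows), n_cols), fill_value, dtype=int)

    # mask[i, j] is True where row i actually has a j-th element.
    mask = np.arange(n_cols) < lengths[:, None]

    # Boolean-mask assignment fills the True cells in row-major order,
    # which matches the order of the concatenated row values.
    out[mask] = np.concatenate([np.asarray(row, dtype=int) for row in rows])
    return out
```

The list comprehensions still iterate over the rows once, but the per-element work happens inside NumPy.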