
I'm trying to read from a CSV file where the first four columns are the indexes into a multi-dimensional array. I get the error:

KeyError: 0

from:

sp = []
csvFile = open("sp.csv", "rb")
csvReader = csv.reader(csvFile)
for row in csvReader:
    print row
    sp[int(row[0])][int(row[1])][int(row[2])][int(row[3])] = float(row[4])

3 Answers

2

You need to initialize a dictionary at every dimension. For example, sp[int(row[0])] must be assigned a value before you can index into it with [int(row[1])].
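
A minimal sketch of that per-level initialization, written for Python 3 and using dict.setdefault so each inner dictionary is created only when it is missing. The in-memory sample data is made up for illustration and stands in for sp.csv:

```python
import csv
import io

# Hypothetical sample standing in for sp.csv: four index columns, then a value.
sample = io.StringIO("0,0,0,0,4.1\n1,1,2,0,5.2\n0,1,1,3,3.2\n")

sp = {}
for row in csv.reader(sample):
    a, b, c, d = (int(x) for x in row[:4])
    # setdefault returns the existing inner dict, or stores and returns a new one.
    innermost = sp.setdefault(a, {}).setdefault(b, {}).setdefault(c, {})
    innermost[d] = float(row[4])

print(sp[1][1][2][0])
```

Only the index combinations that actually occur in the file get an entry, so nothing has to be preallocated.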

Edit: depending on your use case, you may get away with a single dictionary keyed by a tuple of the four indexes:

sp = {}
sp[(int(row[0]), int(row[1]), int(row[2]), int(row[3]))] = float(row[4])
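
Put together with the CSV loop, the tuple-key idea could look roughly like this in Python 3. The sample rows are invented for illustration. With a real file you would pass the open file object to csv.reader instead of the StringIO:

```python
import csv
import io

# Hypothetical sample standing in for sp.csv: four index columns, then a value.
sample = io.StringIO("0,0,0,0,4.1\n1,1,2,0,5.2\n0,1,1,3,3.2\n")

sp = {}
for row in csv.reader(sample):
    key = tuple(int(x) for x in row[:4])  # e.g. (1, 1, 2, 0)
    sp[key] = float(row[4])

print(sp[(1, 1, 2, 0)])
# Missing combinations can be given a default with dict.get.
print(sp.get((9, 9, 9, 9), 0.0))
```

The trade-off is that you can no longer take a "slice" such as sp[1][1] directly. You would have to filter the keys instead.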

Yet another edit: I was thinking you might use numpy and ended up at this question: "Python multi-dimensional array initialization without a loop", which actually reflects your problem. Its accepted answer is a non-numpy solution. You'd need to know the dimensions in advance for this, though.
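
For reference, one common non-numpy way to preallocate when the dimensions are known is a nested list comprehension. This is a sketch and not necessarily what the linked answer does. The sizes below are assumptions chosen only for illustration:

```python
# Assumed dimensions, chosen only for illustration.
d0, d1, d2, d3 = 2, 2, 3, 4

# Each comprehension level builds fresh lists, so no two rows share storage.
sp = [[[[0.0 for _ in range(d3)]
        for _ in range(d2)]
       for _ in range(d1)]
      for _ in range(d0)]

sp[1][1][2][0] = 5.2
print(sp[1][1][2][0])
print(sp[0][1][2][0])
```

Avoid the shorter [[0.0] * d3] * d2 form for the outer levels: it repeats references to the same inner list, so one assignment would show up in several places.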


5 Comments

sigh, so n^4? thought python was better than that :(
collections.defaultdict helps but only one level deep
@Tjorriemorrie: Although it would require a lot more memory, you could make the multi-dimensional array a dictionary of dictionaries and thereby avoid having to preallocate every entry in it.
@martineau my bad, I was already thinking of dictionaries in my comment. However, even then sp[int(row[0])] defaults to {} and you need to initialize it at position [int(row[1])] to a new {} before you can assign to it
@Nicolas78: It's possible to use defaultdict and avoid having to do all that initialization -- it's called autovivification. See my answer.
2

Instead of an array, you could use a dictionary of dictionaries like this to avoid having to preallocate the entire structure beforehand:

from collections import defaultdict
tree = lambda: defaultdict(tree)

sp = tree()

print 3 in sp[1][2]  # -> False
sp[1][2][3] = 4.1
print 3 in sp[1][2]  # -> True
print sp[1][2][3]  # -> 4.1

sp[9][7][9] = 5.62
sp[4][2][0] = 6.29
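
Applied to the CSV loop from the question, the same idea might look like this in Python 3. The sample data is made up for illustration, and tree is written as a named function rather than a lambda, which behaves the same way:

```python
import csv
import io
from collections import defaultdict


def tree():
    # Every missing key yields another tree, at any depth.
    return defaultdict(tree)


# Hypothetical sample standing in for sp.csv: four index columns, then a value.
sample = io.StringIO("0,0,0,0,4.1\n1,1,2,0,5.2\n0,1,1,3,3.2\n")

sp = tree()
for row in csv.reader(sample):
    a, b, c, d = (int(x) for x in row[:4])
    sp[a][b][c][d] = float(row[4])

print(sp[1][1][2][0])
print(3 in sp[0][1][1])
```

Keep in mind that merely looking up a missing key with [] creates an empty branch, so use `in` when you only want to test for a value.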

3 Comments

This is a thing of beauty.
I can't seem to get this to work, can you perhaps please elaborate a bit more? my sp[1][2][3] returns defaultdict(<function <lambda> at 0x10cf05230>, {})
That's because you didn't assign a terminal (aka "leaf") value to sp[1][2][3] before referencing its contents, so an empty defaultdict (aka a "branch" node) got created automatically by default. This is instead of a KeyError: 3 being raised because the defaultdict in sp[1][2] -- also automatically created -- doesn't have a value for that key.
1

How about using Numpy? sp.csv might look like this:

0,0,0,4.1
1,1,2,5.2
0,1,1,3.2

Then, using Numpy, reading from the file becomes a one-liner:

import numpy as np
sp = np.loadtxt('sp.csv', delimiter=',')

This yields a plain 2D float array, with one row per line of the file:

array([[ 0. ,  0. ,  0. ,  4.1],
       [ 1. ,  1. ,  2. ,  5.2],
       [ 0. ,  1. ,  1. ,  3.2]])

Converting this sparse matrix to a full ndarray works like this, assuming 0-based indexing. I'm not happy with the idx= line (there must be a more direct way), but it works:

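# Largest index along each dimension (the last column holds the values).
# Cast to int: the loaded data is float, and array shapes must be integers.
max_indices = sp.max(axis=0)[:-1].astype(int)
fl = np.zeros(tuple(max_indices + 1))
for row in sp:
    idx = tuple(row[:-1].astype(int))  # index columns as an integer tuple
    fl[idx] = row[-1]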

Resulting in the following ndarray fl:

array([[[ 4.1,  0. ,  0. ],
        [ 0. ,  3.2,  0. ]],

       [[ 0. ,  0. ,  0. ],
        [ 0. ,  0. ,  5.2]]])
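
As for a more direct way than the per-row idx= line, one option is NumPy's integer-array indexing, which fills all entries in a single assignment. This is a sketch under the same assumptions (0-based indexes, value in the last column). The array is typed in by hand here so the snippet runs without sp.csv:

```python
import numpy as np

# Same data as the sample sp.csv above, typed in directly.
sp = np.array([[0, 0, 0, 4.1],
               [1, 1, 2, 5.2],
               [0, 1, 1, 3.2]])

idx = sp[:, :-1].astype(int)               # shape (n_rows, n_dims)
fl = np.zeros(tuple(idx.max(axis=0) + 1))  # one slot per index combination

# tuple(idx.T) gives one index array per dimension, so all rows are
# written in a single assignment.
fl[tuple(idx.T)] = sp[:, -1]

print(fl.shape)
```

If the same index combination appears more than once in the file, which value ends up stored is not something to rely on, so deduplicate first in that case.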
