I found this prior post, and it gets me close: how-to-convert-a-pandas-dataframe-subset-of-columns-and-rows-into-a-numpy-array

But instead of making a single array (or matrix) of two columns based on the value in a third, I need to iterate through the data frame and create a 3x3 array (or matrix) from columns 'b' through 'j' for each correctly matching value in 'a'.

    import pandas as pd

    dft = pd.DataFrame({'a': ['NW', 'NW', 'SL', 'T'],
                        'b': [1, 2, 3, 4],
                        'c': [5, 6, 7, 8],
                        'd': [11, 12, 13, 14],
                        'e': [9, 10, 11, 12],
                        'f': [4, 3, 2, 1],
                        'g': [15, 14, 13, 12],
                        'h': [13, 14, 15, 16],
                        'i': [5, 4, 3, 2],
                        'j': [9, 8, 7, 6]})

    print(dft)
         a  b   c   d   e   f   g   h   i   j
     0  NW  1   5   11  9   4   15  13  5   9
     1  NW  2   6   12  10  3   14  14  4   8
     2  SL  3   7   13  11  2   13  15  3   7
     3  T   4   8   14  12  1   12  16  2   6

What I want is two separate arrays, one for each 'NW' row:

     [[ 1  5 11]
      [ 9  4 15]
      [13  5  9]]

     [[ 2  6 12]
      [10  3 14]
      [14  4  8]]

I have tried the following and received a really ugly error. The code is an attempt based on the original post.

    dft.loc[dft['a'] == 'NW',['b', 'c', 'd'], ['e', 'f', 'g'], ['h', 'i', 'j']].values

Here is the error -

    IndexingError                             Traceback (most recent call last)
    in ()
    ----> 1 dft.loc[dft['a'] == 'NW',['b', 'c', 'd'], ['e', 'f', 'g'], ['h', 'i', 'j']].values

    D:\Applications\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
       1323             except (KeyError, IndexError):
       1324                 pass
    -> 1325             return self._getitem_tuple(key)
       1326         else:
       1327             key = com._apply_if_callable(key, self.obj)

    D:\Applications\Anaconda\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
        839
        840         # no multi-index, so validate all of the indexers
    --> 841         self._has_valid_tuple(tup)
        842
        843         # ugly hack for GH #836

    D:\Applications\Anaconda\lib\site-packages\pandas\core\indexing.py in _has_valid_tuple(self, key)
        186         for i, k in enumerate(key):
        187             if i >= self.obj.ndim:
    --> 188                 raise IndexingError('Too many indexers')
        189             if not self._has_valid_type(k, i):
        190                 raise ValueError("Location based indexing can only have [%s] "

    IndexingError: Too many indexers

Thoughts? I am so close, yet tantalizingly far.

  • I also have no clue how to format the error output, so any help clearing that up would be appreciated.

2 Answers

You can do this without a loop:

a = dft.loc[dft['a'] == 'NW', 'b':'j']  # rows where 'a' is 'NW', columns 'b' through 'j'
n = a.shape[0]  # number of matching rows
new_a = a.values.reshape(n, 3, 3)

You get

array([[[ 1,  5, 11],
        [ 9,  4, 15],
        [13,  5,  9]],

       [[ 2,  6, 12],
        [10,  3, 14],
        [14,  4,  8]]])
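For reference, here is a minimal self-contained sketch of the same idea using the sample data from the question. It relies on the fact that `.loc` on a DataFrame takes at most one row indexer and one column indexer, which is why passing three column lists raises "Too many indexers". The names `nw`, `stacked`, and `separate` are only illustrative, and `.to_numpy()` assumes a reasonably recent pandas (`.values` should behave the same way here).

```python
import pandas as pd

dft = pd.DataFrame({
    'a': ['NW', 'NW', 'SL', 'T'],
    'b': [1, 2, 3, 4], 'c': [5, 6, 7, 8], 'd': [11, 12, 13, 14],
    'e': [9, 10, 11, 12], 'f': [4, 3, 2, 1], 'g': [15, 14, 13, 12],
    'h': [13, 14, 15, 16], 'i': [5, 4, 3, 2], 'j': [9, 8, 7, 6],
})

# One row indexer (boolean mask) and one column indexer (label slice 'b'..'j').
nw = dft.loc[dft['a'] == 'NW', 'b':'j']

# -1 lets NumPy infer the number of matching rows, so nothing is hard-coded.
stacked = nw.to_numpy().reshape(-1, 3, 3)

# Split the stacked (n, 3, 3) array into a list of n separate 3x3 arrays.
separate = list(stacked)
```

Iterating over the first axis of the stacked array gives the individual 3x3 arrays, so `separate[0]` and `separate[1]` correspond to the two 'NW' rows.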

10 Comments

I like this solution, but it assumes that there are only 2 rows with 'NW' in the 'a' column. Do you know if there's a way to do this in one line without having to hard-code the first value (2) in reshape()?
@OriolMirosa, you are right. See the edit to my solution
Nice! I guess you could even delete the second line and have the third be simply new_a = a.values.reshape(a.shape[0], 3, 3), right?
@OriolMirosa, yes. I just created a variable n for clarity
And here I was thinking it would be hard ;) Thanks to this board!

I'm not 100% sure what you're after, but maybe this will help:

new_arrays = []

for index, row in dft.iterrows():
    if row['a'] == 'NW':
        new_arrays.append(row[1:].values.reshape(3, 3))

With itertuples(), as requested in the comments:

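import numpy as np

# itertuples() yields (Index, a, b, ..., j), so row[1] is 'a' and row[2:] is 'b' through 'j'
for index, row in enumerate(dft.itertuples(), 1):
    if row[1] == 'NW':
        new_arrays.append(np.array(row[2:]).reshape(3, 3))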

Now you have each of the two arrays in new_arrays, and you can print them together or access individually:

new_arrays[0]

array([[1, 5, 11],
       [9, 4, 15],
       [13, 5, 9]], dtype=object)

new_arrays[1]

array([[2, 6, 12],
       [10, 3, 14],
       [14, 4, 8]], dtype=object)
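The `dtype=object` in the output above comes from `iterrows()`: each row is a Series that mixes the string in 'a' with the integers, so the values are stored as generic objects. If integer arrays are preferred, one possible variation (a sketch, not the only way) is to drop the label column from each row and cast before reshaping:

```python
import pandas as pd

dft = pd.DataFrame({
    'a': ['NW', 'NW', 'SL', 'T'],
    'b': [1, 2, 3, 4], 'c': [5, 6, 7, 8], 'd': [11, 12, 13, 14],
    'e': [9, 10, 11, 12], 'f': [4, 3, 2, 1], 'g': [15, 14, 13, 12],
    'h': [13, 14, 15, 16], 'i': [5, 4, 3, 2], 'j': [9, 8, 7, 6],
})

new_arrays = []
for _, row in dft.iterrows():
    if row['a'] == 'NW':
        # Drop the string label, then cast the remaining nine values to int.
        new_arrays.append(row.drop('a').to_numpy(dtype=int).reshape(3, 3))
```

With this change the arrays should print without the `dtype=object` suffix.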

4 Comments

That worked perfectly on my sample data. But when I tried it with my real data, where I simply did if row['Code'] == 'Network', it tells me 'Code' is not defined. What am I missing? The rest of the data is exactly the same, just with more columns in the raw data. Also, I read in the pandas documentation that iterrows is bad and to use itertuples instead. I tried that and it said ValueError: too many values to unpack (expected 2).
It looks like your data frame doesn't have a column called Code. Is that the case? As for itertuples(), it makes things a little more awkward, but I added the solution to the answer.
Doing dft.columns returns Code\n. I added the \n and it worked. I guess because the column is text?
The \n is the newline character; you might want to clean it out of dft.columns to avoid problems in the future. I'm glad it works now.
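One way to do that cleanup, sketched on a hypothetical two-column frame (`raw` and its contents are made up here to reproduce the trailing newline, they are not the asker's real data):

```python
import pandas as pd

# Hypothetical frame reproducing the issue: a header with a trailing newline.
raw = pd.DataFrame({'Code\n': ['Network', 'Other'], 'b': [1, 2]})

# Strip leading/trailing whitespace (including '\n') from every column name.
raw.columns = raw.columns.str.strip()

# The column can now be addressed by its clean name.
matches = raw[raw['Code'] == 'Network']
```

After stripping, `row['Code']` should work in the loops above without needing the `\n`.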
