I read in the following object with numpy.genfromtxt:

A = [(4, 'A', 3750.5), 
     (4, 'B', 3252.6),
     (8, 'A', 3350.5), 
     (8, 'B', 3152.6)]

I would like to use numpy fancy indexing on it, but I can't because this is not a numpy array. It's an array of a list.

What would be the best way to get the 3rd column of all rows that have '4' in the first column?

I tried A[A[:,0]==4] but the interpreter complained with "IndexError: invalid index".

Edit:

This is the python program I am using:

import numpy as np

A = np.genfromtxt( "text.txt" , dtype=( int , "|S10", float))

A_array = np.asarray(A, dtype=object)

print A
print A_array

The file text.txt:

4 A 3750.5
4 B 3270.5
8 A 3480.5
8 B 3590.5

This is the output:

[(4, 'A', 3750.5) (4, 'B', 3270.5) (8, 'A', 3480.5) (8, 'B', 3590.5)]
[(4, 'A', 3750.5) (4, 'B', 3270.5) (8, 'A', 3480.5) (8, 'B', 3590.5)]

What am I missing here?

  • A is actually a list of tuples, not an array of lists. Commented Nov 15, 2013 at 21:09
  • Why did someone insert commas into A? If I print A there are no commas. Is it maybe a list of tuples? Commented Nov 15, 2013 at 21:26
  • 1
    You used A = ... implying that you're assigning the data. When you "print" data, you don't say A = ..., you just print the contents. When you assign to a variable with A = ... you need the value on the right-hand-side of the = symbol to evaluate to a valid Python object, which is not true without the commas. Commented Nov 15, 2013 at 21:29
  • I will try to insert commas as needed. I explained it the wrong way. The first matrix was the output of a print command. The question is updated. Sorry Commented Nov 15, 2013 at 21:36
  • Ok. Do you agree that A_array and A would print exactly the same? Commented Nov 15, 2013 at 21:39

2 Answers

4
In [24]: A_array = numpy.asarray(A, dtype=object)

In [25]: A_array[A_array[:,0] == 4]
Out[25]:
array([[4, A, 3750.5],
       [4, B, 3252.6]], dtype=object)
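
The session above returns whole rows. To get only the 3rd column of the matching rows, as the question asks, the row mask can be combined with a column index. This is a minimal sketch that assumes A is the plain Python list of tuples shown in the question; the names mask and third_col are just illustrative:

```python
import numpy as np

# The list of tuples from the question (assumed to be a plain Python list).
A = [(4, 'A', 3750.5),
     (4, 'B', 3252.6),
     (8, 'A', 3350.5),
     (8, 'B', 3152.6)]

# dtype=object keeps the mixed int/str/float values as-is in a 2-D array.
A_array = np.asarray(A, dtype=object)

# Boolean mask over rows: True where the first column equals 4.
mask = A_array[:, 0] == 4

# Select the matching rows and, in the same step, only the third column.
third_col = A_array[mask, 2]
print(third_col)
```

Note that this relies on A_array being 2-D (4 rows by 3 columns), which is the case when the input really is a list of tuples.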

If the columns of data have semantic meaning that you'd like to keep track of, consider loading the list of tuples directly into a Pandas DataFrame and giving them column labels. The logical indexing would work similarly:

In [27]: A_df = pandas.DataFrame(A, columns=['Col1', 'Col2', 'Col3'])

In [28]: A_df
Out[28]:
   Col1 Col2    Col3
0     4    A  3750.5
1     4    B  3252.6
2     8    A  3350.5
3     8    B  3152.6

In [29]: A_df.Col1 == 4
Out[29]:
0     True
1     True
2    False
3    False
Name: Col1

In [30]: A_df[A_df.Col1 == 4]
Out[30]:
   Col1 Col2    Col3
0     4    A  3750.5
1     4    B  3252.6
3 Comments

Hmm.. I tried your first solution. When I print A and A_array they appear exactly the same, and the indexing command on A_array fails as it does on A.
Your comment is ambiguous unless I can see the specific code you're executing. It works for me, and is indeed the standard solution for this in numpy.
Also, if A and A_array appear the same when you use print to print them out, you must have another error. The first, being just a Python list, will print to the console very differently from the second, which is a numpy.ndarray. For instance, A_array will always have some dtype field printed along with its actual data contents, whereas that doesn't exist for a list.
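
A likely explanation for the identical printouts, given the program in the question's edit: with a per-column dtype, numpy.genfromtxt returns a 1-D structured array rather than a list, so A[:,0] has no second axis to index. In that case the columns are reached by field name instead. The sketch below rebuilds such an array by hand so it runs without text.txt; the field names f0, f1, f2 are assumed to match the defaults genfromtxt assigns when no names are given, and "U10" stands in for the "|S10" string type:

```python
import numpy as np

# Hand-built stand-in for the genfromtxt result (assumption: default
# field names f0, f1, f2; values copied from text.txt in the question).
rows = [(4, 'A', 3750.5),
        (4, 'B', 3270.5),
        (8, 'A', 3480.5),
        (8, 'B', 3590.5)]
A = np.array(rows, dtype=[('f0', int), ('f1', 'U10'), ('f2', float)])

# A structured array is 1-D: one record per row, so A[:, 0] is invalid.
print(A.shape)  # (4,)

# Index by field name to build the mask, then pick the third field.
mask = A['f0'] == 4
third_col = A['f2'][mask]
print(third_col)
```

If the real array uses different field names, A.dtype.names lists them.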
2

First, you need commas between the list elements in A, otherwise you'll get a syntax error:

A = [(4, 'A', 3750.5),
     (4, 'B', 3252.6),
     (8, 'A', 3350.5), 
     (8, 'B', 3152.6)]

Next, you can use a list comprehension to get what you want pretty succinctly:

[ row[2] for row in A if row[0] == 4 ]

Result:

[3750.5, 3252.6]
