
I have a pandas DataFrame in which one of the columns is made of tuples of floats. When I use arr = df['col_name'].to_numpy(), I end up with a 1D array of tuples, but I need a 2D array of floats.

My solution so far is to use arr = np.array(df['col_name'].to_list()). This works, but it seems inefficient to convert first to a list and then to an array. So I'm wondering, is there a better way to do this?
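
For concreteness, a minimal reproduction of the setup (column name and values are illustrative):

import numpy as np
import pandas as pd

# A column of tuples is stored with object dtype.
df = pd.DataFrame({'col_name': [(2.15, 3.03, 4.07), (1.0, 2.0, 3.0)]})

arr = df['col_name'].to_numpy()              # 1D object array of tuples, shape (2,)
arr2d = np.array(df['col_name'].to_list())   # 2D float array, shape (2, 3)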

This question is related, but the only answer there points to reading a text file differently, which is not an option for me since the data is already in the DataFrame.

  • Is the dtype object? The tolist step is probably fast. An alternative might be vstack (see the sketch after these comments). Commented Dec 22, 2019 at 13:03
  • Yes, both df['col_name'].dtype and arr.dtype return dtype('O'). So I should stick to the current approach? Commented Dec 22, 2019 at 13:15
  • An object array uses references/pointers, just like Python lists, and so does a pandas object-dtype Series. So to_list should be pretty fast. Commented Dec 22, 2019 at 17:16
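
For reference, the vstack alternative mentioned above would look something like this (a sketch, assuming the df from the question):

import numpy as np

# Each tuple becomes one row of a 2D float array.
arr2d = np.vstack(df['col_name'])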

1 Answer


If your col_name column contains actual tuples, run:

pd.DataFrame(df['col_name'].apply(pd.Series))
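
Note that apply(pd.Series) already returns a DataFrame here, so the outer pd.DataFrame(...) is redundant; an equivalent, slightly shorter form, with the final array conversion:

expanded = df['col_name'].apply(pd.Series)   # one column per tuple element
arr2d = expanded.to_numpy()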

But if you have read your DataFrame from, e.g., a CSV file, then each element of col_name actually contains a string composed of:

  • an opening parenthesis,
  • a sequence of numbers, separated by commas,
  • a closing parenthesis,

and it only looks like a tuple.

If this is the case, run:

pd.DataFrame(df['col_name'].apply(lambda txt: pd.Series(eval(txt))))

In both cases the result is a DataFrame. If you need a NumPy array, call to_numpy() on it.
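
As a side note, ast.literal_eval is a safer alternative to eval for parsing such strings, since it only accepts Python literals (a sketch, assuming the strings are plain tuple literals):

import ast

expanded = df['col_name'].apply(lambda txt: pd.Series(ast.literal_eval(txt)))
arr2d = expanded.to_numpy()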

To check whether col_name contains strings or actual tuples, run (e.g., in Jupyter):

df.col_name.iloc[0]

If the result is '(2.15, 3.03, 4.07)' (surrounded by quotes), it is a string. But if you get (2.15, 3.03, 4.07) (without quotes), it is a tuple.

Another way to check is to run type(df.col_name.iloc[0]). You should get either tuple or str.
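
If you want to handle either case without inspecting the data by hand, a small dispatch works (a sketch; it assumes every element of the column has the same type):

import ast
import numpy as np

first = df['col_name'].iloc[0]
if isinstance(first, str):
    # Parse each string like '(2.15, 3.03, 4.07)' into a tuple first.
    values = df['col_name'].apply(ast.literal_eval)
else:
    values = df['col_name']
arr2d = np.array(values.to_list())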


3 Comments

Thank you, it works (I have actual tuples). So to get the numpy array the final call would be pd.DataFrame(df['col_name'].apply(pd.Series)).to_numpy(). To me this looks less readable than my current approach, but I guess the gain is in efficiency, since it's skipping the conversion to list, right?
pandas apply operations are generally slow, since apply is effectively a row iterator, and this operation builds a new DataFrame. timeit to be sure (see the sketch after these comments).
I agree with @hpaulj, this is entirely unnecessary. Doubly so when OP is concerned about the performance of .to_list().
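
For reference, a quick timing harness along the lines suggested above (a sketch; numbers will vary with data shape and machine):

import timeit
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_name': [(1.0, 2.0, 3.0)] * 100_000})

print(timeit.timeit(lambda: np.array(df['col_name'].to_list()), number=10))
print(timeit.timeit(lambda: np.vstack(df['col_name']), number=10))
# apply(pd.Series) is much slower; time a single run.
print(timeit.timeit(lambda: df['col_name'].apply(pd.Series).to_numpy(), number=1))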
