Effective way to get array from pandas dataframe text column

Question

Can I do the following conversion to an array using constructs like df.col.apply(lambda x: ...), without 'traditional' for-loops (one iterating over the rows of the column and another iterating over the words within each row's string value)?

All my attempts gave error messages like "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".

Example:

import numpy as np
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

Expected output is:

np.array([
    [[1,2,3],[-2,-2,-3]],
    [[1,2,3]],
    [[]]
])

Yes. There are 385 existing results for [python] create word vector, please search through them. This is a duplicate.
– smci, Commented Oct 30, 2019 at 8:01
Possible duplicate of Convert pandas dataframe to NumPy array
– Ahmad, Commented Oct 30, 2019 at 8:04

U13-Forward · Accepted Answer · 2019-10-30 08:06:12Z

Try using:

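# Map each word to its vector; words that are not keys of d become NaN
a = df['col'].str.split().apply(lambda x: pd.Series(x).map(d)).values
# Drop the NaN placeholders row by row; dtype=object allows ragged rows on NumPy 1.24+
a = np.array([pd.Series(i).dropna().values for i in a], dtype=object)
print(a)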

Output:

[array([[1, 2, 3], [-2, -2, -3]], dtype=object)
 array([[1, 2, 3]], dtype=object) array([], dtype=object)]
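
As a rough breakdown of what the two lines above do, here is a sketch that keeps the intermediate results in separate variables. It assumes the d and df from the question; the names words, mapped, rows and result are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

# Step 1: split on whitespace, so each row holds a list of words.
words = df['col'].str.split()

# Step 2: returning a Series from apply expands the result into a DataFrame,
# one column per word position. Unknown words and short rows are NaN.
mapped = words.apply(lambda x: pd.Series(x).map(d))

# Step 3: drop the NaN placeholders in each row and collect what is left.
rows = [pd.Series(r).dropna().tolist() for r in mapped.to_numpy()]

# dtype=object lets the rows have different lengths.
result = np.array(rows, dtype=object)
print(mapped.shape)
print(result)
```

The intermediate DataFrame is padded to the longest row, so for long texts this may use noticeably more memory than filtering each row directly.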


1 Comment

MaratSR Over a year ago

Thank you. I could do it with np.array(df1.col.apply(lambda x: np.array([d[item] for item in x.split(' ') if d.get(item, 0) != 0]))), but your way is quicker, IMHO.
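
A minimal sketch of the list-comprehension approach from the comment above, assuming the d and df from the question (vectors and result are illustrative names). It swaps the d.get(item, 0) != 0 filter for a plain membership test. If the values of d were NumPy arrays rather than lists, comparing one of them with 0 would give an array, and using that array as an if condition may be what triggers the "truth value of an array ... is ambiguous" error quoted in the question.

```python
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

# Keep only the words that are keys of d. `w in d` never inspects the stored
# value, so it stays safe even when the values are NumPy arrays.
vectors = df['col'].apply(lambda s: [d[w] for w in s.split() if w in d])

# A Series of lists converts to a 1-D object array with one list per row.
result = vectors.to_numpy()
print(result)
```

Rows that contain no known words come out as an empty list, which matches the accepted answer's output rather than the [[]] shown in the question.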
