
I want to convert a pandas Series of strings, each representing a list of numbers, into a 2-D numpy array. What I have is something like:

ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'])

My desired output:

arr = np.array([[1, -2, 0, 1.2, 4.34], [3.3, 4, 0, -1, 9.1]])

What I have done so far is convert the pandas Series into a Series of lists of floats:

ds1 = ds.apply(lambda x: [float(number) for number in x.strip('[]').split(' ')])

but I don't know how to go from ds1 to arr.
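(Assuming the lists are all the same length, as confirmed in the comments below, that last step is a one-liner; a minimal sketch:)

import numpy as np
import pandas as pd

ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'])
ds1 = ds.apply(lambda x: [float(number) for number in x.strip('[]').split(' ')])

# np.array stacks the equal-length lists row-wise into a 2-D float array
arr = np.array(ds1.tolist())
print(arr.shape)  # (2, 5)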

  • Are you guaranteed that the lists in the series have the same number of elements? Commented Aug 20, 2020 at 12:49
  • @FBruzzesi Yes. Commented Aug 20, 2020 at 12:50
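That guarantee matters: np.array with dtype=float refuses rows of unequal length. A quick illustration with made-up ragged data:

import numpy as np

ragged = [[1.0, -2.0], [3.3, 4.0, 0.0]]  # made-up rows of unequal length
try:
    np.array(ragged, dtype=float)
except ValueError as err:
    print(err)  # NumPy cannot build a rectangular float array from these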

2 Answers


Use Series.str.strip + Series.str.split and create a new np.array with dtype=float:

arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')

Result:

arr

array([[ 1.  , -2.  ,  0.  ,  1.2 ,  4.34],
       [ 3.3 ,  4.  ,  0.  , -1.  ,  9.1 ]])
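Note that .str.split() with no separator splits on any run of whitespace, so irregular spacing inside the strings is handled as well; for example:

import numpy as np
import pandas as pd

# extra spaces between the numbers are absorbed by the no-argument split
messy = pd.Series(['[1  -2 0   1.2 4.34]'])
print(np.array(messy.str.strip('[]').str.split().tolist(), dtype='float'))
# [[ 1.   -2.    0.    1.2   4.34]]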

You can remove the "[]" from the Series object first, then things become easier; see the docs for Series.str.split: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html

ds1 = ds.str.strip("[]")
# split and expand the data, then convert to a numpy array
arr = ds1.str.split(" ", expand=True).to_numpy(dtype=float)

Then arr will be in the format you want:

array([[ 1.  , -2.  ,  0.  ,  1.2 ,  4.34],
       [ 3.3 ,  4.  ,  0.  , -1.  ,  9.1 ]])

Then I did a little profiling in comparison with Shubham's solution.

# Shubham's way
%timeit arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')
332 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# my way
%timeit ds.str.strip("[]").str.split(" ", expand=True).to_numpy(dtype=float)
741 µs ± 4.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

His solution is about twice as fast here, presumably because expand=True builds an intermediate DataFrame before converting. Cheers!
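For anyone who wants to reproduce the comparison, here is a sketch of a possible setup (the Series used for the timings above isn't shown, so the size here is a guess):

import numpy as np
import pandas as pd

# hypothetical test data: the two sample rows repeated to 1000 rows
ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'] * 500)

# in IPython/Jupyter:
# %timeit np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')
# %timeit ds.str.strip('[]').str.split(' ', expand=True).to_numpy(dtype=float)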
