How to resample numpy array with max length of index in python

Question

I'm new to Python and am trying to normalize each element of a list using preprocessing.normalize. However, it fails with ValueError: setting an array element with a sequence.

I then found the cause: the elements of the np.array have different lengths.
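
A minimal sketch of that cause, using two short hypothetical lists in place of the real sensor columns. Forcing a float dtype here is an assumption made so that NumPy rejects the ragged input directly; depending on the NumPy version, the original np.array(result) call may instead build an object array that only fails later inside normalize.

```python
import numpy as np

# Hypothetical stand-ins for two sensor columns of different lengths.
short = [0.1, 0.2, 0.3]
long = [0.1, 0.2, 0.3, 0.4, 0.5]

try:
    # Rows of unequal length cannot form a rectangular float array.
    np.array([short, long], dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```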

Here is my code,

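import numpy as np
import pandas as pd
from sklearn import preprocessing

result = []

for url in target_url:
    # Whitespace-delimited file without a header row
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    # Keep only the column labelled 2 (the third column)
    result.append(sensor[2])

result = np.array(result)
# I want to resample here before it goes to normalize.
result = preprocessing.normalize(result, norm='l1')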

target_url holds the URLs of the sensor data on a web server. Each file's column is appended to the result list, which is then converted to an array with np.array.

For example, len(result[0]) is 121598 and len(result[1]) is 1215601. I want to make result[0] the same length as result[1], using resample to fill in NaN.

How can I do that?

Please help me out here.

Thanks in advance.

EDIT

After normalizing, I want to compute the correlation using corr().

Here is the code,

result = preprocessing.normalize(result, norm='l1')
ret = pd.DataFrame(result)
corMat = ret.T.corr()  # corr() already returns a DataFrame

Gerges · Accepted Answer · 2017-09-28 05:48:08Z

Since you are already reading the CSV files with pandas, you are off to a good start. One way to do this is to use pd.concat to join the Series in the result list into one DataFrame (I assume sensor[2] is a Series). For example:

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]
pd.concat(a, axis=1)

Which gives:

     0    1  2
0  1.0  1.0  1
1  2.0  2.0  2
2  3.0  NaN  3
3  NaN  NaN  4
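
For comparison, the same NaN padding can be sketched in NumPy alone. The helper name pad_to_longest is hypothetical and not part of NumPy; it puts one input sequence per row rather than per column, so the result is the transpose of the table above.

```python
import numpy as np

def pad_to_longest(rows):
    """Right-pad each 1-D sequence with NaN so every row has the longest length."""
    width = max(len(r) for r in rows)
    # Start from an all-NaN float array, then copy each row into its prefix.
    out = np.full((len(rows), width), np.nan)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out

padded = pad_to_longest([[1, 2, 3], [1, 2], [1, 2, 3, 4]])
print(padded)
```

pd.concat is usually the more convenient choice when the data already arrives as Series, since it also aligns on the index.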

In the example provided by OP, this should suffice:

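result = []

for url in target_url:
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    result.append(sensor[2])

# Concatenate the Series as columns (shorter ones are padded with NaN),
# then backward-fill and forward-fill the NaNs.
# bfill()/ffill() are equivalent to fillna(method='bfill') / fillna(method='ffill').
result = pd.concat(result, axis=1).bfill().ffill()

# Each sensor is now a column; transpose so normalize() sees one sensor per row.
result = preprocessing.normalize(result.T, norm='l1')

# Correlation between sensors
pd.DataFrame(result).T.corr()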
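
To try the pad, fill, normalize and correlate steps without the web server, here is a small sketch on made-up data. The three Series are hypothetical stand-ins for sensor[2], and the row-wise L1 normalization is written out in NumPy (each row divided by the sum of its absolute values) as an assumed equivalent of preprocessing.normalize(..., norm='l1'), so that the sketch needs only NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor columns of unequal length.
series = [
    pd.Series([1.0, 2.0, 3.0, 4.0]),
    pd.Series([2.0, 4.0, 6.0]),
    pd.Series([4.0, 3.0]),
]

# One column per sensor; shorter columns get NaN at the bottom, then are filled.
frame = pd.concat(series, axis=1).bfill().ffill()

# One row per sensor, matching the layout of the original np.array(result).
rows = frame.T.to_numpy()

# L1-normalize each row: divide by the sum of its absolute values.
normalized = rows / np.abs(rows).sum(axis=1, keepdims=True)

# Sensor-by-sensor correlation matrix.
cor_mat = pd.DataFrame(normalized).T.corr()
print(cor_mat.shape)
```

Keeping one sensor per row before normalizing matters: with one sensor per column, the normalization would run across sensors at each time step, and the final corr() would compare time steps instead of sensors.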

Depending on what the Series indices look like and on your application, other kinds of concatenation may fit better; see the pandas documentation for pd.concat.
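
As one illustration of such a choice, the join parameter of pd.concat controls whether the index labels are combined as a union or an intersection. This sketch reuses the toy Series from earlier in the answer and assumes the default integer index.

```python
import pandas as pd

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]

# Default join='outer': union of index labels, NaN where a Series is shorter.
outer = pd.concat(a, axis=1)

# join='inner': keep only the index labels present in every Series.
inner = pd.concat(a, axis=1, join="inner")

print(outer.shape)  # (4, 3)
print(inner.shape)  # (2, 3)
```

The inner join truncates to the shortest Series instead of padding, so no fill step is needed, at the cost of discarding the extra samples.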

edited Sep 28, 2017 at 5:48

answered Sep 28, 2017 at 0:44

6 Comments

paulc1111 Over a year ago

Thanks for the answer! :) I'm actually trying to do correlation after normalizing. And I want to fill that NaN with bfill or ffill using resample. :'( Let me edit my question. :)

Gerges Over a year ago

Why not just use fillna(method='bfill')?

paulc1111 Over a year ago

Thanks for the comment and answer. I have one quick question. After adding the code, I get AttributeError: 'numpy.ndarray' object has no attribute 'corr'

Gerges Over a year ago

Oh sorry, my bad. You should cast to a DataFrame to use pandas's corr, or just use np.corrcoef(result). I updated the answer.
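
A small sketch of the two routes, assuming result is an ndarray with one sensor per row (the values here are made up). For correlation the NumPy function is np.corrcoef; np.cov would return covariances instead.

```python
import numpy as np
import pandas as pd

# Hypothetical normalized data: one row per sensor.
result = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.3, 0.2, 0.4],
])

# np.corrcoef treats each row as a variable by default.
via_numpy = np.corrcoef(result)

# DataFrame.corr correlates columns, so transpose to get one sensor per column.
via_pandas = pd.DataFrame(result).T.corr().to_numpy()

print(np.allclose(via_numpy, via_pandas))
```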

paulc1111 Over a year ago

Thanks for the comment! :) It gave me an error, but I worked around it by using result_temp = [result.iloc[:,i].tolist() for i in range(0, lenghofColumn)] and then computing the correlation! :)
