I have a numpy array that I want to alter by scaling all of its columns (i.e. dividing all the values in a column by the maximum value in that column so that all values are < 1).

A sample of the array looks like this:

[   2.       0.     367.877  ...,   -0.358   51.547  -32.633]
[   2.       0.     339.824  ...,   -0.33    52.562  -27.581]
[   3.       0.     371.438  ...,   -0.406   55.108  -35.573]

I've tried scaling the array (data_in) by the following code:

#normalize the data_in array 
data_in_normalized = data_in / data_in.max(axis=0)

However, the output of data_in_normalized is:

[  0.5          0.           0.95437199   0.89363654   0.80751792]
[  0.46931238   0.50660904   0.5003812    0.91250444   0.625     ]
[  0.96229214   0.89483109   0.86989432   0.86491407   0.71287646]
[-23.90909091   0.34346373   1.25110652   0.           0.8537859    1.           1.        ]

Clearly, it didn't normalize: there are several places where the scaled values are greater than 1. Is there a better way to scale the data, or am I using the max() function incorrectly (e.g. is the max() value being shared between columns)?
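
For reference, on a small made-up array with only positive values the per-column division behaves the way I expect, so the maxima don't seem to be shared across columns:

>>> a = np.array([[2., 10.], [4., 5.]])
>>> a.max(axis=0)
array([  4.,  10.])
>>> a / a.max(axis=0)
array([[ 0.5,  1. ],
       [ 1. ,  0.5]])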

1 Answer

IIUC, it's not that the maximum value is shared between columns; it's that you probably want to divide by the maximum absolute value instead, because you have elements of both signs. After all, 1 > -100, so if you divide a column containing [1, -100] by its maximum, nothing changes.

For example:

>>> data_in = np.array([[-3,-2],[2,1]])
>>> data_in
array([[-3, -2],
       [ 2,  1]])
>>> data_in.max(axis=0)
array([2, 1])
>>> data_in / data_in.max(axis=0)
array([[-1.5, -2. ],
       [ 1. ,  1. ]])

but

>>> data_in / np.abs(data_in).max(axis=0)
array([[-1.        , -1.        ],
       [ 0.66666667,  0.5       ]])
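
Applied to your data, a minimal sketch might look like this (col_max is just a helper name; the guard against zero is only needed if a column can be entirely zero, as the second column of your sample appears to be, since dividing by it would otherwise produce nan/inf):

>>> col_max = np.abs(data_in).max(axis=0)
>>> col_max[col_max == 0] = 1.0                  # leave all-zero columns as they are
>>> data_in_normalized = data_in / col_max       # every value now lies in [-1, 1]

If you already use scikit-learn, sklearn.preprocessing.MaxAbsScaler performs the same per-column scaling by the maximum absolute value.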