0

I have a NumPy array of shape N×M filled with random elements, and I also have two numbers A and B.

I want to visit every element of this array and replace it: if the element is > 0, set it to A (i.e., element <- A), and if the element is < 0, set it to B (i.e., element <- B).

I know the naive approach is to access every single element with a for loop, but that is very slow.

Is there a faster, more idiomatic way to do this?

2
  • Try numpy.clip. Commented Jul 6, 2017 at 17:21
  • Look at np.where Commented Jul 6, 2017 at 17:23

3 Answers

2

Boolean masked assignment will change values in place:

In [493]: arr = np.random.randint(-10,10,(5,7))
In [494]: arr
Out[494]: 
array([[ -5,  -6,  -7,  -1,  -8,  -8, -10],
       [ -9,   1,  -3,  -9,   3,   8,  -1],
       [  6,  -7,   4,   0,  -4,   4,  -2],
       [ -3, -10,  -2,   7,  -4,   2,   2],
       [ -5,   5,  -1,  -7,   7,   5,  -7]])
In [495]: arr[arr>0] = 100
In [496]: arr[arr<0] = -50
In [497]: arr
Out[497]: 
array([[-50, -50, -50, -50, -50, -50, -50],
       [-50, 100, -50, -50, 100, 100, -50],
       [100, -50, 100,   0, -50, 100, -50],
       [-50, -50, -50, 100, -50, 100, 100],
       [-50, 100, -50, -50, 100, 100, -50]])
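One caveat with two successive masked assignments: the second mask is evaluated after the first assignment has already run. With 100 and -50 that is harmless, but if A were negative, the freshly written A values would match arr<0 and be overwritten by B. A minimal sketch that avoids this by computing both masks up front (the helper name replace_by_sign and the small sample array are made up for illustration):

```python
import numpy as np

def replace_by_sign(arr, a, b):
    """Set positive entries to a and negative entries to b, in place."""
    # Evaluate both masks on the original values before writing anything.
    pos = arr > 0
    neg = arr < 0
    arr[pos] = a
    arr[neg] = b
    return arr

arr = np.array([[-5, 1, 0], [3, -2, 7]])
replace_by_sign(arr, -1, -50)  # a is negative here on purpose
print(arr)
```

Zeros match neither mask, so they are left untouched, just as in the session above.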

I just gave a similar answer in "python numpy: iterate for different conditions without using a loop".


1

IIUC:

narr = np.random.randint(-100,100,(10,5))
array([[ 70, -20,  96,  73, -94],
       [ 42,  35, -55,  56,  54],
       [ 97, -16,  24,  32,  78],
       [ 49,  49, -11, -82,  82],
       [-10,  59, -42, -68, -70],
       [ 95,  23,  22,  58, -38],
       [ -2, -64,  27, -33, -95],
       [ 98,  42,   8, -83,  85],
       [ 23,  51, -99, -82,  -7],
       [-28, -11, -44,  95,  93]])
A = 1000
B = -999

Use np.where:

np.where(narr > 0, A, np.where(narr < 0, B , narr))

Output:

array([[1000, -999, 1000, 1000, -999],
       [1000, 1000, -999, 1000, 1000],
       [1000, -999, 1000, 1000, 1000],
       [1000, 1000, -999, -999, 1000],
       [-999, 1000, -999, -999, -999],
       [1000, 1000, 1000, 1000, -999],
       [-999, -999, 1000, -999, -999],
       [1000, 1000, 1000, -999, 1000],
       [1000, 1000, -999, -999, -999],
       [-999, -999, -999, 1000, 1000]])
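If the nested np.where calls get hard to read, np.select may be an alternative worth trying: it takes a list of conditions and a matching list of choices. The sketch below assumes the same A and B as above and uses a small hand-written array rather than the random one:

```python
import numpy as np

A, B = 1000, -999
narr = np.array([[70, -20, 0], [-2, 42, -83]])

# Conditions are checked in order; elements matching none of them
# take their value from `default`, here the original array.
result = np.select([narr > 0, narr < 0], [A, B], default=narr)
print(result)
```

Like np.where, this returns a new array and leaves narr unchanged.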

1 Comment

This is helpful! But the original array narr has not been changed, so I need to allocate new memory to hold the new array. Can we make this change in place?
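One possible way to keep the np.where expression and still update narr itself is to assign the result back through a slice. This is only a sketch with a small hand-written array; note that np.where still builds a temporary array for the right-hand side, so it saves no memory during the computation, it only keeps the result in the existing buffer:

```python
import numpy as np

A, B = 1000, -999
narr = np.array([[70, -20, 0], [-2, 42, -83]])
alias = narr  # second reference to the same array object

# Slice assignment copies the result into narr's existing buffer
# instead of rebinding the name narr to a new array.
narr[...] = np.where(narr > 0, A, np.where(narr < 0, B, narr))
print(alias)
```

Because alias and narr refer to the same object, the printed array shows the replaced values. Boolean masked assignment (arr[arr > 0] = A) avoids the full-size temporary result altogether.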
0

Because you mentioned that you're interested in the speed of the computation, I made a speed comparison of several different approaches to your problem.

test.py:

import numpy as np

A = 100
B = 50

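def createArray():
    array = np.random.randint(-100,100,(500,500))
    return array

def replace(x):
    # Positive -> A, negative -> B; zero is left unchanged,
    # matching the mask and np.where variants below.
    if x > 0:
        return A
    return B if x < 0 else x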

def replace_ForLoop():
    """Simple for-loop."""
    array = createArray()
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            array[i][j] = replace(array[i][j])

def replace_nditer():
    """Use numpy.nditer to iterate over values."""
    array = createArray()
    for elem in np.nditer(array, op_flags=['readwrite']):
        elem[...] = replace(elem)

def replace_masks():
    """Use boolean masks."""
    array = createArray()
    array[array>0] = A
    array[array<0] = B

def replace_vectorize():
    """Use numpy.vectorize"""
    array = createArray()
    vectorfunc = np.vectorize(replace)
    array = vectorfunc(array)

def replace_where():
    """Use numpy.where"""
    array = createArray()
    array = np.where(array > 0, A, np.where(array < 0, B , array))

Note: The variants using nested for-loops, np.nditer, and boolean masks modify the array in place; the last two (np.vectorize and np.where) build a new array.

Timing comparison:

> python -mtimeit -s'import test' 'test.replace_ForLoop()'                     
10 loops, best of 3: 185 msec per loop
> python -mtimeit -s'import test' 'test.replace_nditer()' 
10 loops, best of 3: 294 msec per loop
> python -mtimeit -s'import test' 'test.replace_masks()' 
100 loops, best of 3: 5.8 msec per loop
> python -mtimeit -s'import test' 'test.replace_vectorize()'
10 loops, best of 3: 55.3 msec per loop
> python -mtimeit -s'import test' 'test.replace_where()'    
100 loops, best of 3: 5.42 msec per loop

Using loops is indeed quite slow. numpy.nditer is even slower, which surprised me, because the docs call it an "efficient multi-dimensional iterator object to iterate over arrays". numpy.vectorize is essentially a for-loop, but still manages to be about three times as fast as the naive implementation. The np.where variant proposed by Scott Boston is slightly faster than the boolean masks from hpaulj's answer, but it needs more memory because it does not modify the array in place. Note that every timing above includes the cost of createArray(), since each function builds its own input.
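To compare only the two fastest variants without array creation inside the timed call, something like the following could be used. It is a rough sketch, not the benchmark that produced the numbers above: the function names, the fixed seed, and the repeat counts are arbitrary choices, and with_masks copies its input so that the shared array is not modified between runs, which adds some cost to that variant.

```python
import timeit
import numpy as np

A, B = 100, 50
rng = np.random.default_rng(0)
base = rng.integers(-100, 100, size=(500, 500))  # built once, outside the timing

def with_masks(arr):
    out = arr.copy()  # copy so that `base` stays intact across repeats
    pos, neg = out > 0, out < 0
    out[pos] = A
    out[neg] = B
    return out

def with_where(arr):
    return np.where(arr > 0, A, np.where(arr < 0, B, arr))

# Sanity check: both variants must agree before timing them.
assert np.array_equal(with_masks(base), with_where(base))

for func in (with_masks, with_where):
    best = min(timeit.repeat(lambda: func(base), number=20, repeat=3)) / 20
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```

Absolute numbers will depend on the machine and the NumPy version, so treat the output as a relative comparison only.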
