I am having difficulty choosing a random row (a point, in my case) from my NumPy array. I want each point to be picked with its own probability, so I have a NumPy array P_i in which each row is the probability of one point. I tried np.random.choice and got an error saying the array must be 1-D, so I called np.random.choice on the number of rows instead to get a random row index. But how do I do this with a probability for each point?

  • Please read minimal reproducible example and how to ask. Show code and full error trace. Commented Jun 7, 2021 at 9:24
  • You can always .flatten() the array into 1D if your desired API doesn't support multidimensional arrays... Commented Jun 7, 2021 at 9:26

1 Answer

You can use np.random.choice with a probability distribution that sums to 1.

Getting probabilities that sum up to 1

Reshaping

If your probabilities already sum to 1, you simply need to squeeze your probability vector into 1-D:

import numpy as np
# Example of probability vector
probs = np.array([[0.1, 0.2, 0.5, 0.2]])
# > array([[0.1, 0.2, 0.5, 0.2]])
probs.shape
# > (1, 4)
p_squeezed = probs.squeeze()
# > array([0.1, 0.2, 0.5, 0.2])
p_squeezed.shape
# > (4,)
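
If your probabilities are stored one per row (shape (N, 1)) rather than in a single row, the same idea applies. Here is a minimal sketch, assuming a made-up P_i array; it flattens the array with ravel and checks the three conditions that the p parameter of np.random.choice has to meet (1-D, non-negative, summing to 1):

```python
import numpy as np

# Hypothetical P_i with one probability per row, shape (4, 1)
P_i = np.array([[0.1], [0.2], [0.5], [0.2]])

# ravel() flattens (N, 1) and (1, N) arrays alike into shape (N,)
p = P_i.ravel()

# Sanity checks for a vector that will be passed as `p`
assert p.ndim == 1
assert np.all(p >= 0)
assert np.isclose(p.sum(), 1.0)
```

If the last check fails, the probabilities need rescaling first.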

Getting a proper probability distribution

If your own probs don't add up to 1, then you can apply a division by the sum or a softmax.

Just generating random data:

import numpy as np
# Random 2D points
points = np.random.randint(0,10, size=(10,2))
# Random independent probabilities
probs = np.random.rand(10).reshape(-1, 1)
data = np.hstack((probs, points))
print(data)
# > array([[0.01402932, 5.        , 5.        ],
#          [0.01454579, 5.        , 6.        ],
#          [0.43927214, 1.        , 7.        ],
#          [0.36369286, 3.        , 7.        ],
#          [0.09703463, 9.        , 9.        ],
#          [0.56977406, 1.        , 4.        ],
#          [0.0453545 , 4.        , 2.        ],
#          [0.70413767, 4.        , 4.        ],
#          [0.72133774, 7.        , 1.        ],
#          [0.27297051, 3.        , 6.        ]])

Applying softmax:

from scipy.special import softmax
scale_softmax = softmax(data[:,0])
# > array([0.07077797, 0.07081454, 0.1082876 , 0.10040494, 0.07690364,
#  0.12338291, 0.0730302 , 0.14112644, 0.14357482, 0.09169694])

Applying division by the sum:

scale_divsum = data[: ,0] / data[:, 0].sum()
# > array([0.00432717, 0.00448646, 0.13548795, 0.11217647, 0.02992911,
#  0.17573962, 0.01398902, 0.21718238, 0.22248752, 0.08419431])

Here are the cumulative distributions of the two scaling functions I proposed:

[Figure: Cumulative distributions]

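With weights in [0, 1] like these, softmax flattens the distribution, so every point ends up with a more similar chance of being picked. Division by the sum keeps the probabilities proportional to your original weights, so it probably fits your needs better.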
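
To see the difference on a small case, here is a rough sketch that applies both scalings to three made-up weights, using a plain NumPy softmax instead of the SciPy one. The weights and variable names are only for illustration:

```python
import numpy as np

# Made-up weights in [0, 1], for illustration only
weights = np.array([0.01, 0.44, 0.70])

# Division by the sum: keeps the ratios between weights
# (needs non-negative weights with a positive sum)
p_divsum = weights / weights.sum()

# Softmax written in NumPy: exponentiate, then normalize
# (subtracting the max only improves numerical stability)
exp_w = np.exp(weights - weights.max())
p_softmax = exp_w / exp_w.sum()

print(p_divsum)
print(p_softmax)
```

Both results sum to 1, but the softmax values sit closer together: the smallest weight gets a much larger share than it does under division by the sum. Note also that softmax accepts negative inputs, while division by the sum does not give a valid distribution for them.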

Picking a random row

Now you can call np.random.choice and pass your probability distribution as the parameter p:

rand_idx = np.random.choice(np.arange(len(data)), p=scale_softmax)
data[rand_idx]
# > array([0.70413767, 4.        , 4.        ])

# or just the point:
data[rand_idx, 1:]
# > array([4., 4.])
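
As an alternative, here is a sketch using NumPy's newer Generator API (np.random.default_rng), whose choice method accepts a 2-D array together with an axis argument, so it can return a whole row without going through an index. The points, weights and seed are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed only for reproducibility

# Made-up points and weights, for illustration only
points = np.array([[5, 5], [1, 7], [3, 7], [4, 4]])
weights = np.array([0.1, 0.4, 0.3, 0.7])
p = weights / weights.sum()  # division by the sum

# Draw one row directly; axis=0 selects whole rows
row = rng.choice(points, p=p, axis=0)

# Draw many row indices at once and compare the observed
# frequencies with p as a quick sanity check
idx = rng.choice(len(points), size=10_000, p=p)
freq = np.bincount(idx, minlength=len(points)) / idx.size

print(row)
print(np.round(freq, 3), np.round(p, 3))
```

With enough draws, the observed frequencies should end up close to p, which is an easy way to confirm that the weights are being applied to the rows you expect.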