Numpy random choice with probabilities to produce a 2D-array with unique rows

Question

Similar to Numpy random choice to produce a 2D-array with all unique values, I am looking for an efficient way of generating:

n = 1000
k = 10
number_of_combinations = 1000000

p = np.random.rand(n)
p /= np.sum(p)

my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)

As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the provided solutions do not work for the additional extension of using specific probabilities p.

My current solution is as follows:

my_combinations = set()

while len(my_combinations) < number_of_combinations:
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))

print(my_combinations)

However, I do think that there should be a more efficient numpy approach to solve this faster.

hilberts_drinking_problem · Accepted Answer · 2019-08-08 09:51:40Z

2

For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to the extent that cannot be accommodated by float precision). I would just use

my_combinations = np.random.choice(n, size=number_of_combinations, k), replace=True, p=p)

You can check for duplicates in O(N log N) where N = number_of_combinations;

Conservatively, you could generate

my_combinations = np.random.choice(n, size=2 * number_of_combinations, k), replace=True, p=p)

then drop duplicates and take the first number_of_combinations rows.

edited Aug 8, 2019 at 9:51

answered Aug 8, 2019 at 9:34

hilberts_drinking_problem

11.6k3 gold badges25 silver badges55 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Numpy random choice with probabilities to produce a 2D-array with unique rows

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related