Similar to the question "Numpy random choice to produce a 2D-array with all unique values", I am looking for an efficient way of generating:

n = 1000
k = 10
number_of_combinations = 1000000

p = np.random.rand(n)
p /= np.sum(p)

my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)

As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the provided solutions do not work for the additional extension of using specific probabilities p.

My current solution is as follows:

my_combinations = set()

while len(my_combinations) < number_of_combinations:
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))

print(my_combinations)

However, I suspect there is a more efficient NumPy approach that solves this faster.

1 Answer

For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to an extent that cannot even be represented in float precision). I would just use

my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=True, p=p)

You can check for duplicates in O(N log N) time, where N = number_of_combinations, for example by sorting the rows.
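One way to do that check is sketched below. It assumes, as the frozenset in the question suggests, that two rows count as duplicates when they contain the same values in any order; the helper name has_duplicate_rows is made up for illustration.

```python
import numpy as np

def has_duplicate_rows(rows):
    """Return True if any two rows hold the same values, ignoring order."""
    # Sort within each row so that permutations of the same values compare equal.
    canonical = np.sort(rows, axis=1)
    # np.unique with axis=0 sorts the rows and collapses identical ones.
    unique_rows = np.unique(canonical, axis=0)
    return unique_rows.shape[0] < rows.shape[0]

sample = np.array([[3, 1, 2],
                   [2, 3, 1],   # same values as the first row
                   [4, 5, 6]])
print(has_duplicate_rows(sample))  # True
```

If the order within a row matters for your use case, drop the np.sort step and pass the rows to np.unique directly.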

Conservatively, you could generate

my_combinations = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)

then drop duplicates and take the first number_of_combinations rows.
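A minimal sketch of this oversample-then-deduplicate idea follows. The function name sample_unique_rows, the oversample parameter, and the use of np.random.default_rng instead of np.random.choice are choices made here for illustration, and the demo uses a smaller row count than the question. Note that with replace=True each entry is drawn independently, so a single row may contain the same index more than once; the sketch follows the answer as written in that respect.

```python
import numpy as np

def sample_unique_rows(n, k, number_of_rows, p, rng=None, oversample=2):
    """Draw oversample * number_of_rows candidate rows, keep the first unique ones."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.choice(n, size=(oversample * number_of_rows, k),
                            replace=True, p=p)
    # Treat rows as unordered: sort within each row before comparing.
    canonical = np.sort(candidates, axis=1)
    # return_index gives the position of the first occurrence of each unique row.
    _, first_index = np.unique(canonical, axis=0, return_index=True)
    # Restore the original draw order and keep only as many rows as requested.
    keep = np.sort(first_index)[:number_of_rows]
    if keep.size < number_of_rows:
        raise ValueError("too many duplicates; increase oversample")
    return candidates[keep]

rng = np.random.default_rng(0)
n, k, number_of_rows = 1000, 10, 1000  # smaller than in the question, for a quick demo
p = rng.random(n)
p /= p.sum()

my_combinations = sample_unique_rows(n, k, number_of_rows, p, rng=rng)
print(my_combinations.shape)  # (1000, 10)
```

Keeping the rows in their original draw order, rather than the sorted order np.unique returns, avoids biasing the result toward lexicographically small rows.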
