Efficiently draw random samples without replacement from an array in Python

Question

I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times.

Here’s the code I’m currently using:

import numpy as np

# Example array
array = np.array([10, 20, 30, 40, 50])

# Number of samples to draw
num_samples = 3

# Draw samples without replacement
samples = np.random.choice(array, size=num_samples, replace=False)

print("Samples:", samples)

This works for a single draw, but generating many draws requires a Python loop. I suspect the operation can be vectorized or otherwise optimized when sampling many times.

Is there a way to vectorize or otherwise optimize this operation?
Would another library (e.g., TensorFlow, PyTorch) provide better performance for this task?
Are there specific techniques for bulk sampling that avoid looping in Python?

After your first sample, do you start with the full array again, or is the next sample drawn without replacing the previously sampled items?
– JonSG, Commented Dec 19, 2024 at 18:12
Do you need the choice method to draw a fixed set of choices, or would it also be OK to draw a larger set in one chunk?
– JE_Muc, Commented Dec 19, 2024 at 20:18
You are not the first to ask for this, e.g. stackoverflow.com/questions/40002821/…
– Warren Weckesser, Commented Dec 19, 2024 at 20:21
Keep in mind that NumPy 'vectorization' just means using compiled functions to work with whole arrays. It moves the loop(s) into compiled code. Your use of 'choice' does that for one sample of num_samples items. But from one sample to the next you are starting over, right?
– hpaulj, Commented Dec 19, 2024 at 20:40

JE_Muc · Accepted Answer · 2024-12-19 18:47:01Z

2

First, you should use the choice method of a Generator instance (see here). According to this post, this increases performance substantially (if that is still up to date):

rng = np.random.default_rng()

samples = rng.choice(array, size=num_samples, replace=False)
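To draw many samples at once without a Python loop, one possible sketch (not part of the original answer) uses `Generator.permuted`, which shuffles every row of a 2D array independently. Keeping the first `num_samples` columns of each shuffled row gives one sample without replacement per row. The helper name `bulk_choice` and the parameter `n_draws` are hypothetical, and this assumes NumPy 1.20 or newer.

```python
import numpy as np

def bulk_choice(array, num_samples, n_draws, rng=None):
    """Return an (n_draws, num_samples) array; each row is drawn without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    # One copy of the array per draw, shape (n_draws, len(array))
    tiled = np.tile(array, (n_draws, 1))
    # Shuffle every row independently of the others
    shuffled = rng.permuted(tiled, axis=1)
    # The first num_samples entries of a shuffled row are a sample without replacement
    return shuffled[:, :num_samples]

array = np.array([10, 20, 30, 40, 50])
samples = bulk_choice(array, num_samples=3, n_draws=4, rng=np.random.default_rng(0))
print(samples.shape)  # (4, 3)
```

This shuffles the full array for every draw, so it is probably most attractive when the array is short relative to the number of draws.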

answered Dec 19, 2024 at 18:47

JE_Muc


1 Comment

Learning is a mess Dec 20, 2024 at 9:55

Great answer. If I can add something: using a generator also allows you to set the seed, e.g. np.random.default_rng(seed=1337).choice(array, size=num_samples, replace=False).

Mark · Accepted Answer · 2024-12-22 09:25:25Z

0

This operation could be sped up using Numba:

import numpy as np
import numba as nb

@nb.jit(nopython=True, fastmath=True)
def draw_random_samples(len_deck, n_simulations, n_cards):

    deck_indices = np.arange(len_deck)
    simulations = [np.random.choice(deck_indices, n_cards, replace=False) 
                   for i in range(n_simulations)]
    
    return simulations

answered Dec 22, 2024 at 9:25

Mark


Mark · Accepted Answer · 2024-12-22 09:40:26Z

0

This might do the trick:

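import numpy as np

def draw_random_samples(len_deck, n_simulations, n_cards):
    # One row of random keys per simulation
    values = np.random.random((n_simulations, len_deck))
    # Partition each row so its n_cards smallest keys come first;
    # kth = n_cards - 1 also works when n_cards == len_deck
    indices = np.argpartition(values, n_cards - 1, axis=-1)
    # The positions of those keys form a sample without replacement
    indices = indices[..., :n_cards]

    return indices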
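The function returns indices rather than values, so a usage sketch may help. The version below restates the same random-key idea compactly (using `n_cards - 1` as the partition point, which is an adjustment assumed here so that `n_cards == len_deck` also works) and maps the indices back onto the question's example array with fancy indexing. The choice of 1000 simulations is arbitrary.

```python
import numpy as np

def draw_random_samples(len_deck, n_simulations, n_cards):
    # Rank random keys per row and keep the positions of the n_cards smallest
    values = np.random.random((n_simulations, len_deck))
    return np.argpartition(values, n_cards - 1, axis=-1)[..., :n_cards]

array = np.array([10, 20, 30, 40, 50])
indices = draw_random_samples(len(array), n_simulations=1000, n_cards=3)

# Fancy indexing turns the (1000, 3) index array into a (1000, 3) array of values
samples = array[indices]
print(samples[:5])
```

Each row of `samples` is one draw without replacement; different rows are independent and may coincide.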

edited Dec 22, 2024 at 9:40

answered Dec 22, 2024 at 8:54

Mark


Mark · Accepted Answer · 2024-12-23 09:45:34Z

The code below generates random samples of a list without replacement in a vectorized manner. This solution is particularly useful when the number of simulations is large and the number of samples per simulation is low.

import numpy as np

def draw_random_samples(len_deck, n_simulations, n_cards):
    
    """
    Draw random samples from the deck.

    Parameters
    ----------
    len_deck : int
        Length of the deck.
    n_simulations : int
        How many combinations of cards are generated. (The same
        combination may occur in more than one simulation.)
    n_cards : int
        How many cards to draw from the deck per simulation. (All cards
        within one simulation are unique.)

    Returns
    -------
    indices : numpy.ndarray of shape (n_simulations, n_cards)
        Random indices into the deck, sorted within each row.

    """
    
    indices = np.random.randint(0, len_deck, (1, n_simulations))

    for i in range(1, n_cards):
        new_indices = np.random.randint(0, len_deck-i, n_simulations)
        new_indices += np.sum(new_indices >= indices - np.arange(i)[:,None], axis=0)

        indices = np.vstack((indices, new_indices))
        indices = np.sort(indices, axis=0)
    
    return indices.T
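The key step is the line that shifts `new_indices`: a draw `r` from the reduced range is mapped to the `r`-th index that has not been used yet. A small worked sketch of that step, for a single simulation with a deck of 5 and indices 1 and 3 already drawn (these numbers are chosen only for illustration):

```python
import numpy as np

len_deck = 5
# Sorted indices already drawn, shape (i, n_simulations) with one simulation
chosen = np.array([[1], [3]])
i = chosen.shape[0]

mapped = []
for r in range(len_deck - i):
    new = np.array([r])
    # Count how many already-chosen indices the new draw has to skip over
    shift = np.sum(new >= chosen - np.arange(i)[:, None], axis=0)
    mapped.append(int((new + shift)[0]))

print(mapped)  # [0, 2, 4], exactly the indices that were still unused
```

Because every value in the reduced range lands on a distinct unused index, each new card is uniform over the remaining cards.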

Adrian Mole · Accepted Answer · 2024-12-20 07:00:35Z

-1

sample() is a built-in function of the random module. It takes a sequence and the number of items to select, and returns a list of that length containing items chosen from the sequence (e.g. a list, tuple, or string) without replacement. Sets are no longer accepted as of Python 3.11.

# importing the required module
import random

# list of items
List = [10, 20, 30, 40, 50, 40,
        30, 20, 10]

# using the sample() method
UpdatedList = random.sample(List, 3)

# displaying the random selections; each position is picked
# at most once, but repeated values in the list can still repeat
print(UpdatedList)

edited Dec 20, 2024 at 7:00

Adrian Mole


answered Dec 19, 2024 at 18:40

Avdesh Paliwal


2 Comments

Jeremy Caney Dec 20, 2024 at 1:40

Please edit your answer to remove the description from the code block. This can be done by unindenting the description.

hpaulj Dec 20, 2024 at 1:52

In quick testing, this use of random is faster than the numpy equivalents.
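A claim like this is easy to check locally. The following is a rough timing sketch, not a rigorous benchmark; the repeat count and the three candidates compared are assumptions made here, and results will vary with array size, sample size, and library versions.

```python
import random
import timeit

import numpy as np

data = [10, 20, 30, 40, 50]
array = np.array(data)
rng = np.random.default_rng()
n_repeats = 2000  # arbitrary; increase for more stable numbers

# Time one small draw without replacement, repeated n_repeats times
t_random = timeit.timeit(lambda: random.sample(data, 3), number=n_repeats)
t_legacy = timeit.timeit(
    lambda: np.random.choice(array, size=3, replace=False), number=n_repeats
)
t_generator = timeit.timeit(
    lambda: rng.choice(array, size=3, replace=False), number=n_repeats
)

print(f"random.sample:    {t_random:.4f} s")
print(f"np.random.choice: {t_legacy:.4f} s")
print(f"Generator.choice: {t_generator:.4f} s")
```

For small inputs the per-call overhead of NumPy tends to dominate, which would be consistent with the comment above; for large arrays or bulk sampling the picture may differ.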
