I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times.

Here’s the code I’m currently using:

import numpy as np

# Example array
array = np.array([10, 20, 30, 40, 50])

# Number of samples to draw
num_samples = 3

# Draw samples without replacement
samples = np.random.choice(array, size=num_samples, replace=False)

print("Samples:", samples)

This works for a single sample, but generating many samples requires a Python loop. I suspect the operation can be vectorized or otherwise optimized for repeated sampling.

  • Is there a way to vectorize or otherwise optimize this operation?
  • Would another library (e.g., TensorFlow, PyTorch) provide better performance for this task?
  • Are there specific techniques for bulk sampling that avoid looping in Python?

Comments

  • After your first sample, do you start with the full array again, or is the next sample drawn without replacing the previously sampled items? Commented Dec 19, 2024 at 18:12
  • After the first sample, I start with the full array again. Commented Dec 19, 2024 at 18:19
  • Do you need the choice method to draw a fixed set of choices, or would it also be OK to draw a larger set in one chunk? Commented Dec 19, 2024 at 20:18
  • You are not the first to ask for this, e.g. stackoverflow.com/questions/40002821/… Commented Dec 19, 2024 at 20:21
  • Keep in mind that NumPy 'vectorization' just means using compiled functions to work with whole arrays. It moves the loop(s) into compiled code. Your use of 'choice' does that for one sample of num_samples items. But from one sample to the next you are starting over, right? Commented Dec 19, 2024 at 20:40

5 Answers

2

First, use the choice method of a Generator instance rather than the legacy np.random.choice; see here. According to this post, this improves performance substantially (if that is still up to date):

import numpy as np

# Create the Generator once and reuse it for every draw
rng = np.random.default_rng()

samples = rng.choice(array, size=num_samples, replace=False)
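
For many draws at once, one possible approach (a sketch, not taken from the linked posts) is Generator.permuted, which shuffles each row of a 2D array independently. Keeping the first num_samples columns then gives one sample without replacement per row. The name n_draws is hypothetical, and Generator.permuted assumes NumPy 1.20 or later.

```python
import numpy as np

rng = np.random.default_rng()

array = np.array([10, 20, 30, 40, 50])
num_samples = 3
n_draws = 1000  # hypothetical: number of independent samples wanted

# One copy of the array per draw, shape (n_draws, len(array))
tiled = np.tile(array, (n_draws, 1))

# Shuffle every row independently, then keep the first num_samples columns
samples = rng.permuted(tiled, axis=1)[:, :num_samples]
```

This shuffles the whole array for every draw, so it is likely to pay off mainly when the array is small relative to the number of draws.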

1 Comment

Great answer. If I can add something: using a generator also lets you set the seed, e.g. np.random.default_rng(seed=1337).choice(array, size=num_samples, replace=False).
0

This operation could be sped up using Numba:

import numpy as np
import numba as nb

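# Compile to machine code so the per-simulation loop runs outside the interpreter
@nb.jit(nopython=True, fastmath=True)
def draw_random_samples(len_deck, n_simulations, n_cards):
    deck_indices = np.arange(len_deck)
    # One independent draw without replacement per simulation
    simulations = [np.random.choice(deck_indices, n_cards, replace=False)
                   for i in range(n_simulations)]

    return simulations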

0

This might do the trick:

import numpy as np

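def draw_random_samples(len_deck, n_simulations, n_cards):
    # One random key per (simulation, card) pair
    values = np.random.random((n_simulations, len_deck))
    # Move the n_cards smallest keys of each row into the first n_cards columns;
    # kth = n_cards - 1 also stays in bounds when n_cards == len_deck
    indices = np.argpartition(values, n_cards - 1)
    indices = indices[..., :n_cards]

    return indices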
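
The function above returns deck indices rather than values. As a hedged usage sketch, the same random-key idea can be applied to the array from the question, with fancy indexing turning the index rows into value rows. The names keys and samples and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()
array = np.array([10, 20, 30, 40, 50])
n_simulations, n_cards = 4, 3

# One random key per (simulation, element); the positions of the n_cards
# smallest keys in each row form a sample without replacement
keys = rng.random((n_simulations, array.size))
indices = np.argpartition(keys, n_cards - 1, axis=1)[:, :n_cards]

# Fancy indexing maps each row of indices to the corresponding values
samples = array[indices]
```

The order of elements within each row is whatever argpartition leaves behind, so shuffle the rows afterwards if the draw order matters.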

0

The code below draws random samples without replacement in a vectorized manner: it loops over the cards within a draw rather than over the simulations. It is therefore most useful when the number of simulations is large and the number of cards per simulation is small.

import numpy as np

def draw_random_samples(len_deck, n_simulations, n_cards):
    """
    Draw random samples from the deck.

    Parameters
    ----------
    len_deck : int
        Length of the deck.
    n_simulations : int
        Number of draws to generate. (The same combination can occur in more than one draw.)
    n_cards : int
        Number of cards to draw from the deck per simulation. (Within a draw, all cards are unique.)

    Returns
    -------
    indices : numpy.ndarray
        Array of shape (n_simulations, n_cards) holding deck indices, sorted within each row.

    """
    
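    # First card: any index in the deck, one per simulation
    indices = np.random.randint(0, len_deck, (1, n_simulations))

    for i in range(1, n_cards):
        # Draw a rank among the len_deck - i indices that are still free
        new_indices = np.random.randint(0, len_deck - i, n_simulations)
        # Shift the rank past every already-drawn index at or below it
        new_indices += np.sum(new_indices >= indices - np.arange(i)[:, None], axis=0)

        # Keep each column sorted so the shift above stays valid
        indices = np.vstack((indices, new_indices))
        indices = np.sort(indices, axis=0)

    return indices.T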
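
The shift step in the loop may be easier to follow for a single draw. The sketch below is an illustrative pure-Python version of that one step, not part of the answer's code. The helper name nth_free_index is hypothetical. It maps a rank r among the free indices to the actual deck index, given the already-drawn indices in sorted order.

```python
def nth_free_index(sorted_taken, r):
    """Return the r-th index (0-based) that is not in sorted_taken."""
    # A taken index s at rank j pushes r up by one whenever r >= s - j
    shift = sum(r >= s - j for j, s in enumerate(sorted_taken))
    return r + shift


# Deck of 5 with indices 1 and 3 already drawn: the free indices are 0, 2, 4
taken = [1, 3]
print([nth_free_index(taken, r) for r in range(3)])  # [0, 2, 4]
```

The vectorized function applies the same comparison to every simulation at once, which is why it keeps the drawn indices sorted along the first axis.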

-1

sample() is a built-in function of the random module. It takes a sequence and the number of selections as arguments and returns a list of that length with items chosen from the sequence (e.g. a list, tuple, or string) without replacement. Sets are no longer accepted as of Python 3.11.

# importing the required module
import random

# list of items
List = [10, 20, 30, 40, 50, 40,
        30, 20, 10]

# using the sample() method
UpdatedList = random.sample(List, 3)

# displaying the random selections; positions are never
# reused, but duplicate values in the list can still repeat
print(UpdatedList)

2 Comments

Please edit your answer to remove the description from the code block. This can be done by unindenting the description.
In quick testing, this use of random is faster than the numpy equivalents.
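
Timing claims like this depend heavily on array size and NumPy version, so they are worth checking locally. Below is a minimal timing sketch, assuming the five-element array from the question. The function names, n_draws, and the repeat count are hypothetical choices, and no particular result is implied.

```python
import random
import timeit

import numpy as np

values = [10, 20, 30, 40, 50]
array = np.array(values)
rng = np.random.default_rng()
n_draws = 2000  # hypothetical number of repeated samples


def with_random():
    # Standard-library sampling, one call per draw
    return [random.sample(values, 3) for _ in range(n_draws)]


def with_generator():
    # NumPy Generator sampling, one call per draw
    return [rng.choice(array, size=3, replace=False) for _ in range(n_draws)]


def with_permuted():
    # Single vectorized call: shuffle each row, keep the first 3 columns
    return rng.permuted(np.tile(array, (n_draws, 1)), axis=1)[:, :3]


for fn in (with_random, with_generator, with_permuted):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s")
```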
