Newest 'sample' Questions

4 votes

8 answers

261 views

How to extract a given number of ordered rows from a given number of randomly selected samples?

Starting with the example dat0 below, let's say I want to randomly extract 3 pairs of `id`s. Initial data (5 id pairs) dat0 <- structure(list(id = c("A", "A", "B", ...

denis

972

asked Nov 11 at 8:17

0 votes

0 answers

157 views

DateTimeException: [CANNOT_PARSE_TIMESTAMP] in PySpark dataframe without timestamp data

I have a PySpark DataFrame with two numeric columns (integers, negative and positive) and when I try to select a random sample of it, it generates an error. This is the code I'm trying to run: ...

Carlos Andrés Rodríguez

1

asked Jul 3 at 16:03

1 vote

1 answer

100 views

Random sample with multi-variate conditions

I have a dataset that I need to pull a random sample from. This is the data: {'character': {0: 'mario', 1: 'luigi', 2: 'yoshi', 3: 'peach', 4: 'bowser', 5: 'boo', 6: 'toad', 7: 'lakitu', ...

Tyler Moore

221

asked Apr 11 at 23:20

1 vote

1 answer

223 views

How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...

pinpss

173

asked Apr 3 at 23:13

2 votes

3 answers

284 views

Stratified sampling using SQL given an absolute sample size

I have the following population: a b b c c c c I am looking for a SQL statement to generate a stratified sample of arbitrary size. Let's say for this example, I would like a sample size of 4. I ...

Saqib Ali

4,551

asked Feb 4 at 5:08
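
The excerpt is cut off before it states the expected output, so the following is only a sketch of one common reading: proportional allocation with largest-remainder rounding, so that the strata sizes add up to exactly the requested total. It is written in Python rather than SQL, and the function name `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, n, seed=None):
    """Draw exactly n items, allocating them to strata proportionally.

    Assumption: rounding uses the largest-remainder method; the original
    question may have wanted a different rule.
    """
    rng = random.Random(seed)

    # Group the population by stratum.
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    total = len(items)
    # Ideal (fractional) share of the sample for each stratum.
    quotas = {s: n * len(members) / total for s, members in strata.items()}
    # Start from the rounded-down shares ...
    alloc = {s: int(q) for s, q in quotas.items()}
    # ... then hand the remaining slots to the largest fractional parts.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1

    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, alloc[s]))
    return sample


population = ["a", "b", "b", "c", "c", "c", "c"]
sample = stratified_sample(population, key=lambda x: x, n=4, seed=0)
print(sorted(sample))  # ['a', 'b', 'c', 'c']
```

With the population from the question and a sample size of 4, the ideal shares are about 0.57, 1.14 and 2.29, so the one leftover slot goes to stratum a.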

2 votes

1 answer

254 views

How to sample Pandas DataFrame using a normal distribution by using random_state and numpy Generators

I am trying to write Pandas code that would allow me to sample a DataFrame using a normal distribution. The most convenient way is to use the random_state parameter of the sample method to draw random ...

pjercic

473

asked Jan 30 at 12:38

2 votes

1 answer

66 views

ADC7768 CRC calculation

I am using adc7768 to receive ADC samples. According to datasheet to calculate CRC, we can do it every 4th or 16th sample. My question is for 4 samples last 3 samples CRC will be header of 4th sample. ...

maxy

75

asked Jan 23 at 7:26

2 votes

0 answers

68 views

How to supply sample weights to Tensorflow dataset in R

I am trying to set up a dataset in R to run a neural network using TensorFlow, but I can't seem to figure out the right code to allow sample weights to be specified. The input array is image_data and ...

D_Taylor

23

asked Jan 3 at 4:57

4 votes

5 answers

282 views

Efficiently draw random samples without replacement from an array in python

I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times. Here’s the code I’m currently using: import ...

Mark

87

asked Dec 19, 2024 at 17:56

0 votes

3 answers

94 views

Why does Reservoir Sampling page in wikipedia say that the size of the list is unknown, but the source code function knows the size?

reservoir sampling on wikipedia: https://en.wikipedia.org/wiki/Reservoir_sampling Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of ...

kyopa

111

asked Nov 14, 2024 at 20:40
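
A minimal sketch of the basic reservoir scheme (often called Algorithm R) may help with the question above. It consumes a plain iterator and never asks for its length; the only counter it needs is how many items it has seen so far. The names `reservoir_sample` and `numbers` are illustrative, not taken from the Wikipedia listing.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def numbers():
    # A generator has no len(); the sampler only ever sees one item at a time.
    for n in range(1000):
        yield n


sample = reservoir_sample(numbers(), 5, random.Random(0))
print(sample)
```

If a particular implementation takes the size as an argument, that is usually a convenience of the array-based pseudocode; the loop itself only uses the running index.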

2 votes

1 answer

249 views

Resampling By Group in Polars [duplicate]

I'm trying to build a Monte Carlo simulator for my data in Polars. I am attempting to group by a column, resample the groups and then, unpack the aggregation lists back in their original sequence. I'...

nybhh

101

asked Nov 14, 2024 at 20:25

0 votes

0 answers

31 views

plots not generating all the samples and leave excess vertical space

I'm pretty new to machine learning. I was using fetch_olivetti_faces as my database for practice in my coding class. I ran the code, and it worked since I was following the teacher's instructions. ...

Jonathan Chan

1

asked Sep 26, 2024 at 16:13

1 vote

1 answer

224 views

I can't find /usr/local/cuda-<x>.<y>/gds/samples after I install cuda toolkit and driver

I want to use GPUDirect Storage. I follow the instructions in https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#mofed-req-install to install it. The install details are as ...

xwt1

25

asked Sep 23, 2024 at 11:53

2 votes

1 answer

54 views

Generate weighted sample from weighted list

I have a list of historical frequencies of elements that have occurred together over time. These elements may have occurred (without repetition) in sequences of various order and length. For example, ...

pyll

1,754

asked Sep 11, 2024 at 12:47

1 vote

1 answer

216 views

Python Polars Sample N-1 by Group ID with Replacement

I am working on a bootstrapping project and need to sample M=N-1 observations with replacement where N is the number of unique observations in a specific group (defined by group_id). I need to figure ...

user432299

23

asked Sep 10, 2024 at 17:57

2 votes

2 answers

131 views

Dataframe - Select the minimum set of rows to cover all possible values of each columns

I'm working on a use-case on which I need to retrieve a minimal sample of rows from a dataframe that contains at least one row for each unique value found in all columns. A simplified example could be ...

JojoDolo

29

asked Aug 23, 2024 at 8:22

0 votes

1 answer

79 views

What is the time complexity of sample?

Using the default arguments, what is the time complexity of sample? I.e. how does the running time of sample(1:N) grow with N? Documentation for sample is here but does not specify time complexity.

Mohan

9,223

asked Aug 15, 2024 at 19:58

3 votes

1 answer

120 views

random.sample(population, X) sometimes not contained in random.sample(population, Y) when Y>X

EDIT: Edited original post replacing my code with an MRE from user @no comment. I noticed a seemingly non-intuitive behaviour using a seeded random.sample(population, k) call to sample from a list of ...

John Karkas

409

asked Jul 30, 2024 at 11:17
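
If nested samples are the goal, one way to get them that does not rely on any property of `random.sample` is to draw a single random permutation and take prefixes of it. This is a sketch under that assumption; `nested_samples` is a hypothetical helper, not part of the `random` module.

```python
import random


def nested_samples(population, sizes, seed=None):
    """Return {k: sample of size k} where smaller samples are prefixes of larger ones."""
    rng = random.Random(seed)
    # One uniformly random ordering of the whole population.
    order = rng.sample(population, len(population))
    # Every prefix of a random permutation is itself a uniform sample
    # without replacement, and prefixes are nested by construction.
    return {k: order[:k] for k in sizes}


samples = nested_samples(list(range(100)), [5, 10], seed=42)
print(set(samples[5]) <= set(samples[10]))  # True
```

The cost is shuffling the full population once, which may be wasteful when the population is much larger than the biggest sample.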

1 vote

1 answer

69 views

Mystery bug in sampling for loop in R

I am trying to understand what is causing this bug in my R code and I feel like R is gaslighting me. The sample() function seems to change depending on how I assign it? Anyways, here is the MRE: #...

ssm1020

45

asked May 30, 2024 at 15:23

0 votes

1 answer

30 views

Is there a way of creating multiple stratified samples at once?

Let's say I have this input dataset with the Ids: a, b, c I need to order it by packages of +-100 rows each sample with the same distribution of Ids as the input entire population. What would be the ...

luisvenezian

511

asked May 28, 2024 at 14:09

0 votes

1 answer

58 views

Create sample weight in PySpark sampled dataframe

I have created a dataframe in PySpark as follows: df = spark.range(10) The dataframe looks like this: df.show() +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ I ...

Giampaolo Levorato

1,762

asked May 22, 2024 at 13:58

0 votes

2 answers

53 views

Groupwise replace NA's with sampled value from non NA's using dplyr

I have a dataframe with missing NA values in column X1 and a grouping variable group. I want to replace all NA values with a value sampled from the non-NA values of that group. This should be done for ...

Johannes

1,084

asked May 22, 2024 at 9:14

2 votes

1 answer

95 views

Select a random sample from PyAthena SQL

I am in SageMaker Studio and I have connected to a dataset via PyAthena: from pyathena import connect s3_query_results = 'my s3 Location' region = 'eu-west-2' workgroup='primary' Then I have written ...

Giampaolo Levorato

1,762

asked May 22, 2024 at 7:39

1 vote

1 answer

50 views

Is there a function for sample selection based on conditions?

I am looking for an algorithm and its implementation in R for sample selection. I have a data.frame with i objects, and each object has j unique features. In parallel, I have > 100 samples k that ...

BHN

47

asked May 21, 2024 at 9:21

2 votes

3 answers

125 views

How to convert incomplete dates using sampled values from complete dates?

The data: I have a very large dataset. The following is a representative example in .csv. date, x, y "03082018", 304, 1 "071999", 305, 1 "04032018", 309, 2 "041997...

hypnos

21

asked May 17, 2024 at 20:02

1 vote

1 answer

85 views

Most efficient way to conditionally sample my large df

I have a large DF (~35 million rows) and I am trying to create a new df by randomly sampling two rows from each unique cluster ID (~1.8 million unique cluster IDs)-- one row must have a label 0 and ...

youtube

504

asked Apr 29, 2024 at 17:13

0 votes

2 answers

65 views

Is there a way to balance data in R without reordering a dataframe?

First, here is some toy data: df <- data.frame( "stim" = c("face", "object", "pareidolia", "face", "face", "object", "...

thefriendly_plague.doctor

55

asked Apr 26, 2024 at 21:03

0 votes

1 answer

33 views

Problot list of lists with different colors

I am working on some data which are supposed to follow a normal distribution. I have different sets of samples I'm interested in studying. So the data is stored in a list of lists (each sublist ...

Hélène BRNT

1

asked Apr 22, 2024 at 18:28

3 votes

3 answers

286 views

Random sampling from ordered data

In a simulation, we need ordered data which is a random sample (with or without replacement) of size m from a full data set of size n. Unfortunately, ordering the sampled data turns out to be a ...

cdalitz

1,371

asked Apr 16, 2024 at 8:41

0 votes

0 answers

57 views

Asset Collection Samples for TopBraid Composer Maestro Edition

I am writing a term paper for my university on TopBraid Composer Maestro Edition and I need the assets samples for the software and I am unable to find them online. The link on the bottom of the ...

Христијан Станојоски

1

asked Apr 9, 2024 at 8:11

-1 votes

1 answer

71 views

Selecting elements of vectors in a list

I have a list with 3 (or more) vectors of characters ordered by total of elements The first vector has 18 elements The second vector has 623 elements The third vector has 1706 elements I would like to ...

Wilson Souza

860

asked Apr 3, 2024 at 20:12

0 votes

1 answer

29 views

Does the print of readframes(n) of wave library show the audio samples in hexadecimal?

When I print the result of the first audio frame by doing track.readframes(1), I get b'\xfb\xff\xfb\xfe'. My track is stereo so we know that we have 2 channels where each channel has 1 sample. Does it ...

stabpaokara

5

asked Apr 3, 2024 at 16:17
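
The value printed is a Python `bytes` object; the `\x..` escapes are just how Python displays raw bytes that are not printable ASCII. A short sketch of decoding the frame quoted in the question, assuming 16-bit signed little-endian PCM (4 bytes for one stereo frame suggests 2 bytes per sample, but the excerpt does not confirm the sample width):

```python
import struct

# The single stereo frame quoted in the question.
frame = b"\xfb\xff\xfb\xfe"

# "<hh" = little-endian, two signed 16-bit integers (assumed sample format).
left, right = struct.unpack("<hh", frame)
print(left, right)  # -5 -261
```

For a real file, `track.getsampwidth()` and `track.getnchannels()` report the actual sample width and channel count, so the format string can be chosen from those instead of assumed.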

1 vote

1 answer

223 views

Take a sample of the MNIST dataset

I am working with the MNIST dataset and performing different classification methods on it, but my runtimes are ridiculous, so I am looking for a way to maybe use a portion of the training part of ...

Adam Rowland

13

asked Mar 28, 2024 at 18:55

0 votes

1 answer

43 views

unexpected sample behavior

I am currently running a small simulation and am irritated by the results. This is my code: ground_truth <- c("coke", "zero", "light", "zero") options <- ...

nhaus

1,043

asked Mar 14, 2024 at 15:50

1 vote

2 answers

206 views

Group dataframe and sample n rows with equal probability between groups

I have a pandas dataframe like this: ID Value 0 a 2 1 a 4 2 b 6 3 c 8 4 c 10 5 c 12 I would like to sample equally from the ID groups. I know I can ...

mnoerregaard

67

asked Mar 13, 2024 at 8:02

0 votes

1 answer

99 views

Drawing a random sample from a very large dataset

I have a csv dataset with 160MM rows that is not possible to import directly through Pandas (RAM memory is not enough). How could I draw a random sample of 5% from the original dataset (in this case, ...

Marcelo Fernandes

31

asked Mar 9, 2024 at 0:34
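
One approach that avoids loading the file at all is to stream it row by row and keep each row with probability 0.05. The sketch below uses only the standard library; `sample_csv_rows` is a hypothetical helper, and the in-memory `StringIO` objects stand in for real file handles. Note the assumption: this yields roughly 5% of the rows, not exactly 5%.

```python
import csv
import io
import random


def sample_csv_rows(src, dst, fraction, seed=None):
    """Copy the header and a random ~fraction of data rows from src to dst.

    Returns the number of data rows kept. Only one row is held in memory.
    """
    rng = random.Random(seed)
    reader = csv.reader(src)
    writer = csv.writer(dst)

    header = next(reader, None)
    if header is None:  # empty input
        return 0
    writer.writerow(header)

    kept = 0
    for row in reader:
        # Independent coin flip per row (Bernoulli sampling).
        if rng.random() < fraction:
            writer.writerow(row)
            kept += 1
    return kept


# Small stand-in for the large file: 10,000 data rows.
src = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10_000)))
dst = io.StringIO()
kept = sample_csv_rows(src, dst, 0.05, seed=1)
print(kept)
```

With real files, `src` and `dst` would be opened with `open(path, newline="")` as the `csv` module documentation recommends. If an exact row count is required, a reservoir sampler over the rows is an alternative.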

1 vote

0 answers

165 views

Sampling from a Normal distribution with sparse covariance matrix

To sample from a Gaussian distribution with mean zero and covariance matrix S, we can do the following: from scipy import sparse import numpy as np S = sparse.diags([np.full(100,1),0.1*np.ones(99),0....

WeakLearner

938

asked Mar 7, 2024 at 20:19

1 vote

1 answer

53 views

How to randomly select the content of some cells into a data frame?

It seems simple but I cannot find the solution: I want to randomly select some elements into a data frame imported from a .xlsx file. Is there a function such as sample_n to do this? My problem lies in ...

gibarian

157

asked Feb 20, 2024 at 9:47

1 vote

0 answers

64 views

Starting Learning node.js - Their Simple Server Sample Gives Syntax Errors

Using the code sample found here, when I run node server.js in the Command Prompt I get syntax errors. First, it complains about =>. As far as I can see that's perfectly valid syntax, but if I ...

Nick Gris

174

asked Feb 18, 2024 at 12:08

-2 votes

1 answer

66 views

How to add samples in Asset Store?

I cannot add an instance to my asset in the Asset Store. I looked at the documentation but couldn't succeed. Where am I making a mistake? MyPackage ├── package.json └── Samples~ ├── ...

Atlas

1

asked Feb 16, 2024 at 19:48

0 votes

1 answer

227 views

spatSample() choosing NA cells even when na.rm = TRUE

I assume this is the cause of my error. I have a stack of 18 rasters which underwent the following preprocessing steps: Cropping to a specific region using a shapefile Trim to the extent of the ...

Barbara Perez de Araújo

161

asked Feb 8, 2024 at 11:13

0 votes

0 answers

56 views

Generating Samples from Customized Distribution - Stuck with Range Limitation

I'm struggling to generate samples from a custom distribution defined by its CDF but facing a persistent limitation. The issue boils down to: Limited Range of Samples: Despite having a theoretical ...

Qasim Ramzan

1

asked Jan 26, 2024 at 7:20

0 votes

0 answers

40 views

Retrieve a fixed number of random records from a Postgres database table

I have a table with partitioning by day. This table stores data for one month and there are about 3 billion of them, but many partitions in the table are empty. How can I optimally select exactly 5000 ...

Violetta

619

asked Jan 23, 2024 at 21:18

5 votes

0 answers

148 views

Video recorded using MediaRecorder Web API comes with audio sped up

I've been using the MediaRecorder Web API to record a MediaStream obtained using getUserMedia() to record video + audio coming from a webcam + microphone. getUserMedia is not called with specific ...

Daniel Limia

51

asked Jan 18, 2024 at 21:43

0 votes

0 answers

180 views

Drawing new data from KDE scikit-learn

I am fitting a Kernel Density Estimation instance on multi-variate data using scikit-learn implementation. As parameters I am using a 'gaussian' kernel and as bandwidth estimator 'silverman' .fit() ...

Edoardo Taccaliti

1

asked Jan 14, 2024 at 12:27

0 votes

1 answer

103 views

sample with row specific probability in a data.table

I'm trying to sample a dummy based on probabilities that are part of my data.table. If my data.table only has two rows, this works: library(data.table) playdata <- data.table(id = c("a",...

Jakob

1,463

asked Jan 9, 2024 at 15:07

2 votes

1 answer

109 views

Sample rows from a dataframe by id when some ids have more rows than others

This is very basic but I couldn't find an answer online. I use R and have a dataset like this (but much larger): set.seed(123) id<-c(1,1,1,2,2,3,3,3,3,3,4,5,5,6,6,6) week<-c(1,2,3,1,2,1,2,3,4,5,...

Sointu

291

asked Jan 9, 2024 at 9:56

1 vote

1 answer

42 views

Randomly sort specific number of elements from columns of different lengths

I have a tibble_data.frame of dimensions 1042x64. Columns are amphibian families and rows are the names of all species in that family. The first 5 rows and 2 columns look like this: > amphilist[1:5,...

feprocha93

13

asked Jan 9, 2024 at 6:34

0 votes

1 answer

397 views

Excel: Proportional monthly distribution of sample values over a time period spanning different months

I am trying to get my head around an MS excel formula for a scenario I have. I am recording discharges over various sample periods. These sample periods will last anywhere from 24 hrs to 1 week. The ...

ETP

33

asked Jan 4, 2024 at 13:30

2 votes

1 answer

696 views

How to Calculate a 2D Empirical CDF via histogram2d

I am trying to obtain a matrix representation of an empirical 2 dimensional CDF given two data samples of the same size. I have two sorted data samples of the same size: sorted_sample1 and ...

Alexandre Bloch

43

asked Dec 11, 2023 at 17:53
