4 votes
8 answers
261 views

Starting with the example dat0 below, let's say I want to randomly extract 3 pairs of `id`s. Initial data (5 id pairs): dat0 <- structure(list(id = c("A", "A", "B", ...
denis • 972
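Here is a minimal sketch of the idea in Python rather than R. It assumes each row is a dict with an `id` key. The `value` field and the five ids below are made-up placeholders, since the real dat0 is cut off. The sketch samples distinct ids first and then filters, which keeps both rows of every chosen pair together:

```python
import random

def sample_id_pairs(rows, k, seed=None):
    """Return all rows belonging to k randomly chosen ids (rows keep their order)."""
    rng = random.Random(seed)
    # Distinct ids in first-appearance order, so a fixed seed gives a fixed result.
    ids = list(dict.fromkeys(row["id"] for row in rows))
    chosen = set(rng.sample(ids, k))
    return [row for row in rows if row["id"] in chosen]

# Hypothetical stand-in for dat0: five ids, two rows each.
dat0 = [{"id": i, "value": v} for i in "ABCDE" for v in (1, 2)]
picked = sample_id_pairs(dat0, 3, seed=1)
```

In R, the same two steps would be sampling from `unique(dat0$id)` and then subsetting the rows with `%in%`.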
0 votes
0 answers
157 views

I have a PySpark DataFrame with two numeric columns (integers, negative and positive) and when I try to select a random sample of it, it generates an error. This is the code I'm trying to run: ...
Carlos Andrés Rodríguez
1 vote
1 answer
100 views

I have a dataset that I need to pull a random sample from. This is the data: {'character': {0: 'mario', 1: 'luigi', 2: 'yoshi', 3: 'peach', 4: 'bowser', 5: 'boo', 6: 'toad', 7: 'lakitu', ...
Tyler Moore
1 vote
1 answer
223 views

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
pinpss • 173
2 votes
3 answers
284 views

I have the following population: a b b c c c c. I am looking for a SQL statement to generate a stratified sample of arbitrary size. Let's say for this example, I would like a sample size of 4. I ...
Saqib Ali • 4,551
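The question asks for SQL, but the allocation logic is easier to show in Python. This sketch uses one common reading of "stratified sample": proportional allocation, rounded with the largest-remainder rule. That rounding rule is an assumption, and other rules give slightly different counts. For the population above and n = 4, the ideal counts are 4/7, 8/7 and 16/7, which this rule rounds to 1, 1 and 2:

```python
import math
import random
from collections import defaultdict

def stratified_sample(rows, n, key=lambda r: r, seed=None):
    """Proportional stratified sample of size n (largest-remainder rounding)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in rows:
        strata[key(r)].append(r)
    total = len(rows)
    # Ideal, fractional number of draws per stratum, proportional to its size.
    quotas = {s: n * len(m) / total for s, m in strata.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    # Give the draws lost to flooring to the strata with the largest remainders.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    picked = []
    for s, members in strata.items():
        picked.extend(rng.sample(members, alloc[s]))  # without replacement
    return picked

strat = stratified_sample(["a", "b", "b", "c", "c", "c", "c"], 4, seed=0)
```

In SQL, the same allocation is usually expressed with window functions (a per-stratum row count and a random row number). The exact syntax depends on the database.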
2 votes
1 answer
254 views

I am trying to write pandas code that would allow me to sample a DataFrame using a normal distribution. The most convenient way is to use the random_state parameter of the sample method to draw random ...
pjercic • 473
2 votes
1 answer
66 views

I am using the adc7768 to receive ADC samples. According to the datasheet, the CRC can be calculated every 4th or 16th sample. My question is: for 4 samples, will the CRC of the last 3 samples be in the header of the 4th sample? ...
maxy • 75
2 votes
0 answers
68 views

I am trying to set up a dataset in R to run a neural network using TensorFlow, but I can't seem to figure out the right code to allow sample weights to be specified. The input array is image_data and ...
D_Taylor
4 votes
5 answers
282 views

I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times. Here’s the code I’m currently using: import ...
Mark • 87
0 votes
3 answers
94 views

Reservoir sampling on Wikipedia: https://en.wikipedia.org/wiki/Reservoir_sampling Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of ...
kyopa • 111
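For reference, the basic variant (often called Algorithm R) fits in a few lines. It keeps the first k items, then lets item i (0-based) replace a uniformly chosen slot with probability k/(i+1). After n items, every item has the same k/n chance of being in the reservoir. Here is a minimal Python sketch:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Uniform sample of k items (without replacement) from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # randrange(i + 1) < k happens with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

picked = reservoir_sample(range(1000), 10, random.Random(7))
```

If the stream has fewer than k items, all of them are returned.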
2 votes
1 answer
249 views

I'm trying to build a Monte Carlo simulator for my data in Polars. I am attempting to group by a column, resample the groups and then, unpack the aggregation lists back in their original sequence. I'...
nybhh • 101
0 votes
0 answers
31 views

I'm pretty new to machine learning. I was using fetch_olivetti_faces as my database for practice in my coding class. I ran the code, and it worked since I was following the teacher's instructions. ...
Jonathan Chan
1 vote
1 answer
224 views

I want to use GPUDirect Storage. I follow the instructions in https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#mofed-req-install to install it. The install details are as ...
xwt1 • 25
2 votes
1 answer
54 views

I have a list of historical frequencies of elements that have occurred together over time. These elements may have occurred (without repetition) in sequences of various order and length. For example, ...
pyll • 1,754
1 vote
1 answer
216 views

I am working on a bootstrapping project and need to sample M=N-1 observations with replacement where N is the number of unique observations in a specific group (defined by group_id). I need to figure ...
user432299
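Here is a minimal sketch, assuming rows are dicts and that the M = N - 1 draws come from the group's own observations. `group_id` is from the question; `x` is a hypothetical value column:

```python
import random
from collections import defaultdict

def bootstrap_n_minus_1(rows, seed=None):
    """Per group: draw M = N - 1 values with replacement, N = number of unique values."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row["group_id"]].append(row["x"])
    resamples = {}
    for gid, values in groups.items():
        m = len(set(values)) - 1
        # choices() samples with replacement; m == 0 gives an empty resample.
        resamples[gid] = rng.choices(values, k=m)
    return resamples

data = [
    {"group_id": 1, "x": 3.0}, {"group_id": 1, "x": 5.0}, {"group_id": 1, "x": 5.0},
    {"group_id": 1, "x": 8.0}, {"group_id": 2, "x": 1.0}, {"group_id": 2, "x": 2.0},
]
res = bootstrap_n_minus_1(data, seed=3)
```

Group 1 has three unique values, so it gets two draws. Group 2 has two unique values, so it gets one.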
2 votes
2 answers
131 views

I'm working on a use-case on which I need to retrieve a minimal sample of rows from a dataframe that contains at least one row for each unique value found in all columns. A simplified example could be ...
JojoDolo
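Finding the true minimum is a set-cover problem, which is NP-hard in general. A common practical choice is the greedy heuristic: repeatedly take the row that covers the most (column, value) pairs not yet seen. The result is small but not guaranteed to be minimal. Here is a sketch with rows as dicts; the example table is made up:

```python
def greedy_cover(rows):
    """Pick rows until every (column, value) pair in the table appears at least once."""
    covers = [set(row.items()) for row in rows]
    uncovered = set().union(*covers)
    chosen = []
    while uncovered:
        # Row covering the most still-uncovered pairs (the first one wins ties).
        best = max(range(len(rows)), key=lambda i: len(covers[i] & uncovered))
        chosen.append(rows[best])
        uncovered -= covers[best]
    return chosen

table = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "y"},
]
subset = greedy_cover(table)
```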
0 votes
1 answer
79 views

Using the default arguments, what is the time complexity of sample? I.e. how does the running time of sample(1:N) grow with N? Documentation for sample is here but does not specify time complexity.
Mohan • 9,223
3 votes
1 answer
120 views

EDIT: Edited original post replacing my code with an MRE from user @no comment. I noticed a seemingly non-intuitive behaviour using a seeded random.sample(population, k) call to sample from a list of ...
John Karkas
1 vote
1 answer
69 views

I am trying to understand what is causing this bug in my R code and I feel like R is gaslighting me. The sample() function seems to change depending on how I assign it? Anyways, here is the MRE: #...
ssm1020 • 45
0 votes
1 answer
30 views

Let's say I have this input dataset with the IDs a, b and c. I need to split it into packages of roughly 100 rows, each sample having the same distribution of IDs as the entire input population. What would be the ...
luisvenezian
0 votes
1 answer
58 views

I have created a dataframe in PySpark as follows: df = spark.range(10) The dataframe looks like this: df.show() +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ I ...
Giampaolo Levorato
0 votes
2 answers
53 views

I have a dataframe with missing NA values in column X1 and a grouping variable group. I want to replace all NA values with a value sampled from the non-NA values of that group. This should be done for ...
Johannes • 1,084
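Here is a sketch in Python, with `None` standing in for R's `NA` and rows as dicts; the names `group` and `X1` come from the question. Each missing entry gets its own independent draw from the observed values of its group. Groups with no observed values are left missing, which is an assumption:

```python
import random
from collections import defaultdict

def impute_from_group(rows, seed=None):
    """Replace missing X1 values with a random observed X1 from the same group."""
    rng = random.Random(seed)
    observed = defaultdict(list)
    for row in rows:
        if row["X1"] is not None:
            observed[row["group"]].append(row["X1"])
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        pool = observed.get(row["group"])
        if row["X1"] is None and pool:
            row["X1"] = rng.choice(pool)
        filled.append(row)
    return filled

df = [
    {"group": "g1", "X1": 1.0}, {"group": "g1", "X1": None}, {"group": "g1", "X1": 3.0},
    {"group": "g2", "X1": None}, {"group": "g2", "X1": 7.0}, {"group": "g3", "X1": None},
]
out = impute_from_group(df, seed=5)
```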
2 votes
1 answer
95 views

I am in SageMaker Studio and I have connected to a dataset via PyAthena: from pyathena import connect s3_query_results = 'my s3 Location' region = 'eu-west-2' workgroup='primary' Then I have written ...
Giampaolo Levorato
1 vote
1 answer
50 views

I am looking for an algorithm and its implementation in R for sample selection. I have a data.frame with i objects, and each object has j unique features. In parallel, I have > 100 samples k that ...
BHN • 47
2 votes
3 answers
125 views

The data: I have a very large dataset. The following is a representative example in .csv. date, x, y "03082018", 304, 1 "071999", 305, 1 "04032018", 309, 2 "041997...
hypnos • 21
1 vote
1 answer
85 views

I have a large DF (~35 million rows) and I am trying to create a new df by randomly sampling two rows from each unique cluster ID (~1.8 million unique cluster IDs): one row must have a label 0 and ...
youtube • 504
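The excerpt is cut off after "label 0 and", so this sketch assumes the second row must have label 1. It groups rows by cluster in one pass and then picks one row per label. Clusters missing either label are skipped. `cluster_id` and `label` are hypothetical column names:

```python
import random
from collections import defaultdict

def one_pair_per_cluster(rows, seed=None):
    """For each cluster, pick one random label-0 row and one random label-1 row."""
    rng = random.Random(seed)
    # cluster -> label -> list of rows
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        buckets[row["cluster_id"]][row["label"]].append(row)
    picked = []
    for by_label in buckets.values():
        if by_label[0] and by_label[1]:
            picked.append(rng.choice(by_label[0]))
            picked.append(rng.choice(by_label[1]))
    return picked

rows = [
    {"row": 0, "cluster_id": 1, "label": 0},
    {"row": 1, "cluster_id": 1, "label": 0},
    {"row": 2, "cluster_id": 1, "label": 1},
    {"row": 3, "cluster_id": 2, "label": 0},
    {"row": 4, "cluster_id": 3, "label": 1},
    {"row": 5, "cluster_id": 3, "label": 0},
]
result = one_pair_per_cluster(rows, seed=0)
```

At 35 million rows, pure Python will be slow. The same group-then-pick idea maps onto a dataframe group-by in whatever library holds the data.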
0 votes
2 answers
65 views

First, here is some toy data: df <- data.frame( "stim" = c("face", "object", "pareidolia", "face", "face", "object", "...
thefriendly_plague.doctor
0 votes
1 answer
33 views

I am working on some data which are supposed to follow a normal distribution. I have different sets of samples I'm interested in studying. So the data is stored in a list of lists (each sublist ...
Hélène BRNT
3 votes
3 answers
286 views

In a simulation, we need ordered data which is a random sample (with or without replacement) of size m from a full data set of size n. Unfortunately, ordering the sampled data turns out to be a ...
cdalitz • 1,371
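One way to avoid sorting each sample is selection sampling (Knuth's Algorithm S). This is a swapped-in technique, not necessarily what the answers propose. Sort the full data set once, then scan it and keep each element with probability (still needed)/(still remaining). The kept elements come out in data order, so every sample is already sorted, at O(n) cost per sample. This covers sampling without replacement only:

```python
import random

def ordered_sample(data, m, seed=None):
    """Draw m elements of data without replacement, preserving their order in data."""
    rng = random.Random(seed)
    n = len(data)
    out = []
    for i, x in enumerate(data):
        needed, remaining = m - len(out), n - i
        # Select x with probability needed / remaining (1 once every remaining item is needed).
        if rng.random() * remaining < needed:
            out.append(x)
            if len(out) == m:
                break
    return out

full = sorted(random.Random(0).gauss(0, 1) for _ in range(1000))  # sorted once
sample = ordered_sample(full, 50, seed=1)
```

When m is much smaller than n, drawing m indices and sorting only those may be cheaper than the O(n) scan.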
0 votes
0 answers
57 views

I am writing a term paper for my university on TopBraid Composer Maestro Edition and I need the asset samples for the software, but I am unable to find them online. The link at the bottom of the ...
Христијан Станојоски
-1 votes
1 answer
71 views

I have a list with 3 (or more) character vectors, ordered by number of elements. The first vector has 18 elements, the second has 623 elements, and the third has 1706 elements. I would like to ...
Wilson Souza
0 votes
1 answer
29 views

When I print the result of the first audio frame by doing track.readframes(1), I get b'\xfb\xff\xfb\xfe'. My track is stereo so we know that we have 2 channels where each channel has 1 sample. Does it ...
stabpaokara
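Four bytes for one stereo frame implies 2 bytes per sample, i.e. 16-bit audio. That is an assumption worth checking with `track.getsampwidth()`. In 16-bit WAV data, each sample is a signed little-endian integer, with the left channel first, so the frame can be decoded with `struct`:

```python
import struct

frame = b"\xfb\xff\xfb\xfe"  # the frame printed in the question
# "<hh": two little-endian signed 16-bit integers (left, then right channel).
left, right = struct.unpack("<hh", frame)
print(left, right)  # -5 -261
```

So the frame is not a single value: it holds one sample per channel, here -5 (left) and -261 (right).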
1 vote
1 answer
223 views

I am working with the MNIST dataset and performing different classification methods on it, but my runtimes are ridiculous, so I am looking for a way to maybe use a portion of the training part of ...
Adam Rowland
0 votes
1 answer
43 views

I am currently running a small simulation and am puzzled by the results. This is my code: ground_truth <- c("coke", "zero", "light", "zero") options <- ...
nhaus • 1,043
1 vote
2 answers
206 views

I have a pandas dataframe like this: ID Value 0 a 2 1 a 4 2 b 6 3 c 8 4 c 10 5 c 12 I would like to sample equally from the ID groups. I know I can ...
mnoerregaard
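Here is a standard-library sketch of equal-size sampling per group, using the ID/Value rows from the question. By default it takes as many rows from each group as the smallest group has. That default is an assumption; pass `n` to choose the count explicitly:

```python
import random
from collections import defaultdict

def equal_group_sample(rows, key, n=None, seed=None):
    """Sample the same number of rows (without replacement) from every group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Default: as many rows per group as the smallest group has.
    if n is None:
        n = min(len(members) for members in groups.values())
    picked = []
    for members in groups.values():
        picked.extend(rng.sample(members, n))
    return picked

data = [("a", 2), ("a", 4), ("b", 6), ("c", 8), ("c", 10), ("c", 12)]
rows = [{"ID": i, "Value": v} for i, v in data]
balanced = equal_group_sample(rows, "ID", seed=0)
```

In pandas 1.1 and later, `df.groupby("ID").sample(n=k)` does the same thing directly. Passing `replace=True` allows `n` to exceed the size of the smallest group.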
0 votes
1 answer
99 views

I have a CSV dataset with 160MM rows that cannot be imported directly with pandas (there is not enough RAM). How could I draw a random sample of 5% from the original dataset (in this case, ...
Marcelo Fernandes
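Here is a minimal streaming sketch. It reads the file row by row and keeps each data row independently with probability 0.05, so only the kept rows are ever held in memory. The result is about 5%, not exactly 5%; an exact count would need something like reservoir sampling instead. The demo uses a small in-memory CSV as a stand-in for the real file:

```python
import csv
import io
import random

def bernoulli_sample_csv(stream, fraction, seed=None):
    """Return the header and a ~fraction random subset of the data rows."""
    rng = random.Random(seed)
    reader = csv.reader(stream)
    header = next(reader)
    # Rows that are not kept are read and immediately discarded.
    kept = [row for row in reader if rng.random() < fraction]
    return header, kept

# For the real file: with open(path, newline="") as f: bernoulli_sample_csv(f, 0.05)
demo = io.StringIO("x,y\n" + "".join(f"{i},{i * i}\n" for i in range(1000)))
header, rows = bernoulli_sample_csv(demo, 0.05, seed=42)
```

pandas can do the same while reading, because `read_csv` accepts a callable for `skiprows`. For example, `skiprows=lambda i: i > 0 and random.random() > 0.05` keeps the header and roughly 5% of the rows.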
1 vote
0 answers
165 views

To sample from a Gaussian distribution with mean zero and covariance matrix S, we can do the following: from scipy import sparse import numpy as np S = sparse.diags([np.full(100,1),0.1*np.ones(99),0....
WeakLearner
1 vote
1 answer
53 views

It seems simple, but I cannot find the solution: I want to randomly select some elements from a data frame imported from an .xlsx file. Is there a function such as sample_n to do this? My problem lies in ...
gibarian • 157
1 vote
0 answers
64 views

Using the code sample found here, when I run node server.js in the Command Prompt I get syntax errors. First, it complains about =>. As far as I can see that's perfectly valid syntax, but if I ...
Nick Gris • 174
-2 votes
1 answer
66 views

I cannot add an instance to my asset in the Asset Store. I looked at the documentation but couldn't succeed. Where am I making a mistake? MyPackage ├── package.json └── Samples~ ├── ...
Atlas • 1
0 votes
1 answer
227 views

I assume this is the cause of my error. I have a stack of 18 rasters which underwent the following preprocessing steps: Cropping to a specific region using a shapefile Trim to the extent of the ...
Barbara Perez de Araújo
0 votes
0 answers
56 views

I'm struggling to generate samples from a custom distribution defined by its CDF, and I keep running into a persistent limitation. The issue boils down to: Limited Range of Samples: Despite having a theoretical ...
Qasim Ramzan
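The excerpt is cut off before the details. One possible cause of a limited sample range is inverting the CDF on a fixed grid that is too narrow to reach the tails. Here is a minimal inverse-transform sketch that avoids this: for each sample, it widens its bracket until the bracket contains the target quantile, then bisects. The exponential CDF is only a test case, and `lo`/`hi` are hypothetical starting guesses:

```python
import math
import random

def sample_from_cdf(cdf, n, lo=-1.0, hi=1.0, iters=100, seed=None):
    """Inverse-transform sampling for a non-decreasing CDF, using bisection."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # draw u from (0, 1): u == 0 has no finite quantile for some CDFs
            u = rng.random()
        a, b = lo, hi
        # Widen the bracket geometrically until cdf(a) <= u <= cdf(b).
        while cdf(a) > u:
            a -= b - a
        while cdf(b) < u:
            b += b - a
        # A fixed number of bisection steps keeps the loop finite in floating point.
        for _ in range(iters):
            mid = 0.5 * (a + b)
            if cdf(mid) < u:
                a = mid
            else:
                b = mid
        out.append(0.5 * (a + b))
    return out

def exp_cdf(x):
    """CDF of the exponential distribution with rate 1 (demo only)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

draws = sample_from_cdf(exp_cdf, 2000, seed=11)
```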
0 votes
0 answers
40 views

I have a table partitioned by day. This table stores data for one month, about 3 billion rows, but many partitions in the table are empty. How can I optimally select exactly 5000 ...
Violetta • 619
5 votes
0 answers
148 views

I've been using the MediaRecorder Web API to record a MediaStream obtained using getUserMedia() to record video + audio coming from a webcam + microphone. getUserMedia is not called with specific ...
Daniel Limia
0 votes
0 answers
180 views

I am fitting a Kernel Density Estimation instance on multi-variate data using scikit-learn implementation. As parameters I am using a 'gaussian' kernel and as bandwidth estimator 'silverman' .fit() ...
Edoardo Taccaliti
0 votes
1 answer
103 views

I'm trying to sample a dummy based on probabilities that are part of my data.table. If my data.table only has two rows, this works: library(data.table) playdata <- data.table(id = c("a", ...
Jakob • 1,463
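Drawing one dummy per row with that row's own probability is a Bernoulli draw per row: compare a uniform random number with p. Here is a Python sketch with made-up probabilities. In R itself, `rbinom(n, 1, prob)` is vectorised over `prob`, so something like `playdata[, dummy := rbinom(.N, 1, p)]` should do the same for any number of rows. Here `p` is a hypothetical column name:

```python
import random

def draw_dummies(probs, seed=None):
    """One independent 0/1 draw per row: 1 with probability p, else 0."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probs]

dummies = draw_dummies([0.0, 1.0, 0.25, 0.9], seed=2)
```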
2 votes
1 answer
109 views

This is very basic, but I couldn't find an answer online. I use R and have a dataset like this (but much larger): set.seed(123) id<-c(1,1,1,2,2,3,3,3,3,3,4,5,5,6,6,6) week<-c(1,2,3,1,2,1,2,3,4,5,...
Sointu • 291
1 vote
1 answer
42 views

I have a tibble_data.frame of dimensions 1042x64. Columns are amphibian families and rows are the names of all species in that family. The first 5 rows and 2 columns look like this: > amphilist[1:5,...
feprocha93
0 votes
1 answer
397 views

I am trying to get my head around an MS Excel formula for a scenario I have. I am recording discharges over various sample periods. These sample periods will last anywhere from 24 hrs to 1 week. The ...
ETP • 33
2 votes
1 answer
696 views

I am trying to obtain a matrix representation of an empirical 2 dimensional CDF given two data samples of the same size. I have two sorted data samples of the same size: sorted_sample1 and ...
Alexandre Bloch
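The joint empirical CDF is F_n(x, y) = (1/n) · #{k : X_k ≤ x and Y_k ≤ y}. It needs the observations as pairs (X_k, Y_k). If sample1 and sample2 were sorted independently, that pairing is lost, so this sketch assumes the original, paired order. It evaluates F_n on the grid of sorted x and y values with a direct O(n³) count, which is fine for small n:

```python
def ecdf_2d(xs, ys):
    """M[i][j] = fraction of pairs (xs[k], ys[k]) with xs[k] <= gx[i] and ys[k] <= gy[j]."""
    n = len(xs)
    gx, gy = sorted(xs), sorted(ys)  # evaluation grid; pairs still come from xs, ys
    matrix = []
    for x in gx:
        row = []
        for y in gy:
            count = sum(1 for a, b in zip(xs, ys) if a <= x and b <= y)
            row.append(count / n)
        matrix.append(row)
    return matrix

M = ecdf_2d([1, 2], [2, 1])
print(M)  # [[0.0, 0.5], [0.5, 1.0]]
```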
