1,696 questions
4
votes
8
answers
261
views
How to extract a given number of ordered rows from a given number of randomly selected samples?
Starting with the example dat0 below, let's say I want to randomly extract 3 pairs of `id`s.
Initial data (5 id pairs)
dat0 <-
structure(list(id = c("A", "A", "B", ...
0
votes
0
answers
157
views
DateTimeException: [CANNOT_PARSE_TIMESTAMP] in PySpark dataframe without timestamp data
I have a PySpark DataFrame with two numeric columns (integers, negative and positive) and when I try to select a random sample of it, it generates an error. This is the code I'm trying to run:
...
1
vote
1
answer
100
views
Random sample with multi-variate conditions
I have a dataset that I need to pull a random sample from. This is the data:
{'character': {0: 'mario',
1: 'luigi',
2: 'yoshi',
3: 'peach',
4: 'bowser',
5: 'boo',
6: 'toad',
7: 'lakitu',
...
1
vote
1
answer
223
views
How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame
I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
2
votes
3
answers
284
views
Stratified sampling using SQL given an absolute sample size
I have the following population:
a
b
b
c
c
c
c
I am looking for a SQL statement to generate a stratified sample of arbitrary size. Let's say for this example, I would like a sample size of 4. I ...
2
votes
1
answer
254
views
How to sample Pandas DataFrame using a normal distribution by using random_state and numpy Generators
I am trying to write Pandas code that would allow me to sample a DataFrame using a normal distribution. The most convenient way is to use the random_state parameter of the sample method to draw random ...
2
votes
1
answer
66
views
ADC7768 CRC calculation
I am using adc7768 to receive ADC samples. According to the datasheet, the CRC can be calculated every 4th or 16th sample. My question is for 4 samples last 3 samples CRC will be header of 4th sample. ...
2
votes
0
answers
68
views
How to supply sample weights to Tensorflow dataset in R
I am trying to set up a dataset in R to run a neural network using TensorFlow, but I can't seem to figure out the right code to allow sample weights to be specified.
The input array is image_data and ...
4
votes
5
answers
282
views
Efficiently draw random samples without replacement from an array in python
I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times.
Here’s the code I’m currently using:
import ...
0
votes
3
answers
94
views
Why does Reservoir Sampling page in wikipedia say that the size of the list is unknown, but the source code function knows the size?
reservoir sampling on wikipedia: https://en.wikipedia.org/wiki/Reservoir_sampling
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of ...
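A minimal sketch of the simplest variant (often called Algorithm R) may help show why the total size does not have to be known up front: the function below only iterates over its input and never asks for its length. The name `reservoir_sample` and the optional `rng` parameter are assumptions made for this illustration, not taken from the Wikipedia page or the question.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random, without replacement,
    from an iterable whose length is not known in advance."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator has no len(); the sampler still works on it.
    print(reservoir_sample((x * x for x in range(10_000)), 5))
```

Implementations that also accept the input size usually do so only because the input happens to be an array; the loop itself needs just the running index.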
2
votes
1
answer
249
views
Resampling By Group in Polars [duplicate]
I'm trying to build a Monte Carlo simulator for my data in Polars. I am attempting to group by a column, resample the groups and then, unpack the aggregation lists back in their original sequence. I'...
0
votes
0
answers
31
views
plots not generating all the samples and leave excess vertical space
I'm pretty new to machine learning. I was using fetch_olivetti_faces as my database for practice in my coding class. I ran the code, and it worked since I was following the teacher's instructions. ...
1
vote
1
answer
224
views
I can't find /usr/local/cuda-<x>.<y>/gds/samples after I install the CUDA toolkit and driver
I want to use GPUDirect Storage. I follow the instructions in https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#mofed-req-install to install it. The install details are as ...
2
votes
1
answer
54
views
Generate weighted sample from weighted list
I have a list of historical frequencies of elements that have occurred together over time. These elements may have occurred (without repetition) in sequences of various order and length.
For example, ...
1
vote
1
answer
216
views
Python Polars Sample N-1 by Group ID with Replacement
I am working on a bootstrapping project and need to sample M=N-1 observations with replacement where N is the number of unique observations in a specific group (defined by group_id). I need to figure ...
2
votes
2
answers
131
views
Dataframe - Select the minimum set of rows to cover all possible values of each columns
I'm working on a use case in which I need to retrieve a minimal sample of rows from a dataframe that contains at least one row for each unique value found in all columns.
A simplified example could be ...
0
votes
1
answer
79
views
What is the time complexity of sample?
Using the default arguments, what is the time complexity of sample? I.e. how does the running time of sample(1:N) grow with N?
Documentation for sample is here but does not specify time complexity.
3
votes
1
answer
120
views
random.sample(population, X) sometimes not contained in random.sample(population, Y) when Y>X
EDIT: Edited original post replacing my code with an MRE from user @no comment.
I noticed a seemingly non-intuitive behaviour using a seeded random.sample(population, k) call to sample from a list of ...
1
vote
1
answer
69
views
Mystery bug in sampling for loop in R
I am trying to understand what is causing this bug in my R code and I feel like R is gaslighting me.
The sample() function seems to change depending on how I assign it?
Anyways, here is the MRE:
#...
0
votes
1
answer
30
views
Is there a way of creating multiple stratified samples at once?
Let's say I have this input dataset with the Ids: a, b, c
I need to split it into packages of roughly 100 rows each, with each sample having the same distribution of Ids as the entire input population.
What would be the ...
0
votes
1
answer
58
views
Create sample weight in PySpark sampled dataframe
I have created a dataframe in PySpark as follows:
df = spark.range(10)
The dataframe looks like this:
df.show()
+---+
| id|
+---+
| 0|
| 1|
| 2|
| 3|
| 4|
| 5|
| 6|
| 7|
| 8|
| 9|
+---+
I ...
0
votes
2
answers
53
views
Groupwise replace NA's with sampled value from non NA's using dplyr
I have a dataframe with missing NA values in column X1 and a grouping variable group. I want to replace all NA values with a value sampled from the non-NA values of that group. This should be done for ...
2
votes
1
answer
95
views
Select a random sample from PyAthena SQL
I am in SageMaker Studio and I have connected to a dataset via PyAthena:
from pyathena import connect
s3_query_results = 'my s3 Location'
region = 'eu-west-2'
workgroup='primary'
Then I have written ...
1
vote
1
answer
50
views
Is there a function for sample selection based on conditions?
I am looking for an algorithm and its implementation in R for sample selection. I have a data.frame with i objects, and each object has j unique features. In parallel, I have > 100 samples k that ...
2
votes
3
answers
125
views
How to convert incomplete dates using sampled values from complete dates?
The data:
I have a very large dataset. The following is a representative example in .csv.
date, x, y
"03082018", 304, 1
"071999", 305, 1
"04032018", 309, 2
"041997...
1
vote
1
answer
85
views
Most efficient way to conditionally sample my large df
I have a large DF (~35 million rows) and I am trying to create a new df by randomly sampling two rows from each unique cluster ID (~1.8 million unique cluster IDs)-- one row must have a label 0 and ...
0
votes
2
answers
65
views
Is there a way to balance data in R without reordering a dataframe?
First, here is some toy data:
df <- data.frame(
"stim" = c("face", "object", "pareidolia", "face", "face", "object", "...
0
votes
1
answer
33
views
Problot list of lists with different colors
I am working on some data which are supposed to follow a normal distribution.
I have different sets of samples I'm interested in studying. So the data is stored in a list of lists (each sublist ...
3
votes
3
answers
286
views
Random sampling from ordered data
In a simulation, we need ordered data which is a random sample (with or without replacement) of size m from a full data set of size n. Unfortunately, ordering the sampled data turns out to be a ...
0
votes
0
answers
57
views
Asset Collection Samples for TopBraid Composer Maestro Edition
I am writing a term paper for my university on TopBraid Composer Maestro Edition and I need the assets samples for the software and I am unable to find them online. The link on the bottom of the ...
-1
votes
1
answer
71
views
Selecting elements of vectors in a list
I have a list with 3 (or more) vectors of characters ordered by total of elements
The first vector has 18 elements
The second vector has 623 elements
The third vector has 1706 elements
I would like to ...
0
votes
1
answer
29
views
Does the print of readframes(n) of wave library show the audio samples in hexadecimal?
When I print the result of the first audio frame by doing track.readframes(1), I get b'\xfb\xff\xfb\xfe'. My track is stereo so we know that we have 2 channels where each channel has 1 sample. Does it ...
1
vote
1
answer
223
views
Take a sample of the MNIST dataset
I am working with the MNIST dataset and performing different classification methods on it, but my runtimes are ridiculous, so I am looking for a way to maybe use a portion of the training part of ...
0
votes
1
answer
43
views
unexpected sample behavior
I am currently running a small simulation and am irritated by the results.
This is my code:
ground_truth <- c("coke", "zero", "light", "zero")
options <- ...
1
vote
2
answers
206
views
Group dataframe and sample n rows with equal probability between groups
I have a pandas dataframe like this:
ID Value
0 a 2
1 a 4
2 b 6
3 c 8
4 c 10
5 c 12
I would like to sample equally from the ID groups. I know I can ...
0
votes
1
answer
99
views
Drawing a random sample from a very large dataset
I have a csv dataset with 160MM rows that is not possible to import directly through Pandas (RAM memory is not enough).
How could I draw a random sample of 5% from the original dataset (in this case, ...
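One possible approach, sketched here under the assumption that an approximate 5% sample is acceptable, is to stream the file row by row and keep each row independently with probability 0.05, so the full dataset never has to fit in memory. The function name `sample_csv_rows` and its parameters are hypothetical, and the sketch uses only the standard library rather than Pandas.

```python
import csv
import random


def sample_csv_rows(in_path, out_path, fraction=0.05, seed=None):
    """Copy the header plus a random subset of rows from in_path to out_path.

    Each data row is kept independently with probability `fraction`, so the
    result holds roughly (not exactly) that share of the rows.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty input file
            return 0
        writer.writerow(header)
        for row in reader:
            if rng.random() < fraction:
                writer.writerow(row)
                kept += 1
    return kept
```

The smaller output file can then be loaded normally. If an exact sample size is required instead, a reservoir-style sampler over the rows is one alternative.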
1
vote
0
answers
165
views
Sampling from a Normal distribution with sparse covariance matrix
To sample from a gaussian distribution with mean zero and covariance matrix S, we can do the following:
from scipy import sparse
import numpy as np
S = sparse.diags([np.full(100,1),0.1*np.ones(99),0....
1
vote
1
answer
53
views
How to randomly select the content of some cells into a data frame?
It seems simple but I cannot find the solution: I want to randomly select some elements into a data frame imported from a .xlsx file. Is there a function such as sample_n to do this?
My problem lies in ...
1
vote
0
answers
64
views
Starting Learning node.js - Their Simple Server Sample Gives Syntax Errors
Using the code sample found here, when I run node server.js in the Command Prompt I get syntax errors.
First, it complains about =>. As far as I can see that's perfectly valid syntax, but if I ...
-2
votes
1
answer
66
views
How to add samples in Asset Store?
I cannot add an instance to my asset in the Asset Store. I looked at the documentation but couldn't succeed. Where am I making a mistake?
MyPackage
├── package.json
└── Samples~
├── ...
0
votes
1
answer
227
views
spatSample() choosing NA cells even when na.rm = TRUE
I assume this is the cause of my error. I have a stack of 18 rasters which underwent the following preprocessing steps:
Cropping to a specific region using a shapefile
Trim to the extent of the ...
0
votes
0
answers
56
views
Generating Samples from Customized Distribution - Stuck with Range Limitation
I'm struggling to generate samples from a custom distribution defined by its CDF but facing a persistent limitation. The issue boils down to:
Limited Range of Samples: Despite having a theoretical ...
0
votes
0
answers
40
views
Retrieve a fixed number of random records from a Postgres database table
I have a table with partitioning by day. This table stores data for one month and there are about 3 billion of them, but many partitions in the table are empty.
How can I optimally select exactly 5000 ...
5
votes
0
answers
148
views
Video recorded using MediaRecorder Web API comes with audio sped up
I've been using the MediaRecorder Web API to record a MediaStream obtained using getUserMedia() to record video + audio coming from a webcam + microphone.
getUserMedia is not called with specific ...
0
votes
0
answers
180
views
Drawing new data from KDE scikit-learn
I am fitting a Kernel Density Estimation instance on multi-variate data using the scikit-learn implementation. As parameters I am using a 'gaussian' kernel and as bandwidth estimator 'silverman'.
.fit() ...
0
votes
1
answer
103
views
sample with row specific probability in a data.table
I'm trying to sample a dummy based on probabilities that are part of my data.table. If my data.table only has two rows, this works:
library(data.table)
playdata <- data.table(id = c("a",...
2
votes
1
answer
109
views
Sample rows from a dataframe by id when some ids have more rows than others
This is very basic but I couldn't find an answer online. I use R and have a dataset like this (but much larger):
set.seed(123)
id<-c(1,1,1,2,2,3,3,3,3,3,4,5,5,6,6,6)
week<-c(1,2,3,1,2,1,2,3,4,5,...
1
vote
1
answer
42
views
Randomly sort specific number of elements from columns of different lengths
I have a tibble_data.frame of dimensions 1042x64. Columns are amphibian families and rows are the names of all species in that family. The first 5 rows and 2 columns look like this:
> amphilist[1:5,...
0
votes
1
answer
397
views
Excel: Proportional monthly distribution of sample values over a time period spanning different months
I am trying to get my head around an MS excel formula for a scenario I have.
I am recording discharges over various sample periods. These sample periods will last anywhere from 24 hrs to 1 week. The ...
2
votes
1
answer
696
views
How to Calculate a 2D Empirical CDF via histogram2d
I am trying to obtain a matrix representation of an empirical 2 dimensional CDF given two data samples of the same size.
I have two sorted data samples of the same size: sorted_sample1 and ...