10,285 questions
0
votes
1
answer
25
views
Ordering Y-axis labels in ggplot2 when using a leveled factor doesn't work
I'm using the following to generate a column plot of my data. But despite converting the variable Recommendations into a leveled factor, the Y-axis labels still are NOT ordered the way they are listed ...
1
vote
2
answers
153
views
How can I efficiently read a large CSV file from Azure Blob Storage into R for analysis?
I have the following function to read a CSV file from Azure:
read_csv_from_azure <- function(file_path, container) {
# Try to download the file and handle potential errors
tryCatch({
# ...
0
votes
2
answers
63
views
Transforming Categorical Column into Binary Columns in R Based on Multiple Conditions
I have a dataframe with two columns in R. One of the columns (column1) has three possible values (A, A and B, B).
The rows are patients.
I want to transpose column1, so I'd have binary columns (Yes, ...
0
votes
1
answer
47
views
Apply a function in a data frame group-wise on a subset of rows? [closed]
By using tidyverse, I want to calculate standard deviation of alt_freq column grouping by rsid in a data frame. In each group, I want to consider only those rows which have at least 100 samples. I ...
4
votes
2
answers
3k
views
Why does this create an "NAs introduced by coercion" warning
Curious why the following produces an "NAs introduced by coercion" warning
# Example dataframe
df <- tibble(
session = c("a",2),
)
df %>%
mutate(sessionNum = case_when(
...
0
votes
1
answer
35
views
merge two dataframes of different sizes without key [duplicate]
I have two dataframes with different columns and different row sizes
library(tidyverse)
tb1 <- tibble(id= 1:10,
a= 1:10,
b=11:20)
tb2 <- tibble(id= 1:5,
...
4
votes
2
answers
156
views
R: Efficient way to str_replace_all without recursively replacing conflicting substitutions?
Hello,
The problem
First, let me try to illustrate the problem. Assume I want to apply the following cipher to encode the string, "abc".
library(tidyverse)
cipher <- tibble(
byte = c(...
0
votes
0
answers
50
views
strip.white doesn't work in r if the data frame is too large
I import, clean, and merge two different data sets that are created from Qualtrics surveys. I use read.csv to load the data, and I have strip.white=TRUE to remove leading and trailing spaces. If I ...
0
votes
3
answers
135
views
Using 'slice_max()' in for loop
I'm trying to create new dataframes with the top three values for each column across a dataframe.
probUnweighted <- data.frame(
Sample1 = c(0.9, 0.2, 0.03, 0.1, 0.5, 0.09),
Sample2 = c(0.045, 0.11,...
1
vote
2
answers
66
views
Remove string from column across group of rows in another column
I would like to remove a string from one column across a group of rows in another column. In the below reprex, I would like to remove the string in snippet from the string in text in any row in the ...
1
vote
2
answers
319
views
How to select specific columns across multiple dataframes in R and then bind them into one data.frame?
I am trying to select or subset multiple data frames with different number of columns. They all contain the same columns of interest, so I am trying to make them all contain the same columns so I can ...
1
vote
2
answers
50
views
Recode relationship matrices based on new subgrouping
Problem:
I have a survey dataset which includes intra-household relationships. I had to subdivide household into tax-unit, which means I need to redefine the relationship matrices based on the new tax-...
0
votes
1
answer
52
views
additional arguments to purrr:map don't work as expected
I'm using the purrr::map function to iterate over several columns and tidy the result. For a short example, I provide the following code:
library(tidymodels)
library(broom)
> penguins %>%
+ ...
3
votes
2
answers
71
views
Trying to create a grouped barchart in R - producing a stacked one instead
I am trying to create a bar chart that has the number of each species grouped into years. I want each year represented on the x axis with the number of each of the 3 species grouped next to one ...
0
votes
1
answer
51
views
Propensity density score with MatchIt package -- how to bind rows when we have lot of datasets to have a final dataset with matched characteristics
I'm expanding this post -- answered by @edwards (Thanks).
I'm working with panel data. We assessed children in 2019, 2020, 2021 and 2022. Therefore, I have four datasets (2019, 2020, 2021, and 2022). ...
0
votes
1
answer
50
views
Name nested column list with specific name
This is my code
library(tidyverse)
# Create an example dataframe with football data
dat <- tibble(
continent = rep(c("Asia", "Europe", "Africa", "Americas"...
0
votes
1
answer
73
views
Using for loops, while, tidyverse, or packages to create a dataset with matching characteristics from a previous one (sampling) [closed]
I'm working with panel data. We assessed children in 2019 and 2020. Therefore, I have two datasets (2019 and 2020) and I want to create a third dataset matching the data from the second dataset (2020) ...
5
votes
5
answers
150
views
Convert a list into a tibble with nested columns
I would like to convert a list like this into a tibble.
lst <- list(
"A"=list(
"Category"="A",
"Team"=c("x"),
"City"="...
0
votes
2
answers
74
views
Loop through variables to filter a tibble in R
This feels like it should be easier than it is but here we go. I have a data frame that looks like this:
to.csv = structure(list(geography = c("030223131022122122", "030223131220201023"...
1
vote
3
answers
81
views
Apply command for complex functions and calculations on a dataset in R
I'm a reasonably experienced R user who has often struggled to use the apply family. I have very slow-moving iterative code whose performance I'm hoping to improve through the use of this family, but ...
2
votes
2
answers
104
views
How to create subgroups based on group relationship criteria
Context:
I have a dataframe of individual people grouped by household, which includes relationship parameters for each individual describing their relationship to every other individual in the ...
0
votes
1
answer
100
views
How to refer to dataset in ggplot using dplyr
I have the following dataset in R
crude_data <- structure(list(date = structure(c(19570, 19601, 19631, 19662,
19692, 19723, 19754, 19783, 19814, 19844, 19875, 19905, 19936,
19967, 19997, 20028, ...
3
votes
4
answers
117
views
Joining lat/lon data frames by nearest distance
Let's say I have a regular latitude/longitude grid and data at irregular locations, like this:
grid = tidyr::crossing(lon = seq(0, 1, 0.25), lat = seq(0, 1, 0.25))
data = tibble::tibble(lon = runif(4),...
0
votes
1
answer
52
views
Sample all rows of N groups
I'm trying to find a way to sample N whole groups from a dataframe.
For example, if we had the below dataframe:
group value
1 a 1
2 a 2
3 a 3
4 b 4
5 b ...
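One possible approach, sketched under the assumption that the grouping column is called `group` as in the excerpt: sample the distinct group labels first, then keep every row that belongs to a sampled group. The data frame `df` and the helper `sample_groups()` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical data shaped like the excerpt: a grouping column and a value column
df <- tibble(
  group = rep(c("a", "b", "c", "d"), each = 3),
  value = 1:12
)

# Hypothetical helper: keep all rows of n randomly chosen groups
sample_groups <- function(data, n) {
  keep <- sample(unique(data$group), n)  # draw n distinct group labels
  filter(data, group %in% keep)          # retain every row of those groups
}

set.seed(1)
sampled <- sample_groups(df, 2)
```

Sampling the labels rather than the rows is what keeps each chosen group intact.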
1
vote
2
answers
72
views
getting the latest date for a duplicated item in r [closed]
I have the following dataset (below).
I am trying to get the latest SEnd value for each individual tag (see Desired output) where I have the Tag, Owner and the latest SEnd date only. Essentially I am ...
0
votes
1
answer
63
views
Function works with plan(sequential) but not plan(multisession)
Here is my code:
plan(multisession,workers=detectCores()-2)
future_map_dfr(.x= Liste_model[1:2],.f = summaryModel, df = DF_MODEL_TRAIN, df_test = DF_MODEL_TEST, df_global = DF_MODEL_GLOBAL, .id = "...
1
vote
2
answers
90
views
parallel/automatic way of unnesting list columns that contains data frames (list columns might be empty)
Please consider the following data frame:
df <- structure(list(oID = c(37751L, 30978L, 33498L),
peId = c(12L, 13L, 14L),
last_Name = c("ABC", "...
1
vote
1
answer
127
views
Little hack for ggplot -- an easy way to add a text with the real means and standard deviation when using lines or bars
I just want to add some text of real means and sd to my plots when I'm working with one outcome or multiple outcomes. see the pictures below for reference. Code is below. If any updated package ...
0
votes
0
answers
88
views
What is the preferred / recommended rlang metaprogramming syntax to use on both sides of an assignment operator in the `dplyr::mutate()` function?
I have a question about an issue that's similar to this older question about the dplyr::filter() function, except that my example is a bit more complicated because dplyr::mutate() needs to process ...
1
vote
1
answer
30
views
Long to wide format based on variable suffixes in tidyverse in R
I wonder if there is a way for my DATA to be reformatted to my Desired_output below?
Specifically, for each unique study, we stick together a pair of pre and postNUMBER together, separately for T and ...
1
vote
2
answers
72
views
Use ifelse for several columns in R
My goal is to create a binary variable (k) that equals 1 if the values 3.90 and/or 160.0 appear in any column between mpg and wt.
Code
library(tidyverse)
mtcars<-mtcars%>%
mutate(k=ifelse(mpg:...
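A minimal sketch of one way to get such a flag, assuming dplyr 1.0.4 or later so that `if_any()` is available. Outside a selection helper, `mpg:wt` is treated as the numeric sequence operator rather than a column range, which is probably why the `ifelse(mpg:...` attempt misbehaves. The names `targets` and `flagged` are introduced here for illustration.

```r
library(dplyr)

# Values to look for in any of the columns from mpg through wt
targets <- c(3.90, 160.0)

flagged <- mtcars %>%
  # if_any() returns TRUE for a row when the condition holds in at least one selected column
  mutate(k = as.integer(if_any(mpg:wt, ~ .x %in% targets)))
```

Exact matching on doubles works here because the stored values are identical to the literals. For computed values, a tolerance-based comparison such as `dplyr::near()` may be safer.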
0
votes
1
answer
34
views
Transforming value into row number
I am conducting some survival analysis and am attempting to turn my wide table into long format for analysis using dplyr. I want to turn the value of 'dead flies' into rows with a binary status for ...
0
votes
0
answers
72
views
How to color the legend labels in ggplot [duplicate]
Using the iris dataset, we can make a boxplot and customise the legend when plotting using ggplot like so:
ggplot(data = iris, aes(x=Species, y=Sepal.Length, fill=Species))+
geom_boxplot()+
...
0
votes
1
answer
71
views
Is there a way to pass a string as a variable/column name to my function and use in a call to mutate?
I have a dataframe with a column indicating choices (of a survey) as well as a column indicating the index of the choice made in each row. e.g.,
df <- tibble(
record_id = 1:9,
choices = c(rep("...
4
votes
1
answer
81
views
Is there a way to prevent facet labels from being equal width (after rotation)
When I apply a facet_grid, sometimes the labels are quite wide, sometimes so wide that they don't fit, and I have to rotate them. This isn't a problem unless I want to facet by multiple different ...
0
votes
1
answer
96
views
How to iterate a function for multiple values (Loop function)?
By running the following function, the output would be:
library(pmsampsize)
pmsampsize(type = "s", csrsquared = 0.5, parameters = 10, rate = 0.065,
timepoint = 2, meanfup = 2.07)
NB: ...
0
votes
1
answer
72
views
Change raster extent with tidy
I have files from ERA5 that have extent from 0 to 360 (lon) and -90 to 90 (lat)
Example:
> era5_sr
class : SpatRaster
dimensions : 721, 1440, 744 (nrow, ncol, nlyr)
resolution : 0.25, 0....
1
vote
2
answers
45
views
In R, how to find the proportion of cases which have a value present in another column?
This seemed really simple to me at first, but is unexpectedly giving me trouble. Let's say my dataset looked like this:
mock <- tribble(~case_id, ~characteristic,
1, "A"...
1
vote
2
answers
71
views
Divide groups in other groups by date intervals
I'm dealing with dates and I wanted to group some rows together but I can't find how.
In my data, one row is an individual in a time interval and in a place. Something like this:
ind place ...
3
votes
4
answers
81
views
How to obtain all the numeric variables in data frame and use in another function in R
The example below obtains the minimum value among 3 columns using pmin(V1, V2, V3).
If we have lots of columns, how do we get the minimum value among all numeric variables, especially using the ...
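One way this could be done, assuming dplyr 1.1.0 or later for `pick()`: select every numeric column and hand them to `pmin()` through `do.call()`, so the columns never have to be listed by name. The data frame `df` and the column `row_min` are hypothetical.

```r
library(dplyr)

# Hypothetical data: three numeric columns plus one non-numeric column
df <- tibble(
  id = c("a", "b", "c"),
  V1 = c(5, 2, 9),
  V2 = c(3, 8, 1),
  V3 = c(4, 6, 7)
)

result <- df %>%
  # pick() returns the selected columns as a data frame (a list of vectors),
  # and do.call() passes each column to pmin() as a separate argument
  mutate(row_min = do.call(pmin, pick(where(is.numeric))))
```

The non-numeric `id` column is ignored by `where(is.numeric)`, so the same call keeps working as numeric columns are added.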
1
vote
1
answer
72
views
Conditional filtering of dataframe in R
I wonder how to dplyr::filter() my DATA to catch the rows for IDs whose Language value when 'Type!=5F' and when 'Type==5F' changes from other languages to "English"?
For example, ID==1 has ...
2
votes
2
answers
41
views
Wide format data by pasting two sets of variables into one in R
I've tried to wide-format my DATA into my Desired_output using:
pivot_wider(DATA, names_from = Year, values_from = c(Type, Language))
without success. Is there a way to achieve my Desired_output?
...
-2
votes
1
answer
260
views
How to utilize Rauh's German Political Sentiment Dictionary
I need to utilize the named sentiment dictionary for my sentiment analysis in RStudio. Unfortunately, I am having problems with that. The dictionary comes within a zip archive and specifically (as I assume) ...
5
votes
3
answers
103
views
How to extract birth and death year from string in R?
I have the first paragraph of Wikipedia articles from the wikifacts package (only for people). I would like to extract the birth year and year of death.
library(wikifacts)
library(tidyverse)
politicians <- ...
1
vote
1
answer
1k
views
"Error in initializePtr(): function 'cholmod_factor_ldetA' not provided by package Matrix" gets displayed while trying to generate mixed effects model
I have already tried the previous solutions displayed by some users. I have tried removing and reinstalling Matrix and lme4 packages. To make matters worse, now R is unable to install lme4 or Matrix ...
1
vote
3
answers
192
views
Classification of rows/individuals based on their column output in an incidence matrix
I wrote an R function to classify rows (individuals) based on the columns output in an incidence matrix M5 for the following requirements:
M5 <- structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1,...
1
vote
1
answer
59
views
pivot_longer from multiple columns into a singular names_to and two values_to
I've spent most of the day on this and finally calling in some help. There are multiple entries here on related questions, but none that quite get at what I'm trying to do.
Below is an example df.
x &...
1
vote
1
answer
108
views
Is there an elegant way to handle changing number of rows within tidyverse?
In the tidyverse there are limitations on the number of rows that can result from some data processing steps. Most prominently, mutate expects the number of rows to equal that of the original data set. For example, if we ...
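For cases where each group should yield an arbitrary number of rows, `reframe()` (available from dplyr 1.1.0) is one option. The sketch below uses hypothetical data and column names and is not taken from the question.

```r
library(dplyr)

# Hypothetical data: two groups of three observations each
df <- tibble(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 10, 20, 30)
)

out <- df %>%
  group_by(g) %>%
  # range() returns two values per group, so each group contributes two rows
  reframe(q = range(x))
```

Unlike `mutate()`, `reframe()` places no constraint on the number of rows returned per group, and its result is always ungrouped.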
-2
votes
2
answers
195
views
Is there a %$% operator?
In the book R for Data Science, there is an operator %$%, as in the example code below. But when I run that code, I get the error message "there is no such operator". Can anyone help with ...
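A likely cause: `%$%` is the exposition pipe from the magrittr package, and `library(tidyverse)` does not attach magrittr itself. It only makes `%>%` available through re-exports. A minimal sketch, assuming magrittr is installed:

```r
library(magrittr)

# %$% exposes the columns of the left-hand side to the expression on the right,
# so disp and mpg can be referenced directly without mtcars$
r <- mtcars %$% cor(disp, mpg)
```

Without `%$%`, the equivalent call would be `cor(mtcars$disp, mtcars$mpg)`.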
0
votes
1
answer
107
views
Order columns based on suffix condition in R
The names of my variables look like this:
df <- data.frame(var_NA = 1:10, var = 11:20, var_Level = 21:30, var_Total = 31:40)
Except I have lots of variables. The key feature is that for every &...