
I need your help: I have a big data set about illnesses in wide format. There are 54 different illnesses, each with a block of 18 questions (the data are nested in illnesses and participants).

Since I have the same variables/questions for each illness, I am trying to find a fast way to calculate rowMeans for the scales (maybe using a loop).

So basically I have the variables epi_scm.1 - epi_scm.18, ms_scm.1 - ms_scm.18, autism_scm.1 - autism_scm.18 and so on (the beginning of the column name indicates the illness and the end indicates the item number), and I need to calculate the rowMeans of the multi-item scales for each illness (e.g., Morality = rowMeans(df4[, c("epi_scm.1", "epi_scm.2", etc.)])), but I do not want to do that manually for every illness (as there are many).
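Since a loop was mentioned, here is a minimal base R sketch of what that could look like, assuming every item column follows the illness_scm.N pattern from the question. The toy data frame df and the "_mean" suffix for the new columns are hypothetical. It averages all items of each illness; for a subscale you would restrict cols to the relevant item numbers.

```r
# Hypothetical toy data in the question's naming scheme (illness_scm.item)
df <- data.frame(
  epi_scm.1 = c(1, 4, 2), epi_scm.2 = c(3, 2, NA),
  ms_scm.1  = c(5, 1, 2), ms_scm.2  = c(1, 1, 4)
)

# Illness prefixes: everything before "_scm.<number>" in the item names
item_cols <- grep("_scm\\.[0-9]+$", names(df), value = TRUE)
illnesses <- unique(sub("_scm\\.[0-9]+$", "", item_cols))

# Add one mean column per illness, e.g. "epi_mean" (suffix is an arbitrary choice)
for (ill in illnesses) {
  cols <- grep(paste0("^", ill, "_scm\\.[0-9]+$"), names(df), value = TRUE)
  df[[paste0(ill, "_mean")]] <- rowMeans(df[cols], na.rm = TRUE)
}
df
```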

Do you know how to do this more efficiently? (I hope you understand what I mean.)

Thanks and best regards!

L

I tried to subset each illness, but that takes too much time and isn't really suitable for my main analyses:

# Subset data to include only 42 columns

subset_epi <- df4[, 1:42]  # Replace 1:42 with the indices or column names of the columns I want to keep
subset_epi <- subset_epi[complete.cases(subset_epi), ]

# Renumber the row names of the subset

rownames(subset_epi) <- seq_len(nrow(subset_epi))
dim(subset_epi)

# New variable Morality = mean score for Morality (items 1-5).
# No na.omit() here: it could drop rows so the result no longer lines up with
# subset_epi; complete.cases() above has already removed incomplete rows.

subset_epi$Morality <- rowMeans(subset_epi[, c("epi_scm_1", "epi_scm_2", "epi_scm_3", "epi_scm_4", "epi_scm_5")])
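The same Morality computation can be repeated for every illness without subsetting. This is a minimal sketch that assumes, hypothetically, that items 1 to 5 form the Morality scale for every illness (as in the epi example) and that the names follow the underscore form illness_scm_N used in the code above; the toy df4 and the "_Morality" suffix are illustrative only. No rows are dropped, and na.rm = TRUE averages whichever items are present.

```r
# Hypothetical toy data: two illnesses, items 1-6, underscore names as in the code above
df4 <- data.frame(
  epi_scm_1 = 1, epi_scm_2 = 2, epi_scm_3 = 3, epi_scm_4 = 4, epi_scm_5 = 5, epi_scm_6 = 9,
  ms_scm_1 = 2, ms_scm_2 = 2, ms_scm_3 = 2, ms_scm_4 = 2, ms_scm_5 = NA_real_, ms_scm_6 = 9
)

# Derive the illness prefixes from the column names instead of typing all 54
item_cols <- grep("_scm_[0-9]+$", names(df4), value = TRUE)
illnesses <- unique(sub("_scm_[0-9]+$", "", item_cols))

morality_items <- 1:5  # assumption: the same item numbers form Morality for every illness

for (ill in illnesses) {
  cols <- paste0(ill, "_scm_", morality_items)
  # One new column per illness, e.g. "epi_Morality"; rows are kept, NAs are skipped
  df4[[paste0(ill, "_Morality")]] <- rowMeans(df4[cols], na.rm = TRUE)
}
df4[c("epi_Morality", "ms_Morality")]
```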
  • Welcome to Stack Overflow. I recommend you pivot your data to long format to start with, so instead of having so many columns you just have "participant", "illness", "question_no" and "response", for example. This allows you to do grouping operations by illness and simplifies everything downstream. Commented Apr 26, 2024 at 9:07
  • For use cases like this, I suggest reshaping (melting/pivoting) your wide data to a long format and performing the operations on the long data set. Commented Apr 26, 2024 at 9:08
  • Do the names always follow the pattern in the question: characters, then an underscore, then more characters, then a dot and a number? Commented Apr 26, 2024 at 9:14
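One quick way to answer that is to list the item columns that do not match the pattern. This is a sketch with made-up names in nms; in practice you would use names() of your data frame, and treating every name that contains "_scm" as an item column is an assumption.

```r
# Hypothetical column names; in practice use names(df4)
nms <- c("participant", "epi_scm.1", "epi_scm.18", "ms_scm.1", "autism_scm.18", "epi_scm_2")

# characters, underscore, characters, dot, number
pattern <- "^[^_]+_[^.]+\\.[0-9]+$"

item_like <- grepl("_scm", nms)           # columns meant to be items (assumption)
bad <- nms[item_like & !grepl(pattern, nms)]
bad                                       # item names that break the pattern
```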

2 Answers


Here's an example approach you could try.

library(tidyverse)

# Mocking up some data:
df <- data.frame(participant = 1:4,
                 epi_scm_1 = sample(1:5, 4, TRUE),
                 epi_scm_2 = sample(1:5, 4, TRUE),
                 epi_scm_3 = sample(1:5, 4, TRUE),
                 autism_scm_1 = sample(1:5, 4, TRUE),
                 autism_scm_2 = sample(1:5, 4, TRUE),
                 autism_scm_3 = sample(1:5, 4, TRUE)
                 )

# pivoting longer:
df2 <- df |> pivot_longer(-participant,
                   names_to = c("illness", NA, "question_no"),
                   names_sep = "_",
                   values_to = "score")
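The mock data above uses underscores (epi_scm_1), whereas the question's columns look like epi_scm.1; with names_sep = "_" those names would split into only two pieces, not the three listed in names_to. Below is a sketch using pivot_longer()'s names_pattern argument instead, with hypothetical toy values. The regex assumes every item name ends in "_scm." followed by the item number; because (.*) is greedy, illness prefixes that themselves contain underscores are kept whole.

```r
library(tidyr)

# Hypothetical toy data using the question's names (illness_scm.item)
df <- data.frame(participant = 1:2,
                 epi_scm.1 = c(5, 1), epi_scm.2 = c(1, 3),
                 ms_scm.1  = c(2, 4), ms_scm.2  = c(4, 4))

df_long <- df |> pivot_longer(-participant,
                              names_to = c("illness", "question_no"),
                              # group 1 = illness prefix, group 2 = item number
                              names_pattern = "^(.*)_scm\\.(\\d+)$",
                              values_to = "score")
df_long
```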

df2 looks like:

# A tibble: 24 × 4
   participant illness question_no score
         <int> <chr>   <chr>       <int>
 1           1 epi     1               5
 2           1 epi     2               1
 3           1 epi     3               5
 4           1 autism  1               2
 5           1 autism  2               1
 6           1 autism  3               3
 7           2 epi     1               1
 8           2 epi     2               1
 9           2 epi     3               3
10           2 autism  1               4
etc

We then summarise by taking the mean score per participant and per illness:

> df2 |> summarise(mean = mean(score), .by = c(participant, illness))
# A tibble: 8 × 3
  participant illness  mean
        <int> <chr>   <dbl>
1           1 epi      3.67
2           1 autism   2   
3           2 epi      1.67
4           2 autism   2.33
5           3 epi      4   
6           3 autism   1   
7           4 epi      1.67
8           4 autism   2.33
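If you need the means back in wide format (one column per illness, for example to join them to participant-level data), a sketch with pivot_wider() could look like this. The small df2 here is made-up data in the same shape as above, the "_mean" column suffix is an arbitrary choice, and na.rm = TRUE is added on the assumption that some responses may be missing. The .by argument needs dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data in the same shape as df2
df2 <- tibble(participant = rep(1:2, each = 4),
              illness     = rep(c("epi", "epi", "autism", "autism"), 2),
              question_no = rep(c("1", "2"), 4),
              score       = c(5, 1, 2, 4, 1, 3, 3, 3))

means_wide <- df2 |>
  summarise(mean = mean(score, na.rm = TRUE), .by = c(participant, illness)) |>
  # one column per illness, named e.g. "epi_mean"
  pivot_wider(names_from = illness, values_from = mean, names_glue = "{illness}_mean")
means_wide
```

The result has one row per participant, so it could be merged back with something like left_join(df, means_wide, by = "participant").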


Get the name prefixes with sub(), discarding everything from the dot onward, then split the names vector by this vector of prefixes. Finally, sapply() rowMeans() over the columns in each group of the split.

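nms_subset_epi <- names(subset_epi)
# Prefix of each name: drop everything from the dot onward ("epi_scm.1" -> "epi_scm")
f <- sub("\\..*$", "", nms_subset_epi)
# Group the column names by prefix
nms_split <- split(nms_subset_epi, f)

# Row means over each group of columns, one matrix column per prefix
result <- sapply(nms_split, \(nms) {
  subset_epi[nms] |> rowMeans(na.rm = TRUE)
})
result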
#>        autism_scm    epi_scm      ms_scm
#>  [1,] -0.71350045 -0.3449778  1.13997663
#>  [2,] -0.38520104  0.4688165  0.31570295
#>  [3,] -0.07565961 -0.1079713  0.26870261
#>  [4,]  0.47264391  0.2930216  0.49563327
#>  [5,] -0.93888010  1.1580985 -0.05093926
#>  [6,] -0.65158609 -0.1481108 -0.57702851
#>  [7,]  0.61227658 -0.1976458  0.33733187
#>  [8,] -0.23906127  0.1635898  0.09771817
#>  [9,]  1.00040381 -1.4154913 -2.05572232
#> [10,]  0.95393066 -1.5664113  0.15947003

Created on 2024-04-26 with reprex v2.1.0

You can then give new names to this matrix, for instance,

colnames(result) <- paste0(colnames(result), "_Means") 

Test data

x <- "epi_scm.1, epi_scm.18, ms_scm.1, ms_scm.18, autism_scm.1, autism_scm.18"
x <- scan(text = x, what = character(), sep = ",") |> trimws()
nms_subset_epi <- x

set.seed(2024)
subset_epi <- replicate(6, rnorm(10))
subset_epi[sample(60, 10)] <- NA
subset_epi <- subset_epi |>
  as.data.frame() |>
  setNames(x)

Created on 2024-04-26 with reprex v2.1.0
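A more compact base R variant of the same idea is to split the data frame itself rather than its names: split.default() treats a data frame as a list of columns, so it returns one data frame per prefix. The toy df below is hypothetical; the prefix rule (everything before the dot) is the same as in the answer.

```r
# Hypothetical toy data frame with the question's dot-suffixed names
df <- data.frame(epi_scm.1 = c(1, 2), epi_scm.18 = c(3, NA),
                 ms_scm.1  = c(4, 4), ms_scm.18  = c(2, 0))

# Column prefix = everything before the dot
f <- sub("\\..*$", "", names(df))

# split.default() splits the data frame column-wise into one data frame per prefix;
# sapply() then applies rowMeans() to each and binds the results into a matrix
res <- sapply(split.default(df, f), rowMeans, na.rm = TRUE)
res
```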
