I have the following data frame that I need to rearrange in a specific way. D1 and C1 are linked together, D2 and C2 are linked together, and D3 and C3 are linked together. I need columns D1, D2 and D3 to each contain only one unique value, with the linked C value moved into the corresponding C column (C1, C2 or C3, respectively).

X = structure(list(D1 = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "B", "B", "B", "B", "B"), C1 = c("1", 
"2", "3", "4", "5", "1", "2", "3", "4", "5", "1", "2", "3", "4", 
"5", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"1", "2", "3", "4", "5"), D2 = c("NA", "NA", "NA", "NA", "NA", 
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", 
"C", "C", "A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA"
), C2 = c("NA", "NA", "NA", "NA", "NA", "1", "2", "3", "4", "5", 
"1", "2", "3", "4", "5", "1", "2", "3", "4", "5", "1", "2", "3", 
"4", "5", "NA", "NA", "NA", "NA", "NA"), D3 = c("B", "B", "B", 
"B", "B", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA"), C3 = c("1", "2", "3", "4", "5", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "1", 
"2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L))

My expected output format is this:

Y = structure(list(D1 = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "NA", "NA", "NA", "NA", "NA", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA"), C1 = c("1", 
"2", "3", "4", "5", "1", "2", "3", "4", "5", "NA", "NA", "NA", 
"NA", "NA", "1", "2", "3", "4", "5", "1", "2", "3", "4", "5", 
"NA", "NA", "NA", "NA", "NA"), D2 = c("B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "B", "B", "B", 
"B", "B"), C2 = c("1", "2", "3", "4", "5", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "1", "2", "3", "4", "5"), D3 = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA"), C3 = c("NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -30L))
  • X |> sort_by(~D1+C1+D2+C2), where X is your input data. Rearranging the column order is only cosmetic. Does your real data contain several more columns, and do you want to sort following the pattern D1, C1, D2, C2, D3, ...? Commented Apr 9 at 6:31
  • @Friede Yes indeed, my real data contains several more columns. I need the data in this format because the packages I need to use only support it this way. Commented Apr 9 at 6:33
  • Ok. It's more complicated then. I do not get the pivoting logic. You might need a stepwise/recursive solution including rowSums(). Not too sure what you want. And I haven't run it, but I do not think your pivoting approach works. Commented Apr 9 at 6:54
  • @Friede Thanks for your thoughts. Indeed, it is a very special case; I have never encountered a data frame manipulation like this. The pivoting logic is not based on much, it was just my first guess at how to complete the task. Commented Apr 9 at 6:58
  • @Friede Could very well be. I will remove it from the question and update it once I receive more suggestions. Commented Apr 9 at 6:59

1 Answer

You can try pivoting to long form, recoding the pivot key based on the value of "D", and then pivoting back to wide form with the new key, which seems to give your desired output. But first, we need to create an "id" variable so that each original row can be reassembled.

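library(dplyr)   # case_when(.default = ) needs dplyr >= 1.1.0
library(tidyr)

# X is the data frame from the question
X %>%
  mutate(id=row_number()) %>%                  # remember the original row
  # split "D1" into .value = "D" and name = "1": one row per (id, D/C pair)
  pivot_longer(-id,
               names_to=c(".value","name"),
               names_sep=1) %>% 
  # new slot number depends on the value of D; the "NA" strings match nothing
  mutate(name=case_when(D=="A"~1,
                        D=="B"~2,
                        D=="C"~3,
                        .default=NA)) %>% 
  filter(!is.na(name)) %>%                     # drop the empty pairs
  # back to wide form, ordered D1, C1, D2, C2, ...
  pivot_wider(id_cols=id, values_from=c("D","C"), 
              names_sep="", names_vary = "slowest")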

Gives:

# A tibble: 30 × 7
      id D1    C1    D2    C2    D3    C3   
   <int> <chr> <chr> <chr> <chr> <chr> <chr>
 1     1 A     1     B     1     NA    NA   
 2     2 A     2     B     2     NA    NA   
 3     3 A     3     B     3     NA    NA   
 4     4 A     4     B     4     NA    NA   
 5     5 A     5     B     5     NA    NA   
 6     6 A     1     B     1     NA    NA   
 7     7 A     2     B     2     NA    NA   
 8     8 A     3     B     3     NA    NA   
 9     9 A     4     B     4     NA    NA   
10    10 A     5     B     5     NA    NA   
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows

1 Comment

Hi Edward, Thanks a lot for your suggestion. I agree that this seems most logical. Of course, I will try to vectorise this, as the values of D are not always A, B or C. But I am sure I will manage to do that. Thanks again!
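
Following up on the comment above, here is a minimal sketch of one way the hard-coded case_when() could be generalised when the values of D are not known in advance. It is not part of the original answer. It assumes that the slot order should simply be the alphabetical order of the distinct D values, that missing entries are stored as the string "NA" as in the question, and that the D columns are named D1, D2, and so on. The `toy` data frame and the `lvls` and `d_cols` names are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input in the same layout as X (not the question's data).
toy <- tibble(
  D1 = c("A", "B", "NA"),
  C1 = c("1", "2", "NA"),
  D2 = c("NA", "A", "C"),
  C2 = c("NA", "3", "4")
)

# Assumption: slots follow the sorted distinct D values, ignoring "NA" strings.
d_cols <- grep("^D[0-9]+$", names(toy), value = TRUE)
lvls <- sort(setdiff(unique(unlist(toy[d_cols])), "NA"))

out <- toy %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = c(".value", "name"), names_sep = 1) %>%
  # match() returns the slot number, or NA when D is not one of lvls
  mutate(name = match(D, lvls)) %>%
  filter(!is.na(name)) %>%
  pivot_wider(id_cols = id, values_from = c("D", "C"),
              names_sep = "", names_vary = "slowest",
              names_sort = TRUE) %>%   # create slots in numeric order
  arrange(id)

out
```

Two caveats with this sketch: a row whose D values are all "NA" is dropped entirely by the filter() step, and empty cells in the result are real NA values rather than "NA" strings.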