Sort values across different columns

Question

I have the following data frame that I need to rearrange in a specific way. D1 and C1 are linked together, D2 and C2 are linked together, and D3 and C3 are linked together. I need each of the columns D1, D2 and D3 to contain only one unique value, with its C value in the corresponding C column (C1, C2 and C3, respectively).

X = structure(list(D1 = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "B", "B", "B", "B", "B"), C1 = c("1", 
"2", "3", "4", "5", "1", "2", "3", "4", "5", "1", "2", "3", "4", 
"5", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"1", "2", "3", "4", "5"), D2 = c("NA", "NA", "NA", "NA", "NA", 
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", 
"C", "C", "A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA"
), C2 = c("NA", "NA", "NA", "NA", "NA", "1", "2", "3", "4", "5", 
"1", "2", "3", "4", "5", "1", "2", "3", "4", "5", "1", "2", "3", 
"4", "5", "NA", "NA", "NA", "NA", "NA"), D3 = c("B", "B", "B", 
"B", "B", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA"), C3 = c("1", "2", "3", "4", "5", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "1", 
"2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L))

My expected output format is this:

Y = structure(list(D1 = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "NA", "NA", "NA", "NA", "NA", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "NA", "NA", "NA", "NA", "NA"), C1 = c("1", 
"2", "3", "4", "5", "1", "2", "3", "4", "5", "NA", "NA", "NA", 
"NA", "NA", "1", "2", "3", "4", "5", "1", "2", "3", "4", "5", 
"NA", "NA", "NA", "NA", "NA"), D2 = c("B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "B", "B", "B", 
"B", "B"), C2 = c("1", "2", "3", "4", "5", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "1", "2", "3", "4", "5"), D3 = c("NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA"), C3 = c("NA", "NA", "NA", 
"NA", "NA", "NA", "NA", "NA", "NA", "NA", "1", "2", "3", "4", 
"5", "1", "2", "3", "4", "5", "NA", "NA", "NA", "NA", "NA", "NA", 
"NA", "NA", "NA", "NA")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -30L))

X |> sort_by(~D1+C1+D2+C2), where X is your input data. Rearranging the column order is only cosmetic. Does your real data contain several more columns, and do you want to sort following the pattern D1, C1, D2, C2, D3, ...?
– Friede, Commented Apr 9 at 6:31
@Friede Yes indeed, my real data contains several more columns. I need the data in this format because the packages I need to use only support it this way.
– szmple, Commented Apr 9 at 6:33
OK. It's more complicated then. I do not get the pivoting logic. You might need a stepwise/recursive solution including rowSums(). Not too sure what you want. And I haven't run it, but I do not think your pivoting approach works.
– Friede, Commented Apr 9 at 6:54
@Friede Thanks for your thoughts. Indeed, it is a very special case; I have never encountered a data frame manipulation like this. The pivoting logic is not based on much; it was my first guess at how to complete the task.
– szmple, Commented Apr 9 at 6:58
@Friede Could very well be. I will remove it from the question and update it once I realize more suggestions.
– szmple, Commented Apr 9 at 6:59

Edward · Accepted Answer · 2025-04-09 07:17:14Z

2

You can try pivoting, and then changing the names used to pivot back to wide form based on the values of "D", which seems to get your desired output. But first, we need to create an "id" variable.

library(dplyr)
library(tidyr)

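X %>%   # X is the input data frame from the question
  # Row id, so each original row can be reassembled after pivoting
  mutate(id=row_number()) %>%
  # Split names such as "D1" after the first character:
  # "D"/"C" become value columns, the digit goes into `name`
  pivot_longer(-id,
               names_to=c(".value","name"),
               names_sep=1) %>% 
  # Re-label each D/C pair by its D value; the "NA" strings match
  # no case and fall through to the default NA
  mutate(name=case_when(D=="A"~1,
                        D=="B"~2,
                        D=="C"~3,
                        .default=NA)) %>% 
  filter(!is.na(name)) %>%
  # Back to wide form, with columns ordered D1, C1, D2, C2, ...
  pivot_wider(id_cols=id, values_from=c("D","C"), 
              names_sep="", names_vary = "slowest")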

Gives:

# A tibble: 30 × 7
      id D1    C1    D2    C2    D3    C3   
   <int> <chr> <chr> <chr> <chr> <chr> <chr>
 1     1 A     1     B     1     NA    NA   
 2     2 A     2     B     2     NA    NA   
 3     3 A     3     B     3     NA    NA   
 4     4 A     4     B     4     NA    NA   
 5     5 A     5     B     5     NA    NA   
 6     6 A     1     B     1     NA    NA   
 7     7 A     2     B     2     NA    NA   
 8     8 A     3     B     3     NA    NA   
 9     9 A     4     B     4     NA    NA   
10    10 A     5     B     5     NA    NA   
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows
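
To see what the first pivoting step produces, here is a minimal sketch on a hypothetical two-row frame called `toy` (not part of the question's data) in the same layout as the input. It only illustrates the intermediate long form; the relabelling and `pivot_wider()` steps are left out.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: same layout as the question, missing values as "NA" strings
toy <- tibble(
  D1 = c("A", "B"),  C1 = c("1", "2"),
  D2 = c("NA", "A"), C2 = c("NA", "2")
)

long <- toy %>%
  mutate(id = row_number()) %>%
  # names_sep = 1 splits each column name after its first character,
  # so "D1" becomes .value = "D" and name = "1"
  pivot_longer(-id, names_to = c(".value", "name"), names_sep = 1)

long
```

Each original row now contributes one long row per D/C pair, with `id` recording which row it came from. That is why the id column is needed before pivoting back to wide form.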

edited Apr 9 at 7:17

answered Apr 9 at 7:08


1 Comment

szmple Apr 9 at 7:19

Hi Edward, Thanks a lot for your suggestion. I agree that this seems most logical. Of course, I will try to vectorise this, as the values of D are not always A, B or C. But I am sure I will manage to do that. Thanks again!
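
One possible way to avoid hard-coding A, B and C, as the comment above asks for, is to number the D values by their position in the sorted set of distinct values using `match()`. This is only a sketch under some assumptions: the helper name `align_by_value` and its `missing` argument are made up for illustration, column names are assumed to be a single letter followed by a number, missing values are assumed to be the string "NA" as in the question, and no D value is assumed to appear twice in the same row.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the answer above): slot k holds the k-th
# distinct D value in sorted order, instead of a hard-coded case_when().
align_by_value <- function(data, missing = "NA") {
  long <- data %>%
    mutate(id = row_number()) %>%
    pivot_longer(-id, names_to = c(".value", "name"), names_sep = 1) %>%
    # drop real NAs as well as the "NA" strings used in the question
    filter(!is.na(D), D != missing)

  lvls <- sort(unique(long$D))

  long %>%
    mutate(name = match(D, lvls)) %>%
    pivot_wider(id_cols = id, values_from = c(D, C),
                names_sep = "", names_sort = TRUE, names_vary = "slowest")
}

# Usage on a hypothetical two-row frame
toy <- tibble(
  D1 = c("A", "B"),  C1 = c("1", "2"),
  D2 = c("NA", "A"), C2 = c("NA", "2")
)
res <- align_by_value(toy)
res
```

Note that rows in which every D column is missing drop out of the result, and that the missing entries in the output are real `NA` values rather than "NA" strings.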
