1

Please consider the following data frame:

df <- structure(list(oID = c(37751L, 30978L, 33498L),
                     peId = c(12L, 13L, 14L),
                     last_Name = c("ABC", "DEF", "EFG"), 
                     first_Name = c("Z", "Y", "X"),
                     personnel_Number = list(structure(list(hId = c(1L, 4L, 5L),
                                                            hName = c("PS", "XY", "MN"),
                                                            personnel_Number = c("0123", "1234", "98")),
                                                       class = "data.frame",
                                                       row.names = c(NA, 3L)),
                                             structure(list(hId = 1L, hName = "PS",
                                                            personnel_Number = "0987"), 
                                                       class = "data.frame", 
                                                       row.names = 1L),
                                             structure(list(), 
                                                       names = character(0),
                                                       row.names = integer(0),
                                                       class = "data.frame")), 
                     ls_Role = list(structure(list(functionId = c(1L, 5L),
                                                   `function` = c("function A", "function B"),
                                                   function_Short = c("FA", "FB")),
                                              class = "data.frame",
                                              row.names = 1:2), 
                                    structure(list(functionId = 6L,
                                                   `function` = "function A",
                                                   function_Short = "FA"),
                                              class = "data.frame",
                                              row.names = 1L), 
                                    structure(list(functionId = 6L,
                                                   `function` = "function A", 
                                                   function_Short = "FA"),
                                              class = "data.frame",
                                              row.names = 1L))),
                row.names = c(1L, 2L, 3L),
                class = "data.frame")

As you can see, the data frame contains two list columns that each contain row-based data frames. I want to unnest each of these list columns in a way that their "inner" data frames will be put into wide format, i.e. all of the columns are repeated for as many rows as the inner data frames contain per row.

So for the "personnel_number" column I expect to return 9 columns (hId_1, _2, _3; hName_1, _2 and so on). And the same for the ls_Role column.

I know how I can do it manually by unnesting and then heavily reshaping the whole data frame, but I'm wondering if there is a more concise way that does it more automatically or with less function calls, preferably in a tidyverse manner.

Here's my current code (which also reflects my expected output):

library(tidyverse)
df |>
  mutate(id = row_number()) |> 
  unnest_longer(col = personnel_Number, keep_empty = TRUE) |> 
  unpack(cols = personnel_Number) |> 
  mutate(id_inner = row_number(), .by = id) |> 
  pivot_wider(values_from = c(hId, hName, personnel_Number),
              names_from = id_inner) |> 
  unnest_longer(col = ls_Role, keep_empty = TRUE) |> 
  unpack(cols = ls_Role) |> 
  mutate(id_inner = row_number(), .by = id) |> 
  pivot_wider(values_from = c(functionId, "function", function_Short),
              names_from = id_inner)

2 Answers 2

1

For your example data, you can get to the desired result by unnesting wider all lists twice, e.g.

library(tidyr)
library(dplyr)

res <- df |> 
  mutate(id = row_number(), .after = first_Name) |> 
  unnest_wider(where(is.list)) |> 
  unnest_wider(where(is.list), names_sep = "_")
 

Where res is:

# A tibble: 3 × 20
    oID  peId last_Name first_Name    id hId_1 hId_2 hId_3 hName_1 hName_2 hName_3
  <int> <int> <chr>     <chr>      <int> <int> <int> <int> <chr>   <chr>   <chr>  
1 37751    12 ABC       Z              1     1     4     5 PS      XY      MN     
2 30978    13 DEF       Y              2     1    NA    NA PS      NA      NA     
3 33498    14 EFG       X              3    NA    NA    NA NA      NA      NA     
# ℹ 9 more variables: personnel_Number_1 <chr>, personnel_Number_2 <chr>,
#   personnel_Number_3 <chr>, functionId_1 <int>, functionId_2 <int>,
#   function_1 <chr>, function_2 <chr>, function_Short_1 <chr>,
#   function_Short_2 <chr>

Check it matches desired result:

identical(wanted, res)
[1] TRUE

wanted data:

wanted <- df |>
  mutate(id = row_number()) |> 
  unnest_longer(col = personnel_Number, keep_empty = TRUE) |> 
  unpack(cols = personnel_Number) |> 
  mutate(id_inner = row_number(), .by = id) |> 
  pivot_wider(values_from = c(hId, hName, personnel_Number),
              names_from = id_inner) |> 
  unnest_longer(col = ls_Role, keep_empty = TRUE) |> 
  unpack(cols = ls_Role) |> 
  mutate(id_inner = row_number(), .by = id) |> 
  pivot_wider(values_from = c(functionId, "function", function_Short),
              names_from = id_inner)
Sign up to request clarification or add additional context in comments.

1 Comment

Oh my, could be so easy. The only thing in my real-life use case is that the two unnest_widers are considerably slower (factor 10-20) than the unnest_longers.
0

One possible way to solve your problem. This solution is based on functions from data.table package

library(data.table)

fun = function(dfs) {
  lst = vector("list", length(dfs))
  for(i in seq_along(dfs)) {
    d = setDT(copy(dfs[[i]]))
    lst[[i]] = if(nrow(d)) dcast(d, . ~ seq_len(nrow(d)), value.var=names(d)) else d = data.table(. = ".")
  }
  rbindlist(lst, fill=T)[, . := NULL][]
}


df2 = do.call(cbind, list(df, fun(df$personnel_Number), fun(df$ls_Role)))

    oID peId last_Name ... hId_1 hId_2 hId_3 hName_1 hName_2 hName_3 personnel_Number_1 personnel_Number_2 personnel_Number_3 functionId_1 functionId_2 function_1 function_2 function_Short_1 function_Short_2
1 37751   12       ABC ...     1     4     5      PS      XY      MN               0123               1234                 98            1            5 function A function B               FA               FB
2 30978   13       DEF ...     1    NA    NA      PS    <NA>    <NA>               0987               <NA>               <NA>            6           NA function A       <NA>               FA             <NA>
3 33498   14       EFG ...    NA    NA    NA    <NA>    <NA>    <NA>               <NA>               <NA>               <NA>            6           NA function A       <NA>               FA             <NA>

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.