In the dplyr package, recode() has been superseded in favor of case_match(). Is there a way to use labels stored in, for example, a named character vector to recode values with case_match()?

For example, with recode() I can store labels in a named character vector (or read them from a CSV file) and use them for recoding:

library(dplyr)

lbls <- c(
    'male' = 'Man',
    'female' = 'Woman'
)

starwars %>%
    select( sex ) %>%
    mutate(
        sex = recode( sex, !!!lbls )
    )

# A tibble: 87 × 1
#   sex  
#   <chr>
# 1 Man  
# 2 none 
# 3 none 
# 4 Man  
# 5 Woman
# ...

However, since case_match() requires two-sided formulas (old_values ~ new_value), splicing a named vector like this does not work. Is there a way to use stored values with case_match() as well?

  • "Superseded" does not necessarily mean that it won't work in the near or distant future. Commented Mar 28, 2024 at 8:06
  • Ah, ok, so I can expect recode() to work in the foreseeable future? Sounds good. Commented Mar 28, 2024 at 8:16

2 Answers

You can transform the named vector into a list of formulas in advance.

rules <- Map(reformulate, shQuote(lbls), shQuote(names(lbls)))

# $`'Man'`
# "male" ~ "Man"
# 
# $`'Woman'`
# "female" ~ "Woman"

starwars %>%
  select( sex ) %>%
  mutate(
    sex = case_match(sex, !!!rules, .default = sex)
  )

# # A tibble: 87 × 1
#    sex  
#    <chr>
#  1 Man  
#  2 none 
#  3 none 
#  4 Man  
#  5 Woman
#  6 Man  
#  7 Woman
#  8 none 
#  9 Man  
# 10 Man  
# ℹ 77 more rows
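
To see what the Map(reformulate, ...) step produces, here is a minimal base R sketch that inspects a single rule. It assumes the same lbls vector as in the question; the variable name f is only for illustration. Note that shQuote() wraps values in single quotes, so this sketch assumes the labels themselves contain no single quotes.

```r
# Same named vector as in the question
lbls <- c(male = "Man", female = "Woman")

# reformulate(termlabels, response) parses both strings, so the quoted
# labels become string constants on either side of the `~`
rules <- Map(reformulate, shQuote(lbls), shQuote(names(lbls)))

f <- rules[[1]]
class(f)   # "formula"
f[[2]]     # left-hand side (old value): "male"
f[[3]]     # right-hand side (new value): "Man"
```

Each element is a regular two-sided formula, which is the form case_match() expects for its old ~ new arguments.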

You can create a set of rules to be evaluated.

tidyverse approach

As you're using dplyr, let's go all in:

(rules <- glue::glue('"{lbl}" ~ "{val}"', lbl = names(lbls), val = lbls))
# "male" ~ "Man"
# "female" ~ "Woman"

You can then turn this character vector into a list of call objects with rlang::parse_exprs(). Then inject the list into the function call as arguments using the splice operator, !!!:

starwars |>
    select(sex) |>
    mutate(
        sex = case_match(
            sex,
            !!!rlang::parse_exprs(rules),
            .default = sex
        )
    )
# # A tibble: 87 × 1
#    sex  
#    <chr>
#  1 Man  
#  2 none 
#  3 none 
#  4 Man  
#  5 Woman
#  6 Man  
#  7 Woman
#  8 none 
#  9 Man  
# 10 Man  
# # ℹ 77 more rows
# # ℹ Use `print(n = ...)` to see more rows
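
As a quick check of what gets spliced in above, the following sketch looks at the result of rlang::parse_exprs() on its own. It builds the rule strings with sprintf() rather than glue only to keep the dependencies small; the name exprs is just for illustration.

```r
library(rlang)

lbls  <- c(male = "Man", female = "Woman")
rules <- sprintf('"%s" ~ "%s"', names(lbls), lbls)

# One unevaluated call per rule string
exprs <- parse_exprs(rules)

length(exprs)               # 2
is_call(exprs[[1]], "~")    # TRUE: an unevaluated call to `~`
exprs[[1]][[2]]             # "male"
exprs[[1]][[3]]             # "Man"
```

The elements are still unevaluated calls at this point; they are only interpreted as old ~ new rules once !!! has spliced them into the case_match() call.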

base R approach

We can also do the parsing and splicing in base R. For me it's a little clearer what's going on. We can define rules with sprintf() instead of glue, as suggested by Darren Tsai.

rules <- c(
    "sex",
    sprintf('"%s" ~ "%s"', names(lbls), lbls)
)

To get the character vector into a list of language objects, we can use str2lang() instead of parse_exprs(). Then, in place of !!!, do.call() applies case_match() to a list of arguments. Note that the first element of rules is "sex", which str2lang() turns into the symbol for the column to recode.

starwars |>
    select(sex) |>
    mutate(
        sex = do.call(
            case_match,
            c(
                lapply(rules, str2lang),
                list(.default = sex)
            )
        )
    )
# # A tibble: 87 × 1
#    sex
#    <chr>
#  1 Man
#  2 none
#  3 none
#  4 Man
#  5 Woman
#  <etc>
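
The two base R building blocks used above can be tried in isolation. This is a small sketch that uses paste() as a stand-in for case_match(); the names rule, args, via_do_call and direct are illustrative only.

```r
# str2lang() parses a single string into an unevaluated language object
rule <- str2lang('"male" ~ "Man"')
class(rule)                          # "call"
identical(rule[[1]], as.name("~"))   # TRUE
rule[[2]]                            # "male"
rule[[3]]                            # "Man"

# A bare name is parsed into a symbol, as happens with "sex" above
is.symbol(str2lang("sex"))           # TRUE

# do.call(f, list(a, b, name = c)) builds and evaluates f(a, b, name = c)
args        <- list("a", "b", sep = "-")
via_do_call <- do.call(paste, args)
direct      <- paste("a", "b", sep = "-")
identical(via_do_call, direct)       # TRUE
```

Named list elements (such as .default in the answer) are passed as named arguments, while unnamed elements are passed by position.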

A note on .default

Note that, unlike recode(), which leaves unmatched values unchanged, case_match() needs the .default argument if unmatched values should be kept. The documentation describes it as:

The value used when values in .x aren't matched by any of the LHS inputs. If NULL, the default, a missing value will be used.

If this is not provided, any value not covered by a rule (e.g. "none") becomes NA.
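
A minimal sketch of the difference, assuming dplyr 1.1.0 or later (where case_match() is available). The vector x is made up for illustration and is not taken from starwars.

```r
library(dplyr)

x <- c("male", "none", "female")

# Without .default, the unmatched value "none" becomes NA
without_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman")
without_default   # "Man" NA "Woman"

# With .default = x, unmatched values keep their original value
with_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman", .default = x)
with_default      # "Man" "none" "Woman"
```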

3 Comments

Good idea +1. You can also use a base way sprintf('"%s" ~ "%s"', names(lbls), lbls) in place of glue.
@DarrenTsai Thanks, I've updated the answer with a base R approach to injecting this (obviously with the exception of dplyr, which was part of the question). I think reformulate() is a nice approach too.
From the similar answers, I accepted this as a more thorough answer.
