How can I use dplyr::filter() on my DATA to keep all rows for IDs whose Language is something other than "English" when Type != "5F" but is "English" when Type == "5F"?

For example, ID == 1 has Spanish when Type != "5F" and English when Type == "5F", so all of its rows are kept.

Desired_output is below.

DATA <- read.table(header=TRUE,text="
ID Type Year Language
1  1A   1718 spansih
1  1A   1819 spanish
1  5F   1920 English
2  1B   1718 spanish
2  1B   1819 spanish
2  1B   1920 spanish
2  5F   2021 spanish
3  1B   1920 English
3  5F   2021 English")

Desired_output <- read.table(header=TRUE,text="
ID Type Year Language
1  1A   1718 spansih
1  1A   1819 spanish
1  5F   1920 English")

1 Answer

Grouping by ID, you can check for rows where Type == "5F" & Language == "English" and use lag() to confirm that the row above meets neither condition, then keep the whole group when any row satisfies the check, e.g.

library(dplyr)

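DATA |>
  # keep every row of an ID if any "5F"/"English" row directly follows
  # a row that is neither "5F" nor "English" (.by needs dplyr >= 1.1.0)
  filter(any(Type == "5F" & Language == "English" & !lag(Type == "5F" | Language == "English")),
         .by = ID)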

  ID Type Year Language
1  1   1A 1718  spansih
2  1   1A 1819  spanish
3  1   5F 1920  English
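
An alternative that does not depend on row order is to test the two conditions separately within each ID. This is only a sketch, assuming that "changes to English" means at least one non-5F row has a language other than "English" and at least one 5F row has "English". The name `result` is introduced here for illustration, and `.by` needs dplyr 1.1.0 or later.

```r
library(dplyr)

DATA <- read.table(header = TRUE, text = "
ID Type Year Language
1  1A   1718 spansih
1  1A   1819 spanish
1  5F   1920 English
2  1B   1718 spanish
2  1B   1819 spanish
2  1B   1920 spanish
2  5F   2021 spanish
3  1B   1920 English
3  5F   2021 English")

result <- DATA |>
  filter(
    # assumption: one non-5F row in another language is enough
    any(Type != "5F" & Language != "English") &
      # and at least one 5F row must be English
      any(Type == "5F" & Language == "English"),
    .by = ID
  )

result
```

If every non-5F row has to be in another language, the first any() could be swapped for all(Language[Type != "5F"] != "English"). The comparisons are case-sensitive, so a value such as "english" would count as a different language.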

1 Comment

Thanks! There is a problem. I updated my DATA to show you the problem. Your solution catches cases where Language has remained English throughout. Please run with my updated DATA to see that it doesn't produce the Desired_output.
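
One way to check a candidate filter against Desired_output is to compare the two data frames directly. The sketch below reuses the DATA, Desired_output, and filter() call from this thread; the name `candidate` is hypothetical, and check.attributes = FALSE is an assumption made so that row names are ignored in the comparison.

```r
library(dplyr)

DATA <- read.table(header = TRUE, text = "
ID Type Year Language
1  1A   1718 spansih
1  1A   1819 spanish
1  5F   1920 English
2  1B   1718 spanish
2  1B   1819 spanish
2  1B   1920 spanish
2  5F   2021 spanish
3  1B   1920 English
3  5F   2021 English")

Desired_output <- read.table(header = TRUE, text = "
ID Type Year Language
1  1A   1718 spansih
1  1A   1819 spanish
1  5F   1920 English")

# candidate: the filter() call from the answer, applied to the updated DATA
candidate <- DATA |>
  filter(any(Type == "5F" & Language == "English" &
               !lag(Type == "5F" | Language == "English")),
         .by = ID)

# compare column values only, ignoring row names and other attributes
matches <- isTRUE(all.equal(candidate, Desired_output, check.attributes = FALSE))
matches
```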
