0

I have large quantities of datasets in Excel that I would like to analyze in R. The files have a format that organizes all information per block of the same year, which looks like:

Group    <- c(2010, 'Group', 'A', 'B', 'C', 2011, 'Group', 'A', 'B', 'E', 2012, 'Group', 'A', 'B')
Value <- c(NA,'Value', 1, 2, 9, NA, 'Value', 3, 5, 2, NA, 'Value', 9, 1)

df <- cbind(Group, Value)
        
  Group   Value  
 1: 2010  NA     
 2: Group Value
 3: A     1    
 4: B     2    
 5: C     9    
 6: 2011  NA     
 7: Group Value
 8: A     3    
 9: B     5    
10: E     2    
11: 2012  NA     
12: Group Value
13: A     9  
14: B     1  

To be able to analyze the data, I would like to automatically add a column for the year so that all data can be combined, as follows:

   Year Group Value  
 1: 2010  A  1       
 2: 2010  B  2
 3: 2010  C  9        
 4: 2011  A  3     
 5: 2011  B  5        
 6: 2011  E  2
 7: 2012  A  9 
 8: 2012  B  1     

2 Answers 2

1
library(data.table)
dt <- data.table(df)
dt[, Year := Group[1], cumsum(is.na(Value))][Value != 'Value']

   Group Value Year
1:     A     1 2010
2:     B     2 2010
3:     C     9 2010
4:     A     3 2011
5:     B     5 2011
6:     E     2 2011
7:     A     9 2012
8:     B     1 2012

in Base R:

subset(transform(df, Year = ave(Group, cumsum(is.na(Value)), FUN=\(x)x[1])), Value != 'Value')
   Group Value Year
3      A     1 2010
4      B     2 2010
5      C     9 2010
8      A     3 2011
9      B     5 2011
10     E     2 2011
13     A     9 2012
14     B     1 2012

Note that the above columns are character. You can use type.convert(new_df, as.is = TRUE) where new_df is the resultant df to convert the columns to respective classes

Sign up to request clarification or add additional context in comments.

Comments

0

Here is one method with tidyverse - create the 'Year' column where the 'Group' values have 4 digits numbers, then filter out the 'Group' rows where value is 'Group', fill the 'Year' column with the previous non-NA values, filter out the first row with duplicated and convert the type (type.convert)

library(dplyr)
library(stringr)
library(tidyr)
 df %>%
   mutate(Year = case_when(str_detect(Group, "^\\d{4}$") ~ Group)) %>% 
   filter(Group != 'Group') %>% 
   fill(Year) %>%
   filter(duplicated(Year)) %>%
   type.convert(as.is = TRUE) %>%
   select(Year, Group, Value)

-output

   Year Group Value
1 2010     A     1
2 2010     B     2
3 2010     C     9
4 2011     A     3
5 2011     B     5
6 2011     E     2
7 2012     A     9
8 2012     B     1

data

df <- data.frame(Group, Value)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.