
I have a data frame with a string variable that contains two sets of numbers. I need to multiply one number by the next number and assign the result to another field. This would be straightforward with a regex, but some of the observations contain multiple pairs that need to be calculated, and I am struggling to conceptualize how to iterate over these strings until there are no more pairs to multiply. Once all pairs have been multiplied, the products must be summed and assigned to the new variable.

Here is my raw data

V1 <- c("ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1",
         "ABC04-5XYZ=2", "ABC01-3XYZ=1, ABC04-5XYZ=1")
df <- data.frame(V1)

                                    V1
1 ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1
2                             ABC04-5XYZ=2
3               ABC01-3XYZ=1, ABC04-5XYZ=1

I would like to multiply the integer immediately following the "-" by the integer immediately following the "=", and then sum the products, so that the final result looks like this:

                                        V1 V2
1 ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1 28
2                             ABC04-5XYZ=2 10
3               ABC01-3XYZ=1, ABC04-5XYZ=1  8
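To make the target explicit: each V2 value is the sum, over the comma-separated entries in that row, of (number after "-") × (number after "="). This is a hand check of the expected output above, with the numbers copied from the three rows:

```r
# Row 1: ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1
row1 <- 3 * 2 + 5 * 3 + 7 * 1   # 6 + 15 + 7
# Row 2: ABC04-5XYZ=2
row2 <- 5 * 2
# Row 3: ABC01-3XYZ=1, ABC04-5XYZ=1
row3 <- 3 * 1 + 5 * 1
c(row1, row2, row3)
```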

Any suggestions about how to iterate past each comma would be greatly appreciated. Thanks!

  • split them by ",". Commented Jun 2, 2017 at 19:12
  • Can you expand on that? I have actually tried splitting them into multiple columns, but that doesn't help apply the multiplication and addition that I need, because there is not a defined number of columns to apply a function over. Commented Jun 2, 2017 at 19:15
  • I didn't mean to columns. split them and store as a list and also store how many records in each row and blah blah blah. Anyhow, Lamia's answer is much smarter than what I proposed. Commented Jun 2, 2017 at 19:22
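The split-then-iterate idea from the comments can be sketched in base R. This is a minimal sketch that assumes every entry contains exactly one "-number" and one "=number", and that entries are separated by ", " (comma plus space). The helper name entry_product is hypothetical.

```r
V1 <- c("ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1",
        "ABC04-5XYZ=2", "ABC01-3XYZ=1, ABC04-5XYZ=1")
df <- data.frame(V1, stringsAsFactors = FALSE)

# Split each row on ", " into its entries (a list with one element per row)
entries <- strsplit(df$V1, ", ", fixed = TRUE)

# Hypothetical helper: for entries like "ABC04-5XYZ=3", return 5 * 3.
# sub() is vectorised, so this works on all entries of a row at once.
entry_product <- function(e) {
  after_dash   <- as.numeric(sub(".*-(\\d+).*", "\\1", e))
  after_equals <- as.numeric(sub(".*=(\\d+).*", "\\1", e))
  after_dash * after_equals
}

# Per row: multiply each entry's pair, then sum the products
df$V2 <- sapply(entries, function(x) sum(entry_product(x)))
df$V2
```

Because the list keeps a variable number of entries per row, there is no need to know in advance how many pairs a row contains.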

1 Answer


You could use str_match_all from the stringr package to extract the digits after each "-" and after each "=", and then multiply and sum the matched parts with sapply:

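library(stringr)
# l[[i]] holds two match matrices for row i: digits after "-" and digits after "="
l = lapply(df$V1, function(x) str_match_all(x, c("-(\\d+)", "=(\\d+)")))
# column 2 of each matrix is the capture group; multiply pairwise and sum per row
df$V2 = sapply(l, function(x) sum(as.numeric(x[[1]][, 2]) * as.numeric(x[[2]][, 2])))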

df$V2 then contains:

[1] 28 10  8
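A possible variation, offered as a sketch: capturing both numbers with a single pattern keeps each "-" number paired with its own "=" number. The two separate patterns above instead rely on both producing the same number of matches in the same order. This version assumes the format shown in the question (digits after "-", then text without a comma or "=", then "=" and digits). It also drops the outer lapply, because str_match_all() is vectorised over the input strings.

```r
library(stringr)

V1 <- c("ABC01-3XYZ=2, ABC04-5XYZ=3, ABC06-7XYZ=1",
        "ABC04-5XYZ=2", "ABC01-3XYZ=1, ABC04-5XYZ=1")
df <- data.frame(V1, stringsAsFactors = FALSE)

# Two capture groups: digits after "-" and digits after the following "=".
# [^=,]* skips the letters in between without crossing into the next entry.
m <- str_match_all(df$V1, "-(\\d+)[^=,]*=(\\d+)")

# Each m[[i]] is a matrix: column 1 = full match, columns 2 and 3 = the groups
df$V2 <- sapply(m, function(x) sum(as.numeric(x[, 2]) * as.numeric(x[, 3])))
df$V2
```

A row with no matching pair yields a zero-row matrix, so its sum is 0 rather than an error.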