
I have a string containing multiple substrings of the form {{...}}, where the part between the braces can be one of a few things. I'm trying to split the string so that the resulting array contains the text before and after these substrings, as well as the full substrings themselves.

I've created a regular expression that works here: https://regex101.com/r/I65QQD/1/

I want the array returned by str.split(...) to contain the full matches, as seen in the link above. Right now it returns the capture groups instead, so my array looks really weird:

let body = "Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. {{youtube:hyYnAioXOqQ}}\n Some similar stools in different finishes are below for your review. I hope this is helpful to you!\n\n{{attachment:2572795}}\n\n{{attachment:2572796}}\n\n{{attachment:2572797}}\n\n{{attachment:2572798}}\n";

let bodyComponents = body.split(/{{attachment:([\d]+)}}|{{(YOUTUBE|VIMEO):([\d\w]+)}}/i);

console.log(bodyComponents);

Is there any way to have the resulting array contain the full matches instead of the subgroups? So that it looks like this:

[
"Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. ",
"{{youtube:hyYnAioXOqQ}}",
...
]

Thanks

  • You can't use split to retrieve things, and this task requires more than one step. Commented Sep 5, 2017 at 23:59
  • please elaborate Commented Sep 6, 2017 at 0:03

1 Answer


You need to remove the unnecessary capturing parentheses and turn the alternation group into a non-capturing one:

/({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/

Note that [\d\w] = \w and [\d] = \d.

Note that the whole pattern is now wrapped in a single capturing group, so split() keeps each full match in the result. {{attachment:\d+}} no longer has a capturing group around \d+, (?:YOUTUBE|VIMEO) is now a non-capturing group (so its value won't appear as a separate item in the resulting array), and ([\d\w]+) becomes \w+ (\d is redundant because \w matches digits, too).

let body = "Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. {{youtube:hyYnAioXOqQ}}\n Some similar stools in different finishes are below for your review. I hope this is helpful to you!\n\n{{attachment:2572795}}\n\n{{attachment:2572796}}\n\n{{attachment:2572797}}\n\n{{attachment:2572798}}\n";
let bodyComponents = body.split(/({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/i);
console.log(bodyComponents);
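
To see why the original pattern produced the odd array, here is a minimal sketch on a shortened input. The sample string is made up for illustration and is not the asker's data. split() inserts the value of every capturing group after each piece, and a group that did not take part in a match shows up as undefined.

```javascript
// Shortened, made-up sample string (an assumption, not the asker's data).
const sample = "a {{youtube:abc}} b {{attachment:12}} c";

// Original pattern: three capturing groups, so each match adds three slots.
const original = sample.split(/{{attachment:([\d]+)}}|{{(YOUTUBE|VIMEO):([\d\w]+)}}/i);
console.log(original);
// ["a ", undefined, "youtube", "abc", " b ", "12", undefined, undefined, " c"]

// Fixed pattern: one capturing group around the whole tag.
const fixed = sample.split(/({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/i);
console.log(fixed);
// ["a ", "{{youtube:abc}}", " b ", "{{attachment:12}}", " c"]

// No capturing groups at all: the tags are dropped from the result.
const dropped = sample.split(/{{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}}/i);
console.log(dropped);
// ["a ", " b ", " c"]
```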


Comments

Welp... that explanation of why the results still appear in the resulting array didn't help much in my understanding. Hopefully I'll wrap my head around it sometime!
@MarksCode Do you mean you do not understand why captured group values are stored in the resulting array after split? That is simply how split() is defined: it includes every captured value in the resulting array. So, when you need to avoid that, replace the capturing groups ((...)) with non-capturing ones ((?:...)).
Oh, I see. I thought split() always just threw out whatever matched. I guess it works a bit different than I thought.
Not always, and not in all languages. In Java, it works differently: capturing groups do not end up in the resulting array.
This regex is very unintuitive. Why would the YOUTUBE|VIMEO have ?: to start off, but the attachment doesn't? Is there a better way to structure this to make it more obvious what this is trying to do?
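
One possible way to make the intent more obvious, offered only as a sketch: factor the shared braces out and build the pattern from named pieces, so the single outer capturing group clearly wraps "any tag". The names attachmentTag, videoTag and tagPattern are hypothetical, and the braces are escaped here, which matches the same text as the unescaped form in the answer.

```javascript
// Hypothetical names, introduced for illustration only.
const attachmentTag = String.raw`attachment:\d+`;
const videoTag = String.raw`(?:youtube|vimeo):\w+`;

// One capturing group around the whole {{...}} tag; everything inside is non-capturing.
const tagPattern = new RegExp(`(\\{\\{(?:${attachmentTag}|${videoTag})\\}\\})`, "i");

// Made-up sample input.
const parts = "a {{vimeo:123}} b".split(tagPattern);
console.log(parts);
// ["a ", "{{vimeo:123}}", " b"]
```

The inner (?:...) groups are only there for grouping the alternatives; the outer (...) is the one that tells split() to keep the tag.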