0

When I try to match against the regex below using a Scala library (working with re2), the code goes down the path shown in the stack trace below and times out after 1 minute:

Regex:

(([a-z0-9!#$%&'*+?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])))

Stack Trace:

at java.util.regex.Pattern$CharProperty.match(Pattern.java:3693)
at java.util.regex.Pattern$Curly.match(Pattern.java:4125)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3694)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3694)

I am not sure whether it is an infinite loop, as it might finish after a long time. I need help understanding what exactly in this expression causes this and how to improve the expression.

4 Comments
  • Please add more details about the input and what you are trying to achieve with this regex. It also looks like this is not an infinite loop. Are you trying to validate an email address? Commented Oct 13, 2016 at 17:52
  • What happens if you increase the stack size? Commented Oct 13, 2016 at 17:59
  • The dot is not escaped in (?:. and ?.)+; it must be escaped to match a literal dot. This may be causing issues in your case. Commented Oct 13, 2016 at 18:05
  • Is that intended to validate an e-mail address? E-mail addresses are a lot more complicated than just [email protected]; it’s not advisable to verify them with a regular expression. Commented Oct 13, 2016 at 19:45

2 Answers

2

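Your regular expression has nested quantifiers (e.g. (a+)*). An automaton-based engine such as re2 handles these in time linear in the input length, but backtracking engines, including java.util.regex (which is what your stack trace shows doing the matching), can take a very long time on them.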

1

An unescaped dot outside a character class matches any character except a line break. That means the two unescaped dots in your pattern, at (?:. and ?.)+, can match the same characters as the adjoining subpatterns.

If you load your pattern at regex101.com and test it against ggggg@gggggggggggggggggggg, you will see (with PCRE setting) that the engine needs thousands of steps to finish matching.

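This happens because the unescaped dots sit inside quantified groups, so the engine can split the same run of characters between the dot and the neighbouring character classes in many different ways, and it tries them one by one while backtracking.

It is also the reason why ggggg@cccc is matched by your pattern.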

Since you most probably mean to match literal dots, escape them:

(([a-z0-9!#$%&'*+?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])))

See the regex demo
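To make the difference concrete, here is a minimal Java sketch (Java because the stack trace comes from java.util.regex) that runs the pattern from the question and the escaped pattern against the two inputs mentioned above. The class name, the helper methods, and the sample address user@example.com are made up for illustration; note that a literal dot is written "\\." inside a Java string literal.

```java
import java.util.regex.Pattern;

final class EmailDotDemo {
    // Pattern from the question: both dots are unescaped, so each matches any character.
    static final Pattern UNESCAPED = Pattern.compile(
        "(([a-z0-9!#$%&'*+?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*"
        + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])))");

    // Escaped pattern: "\\." in Java source is the regex \. (a literal dot).
    static final Pattern ESCAPED = Pattern.compile(
        "(([a-z0-9!#$%&'*+?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*"
        + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])))");

    // Both helpers require the whole input to match.
    static boolean matchesUnescaped(String s) {
        return UNESCAPED.matcher(s).matches();
    }

    static boolean matchesEscaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The first two inputs come from the answer; the third is a hypothetical sample.
        String[] samples = {"ggggg@cccc", "ggggg@gggggggggggggggggggg", "user@example.com"};
        for (String s : samples) {
            System.out.println(s
                + "  unescaped=" + matchesUnescaped(s)
                + "  escaped=" + matchesEscaped(s));
        }
    }
}
```

With the escaped pattern, an input without a literal dot after the @ fails quickly, because the only way to leave the domain group is through a real dot.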

Note that you might want to remove the 2 capturing groups around the whole pattern, since you do not seem to need them.

1 Comment

FYI, (?:[a-z0-9-]*[a-z0-9]) at the end can be written as [a-z0-9-]*[a-z0-9]
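Putting both suggestions together (dropping the two outer capturing groups and unwrapping the trailing non-capturing group), a possible Java sketch looks like the following. The class name, the firstMatch helper, and the sample text are assumptions for illustration only. Since the pattern then has no capturing groups, the whole match is read with group() rather than group(1).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailGroupDemo {
    // Escaped pattern without the outer capturing groups and with the
    // trailing (?:...) wrapper removed; the remaining groups are non-capturing.
    static final Pattern SIMPLIFIED = Pattern.compile(
        "[a-z0-9!#$%&'*+?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*"
        + "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9][a-z0-9-]*[a-z0-9]");

    // Returns the first substring of `text` that the pattern matches, if any.
    static Optional<String> firstMatch(String text) {
        Matcher m = SIMPLIFIED.matcher(text);
        // group() with no argument is group 0, i.e. the entire match.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample text.
        System.out.println(firstMatch("contact: user@example.com today"));
        System.out.println(firstMatch("no address here"));
    }
}
```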
