3

I am trying to write a regex to be used in Java.

I can have the following kind of strings.

1. customer calls <function_name> using <verb> on <uri> with <object>
2. customer calls <function_name> using 'POST' on <uri> with <object>
3. customer calls 'create' using 'POST' on <uri> with <object>
4. customer calls 'create' using 'POST' on <uri>

As you can see, the last portion after with is optional in my case.

I implemented the following regular expression.

.+call[s]?.+(\'\w+\'|<\w+>).+using.+(\'\w+\'|<\w+>).+on.+(\'\w+\'|<\w+>).*(with.+(\'\w+\'|<\w+>))?

But when I give it string 3, I get the groups 'create', 'POST', <object>, null, null instead of 'create', 'POST', <uri>, <object>. When I give it string 4, I get 'create', 'POST', <uri>, null, null instead of 'create', 'POST', <uri>.

The regex without (with.+(\'\w+\'|<\w+>))? works properly for string 4. How can I change this last part so that the section starting at with is optional?
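For reference, the behaviour described above can be reproduced with a small Java program. This is only a sketch: the class name OriginalRegexRepro and the helper groups are made up for illustration, the \' escapes are written as plain ' in the string literal, and it assumes the pattern is applied with Matcher.matches().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class OriginalRegexRepro {
    // The pattern from the question as a Java string literal (backslashes doubled).
    static final Pattern ORIGINAL = Pattern.compile(
        ".+call[s]?.+('\\w+'|<\\w+>).+using.+('\\w+'|<\\w+>).+on.+('\\w+'|<\\w+>).*(with.+('\\w+'|<\\w+>))?");

    // Returns capture groups 1..groupCount for a full-line match, or null if the line does not match.
    static List<String> groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            out.add(m.group(i)); // null for groups that did not participate in the match
        }
        return out;
    }

    public static void main(String[] args) {
        // The greedy ".+" after "on" runs to the end of the line, so group 3 lands on <object>.
        System.out.println(groups("customer calls 'create' using 'POST' on <uri> with <object>"));
        // prints ['create', 'POST', <object>, null, null]
        System.out.println(groups("customer calls 'create' using 'POST' on <uri>"));
        // prints ['create', 'POST', <uri>, null, null]
    }
}
```

The two trailing nulls come from the optional outer group and the group nested inside it: the pattern has five capturing groups, not four.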

2 Answers

1

Your regex accepts too much and backtracks a lot because it overuses the greedy .+. Remember that every time you write .+ or .*, the regex engine first consumes everything up to the end of the line and then has to backtrack until the rest of the pattern matches. This is both expensive and error-prone: the quantifier usually swallows more text than intended, so be very careful when using this construct. It doesn't behave the way most people expect.
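The effect is easy to see on just the tail of the input. A small sketch, where GreedyVsLazy and firstGroup are names invented for this example and the sub-patterns are cut down from the regex in the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class GreedyVsLazy {
    // Returns group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tail = "on <uri> with <object>";

        // Greedy: ".+" consumes to the end of the line and backtracks only as far as needed,
        // so the group ends up on the LAST token.
        System.out.println(firstGroup("on.+(<\\w+>)", tail));    // prints <object>

        // Lazy: ".+?" grows one character at a time, so the group lands on the first token.
        System.out.println(firstGroup("on.+?(<\\w+>)", tail));   // prints <uri>

        // Explicit whitespace: nothing to backtrack over, and the intent is stated directly.
        System.out.println(firstGroup("on\\s+(<\\w+>)", tail));  // prints <uri>
    }
}
```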

The simple solution in your case is to actually state precisely what you're expecting, and from your example text it looks like you need whitespace, so just use \s+ instead. Your regex becomes:

.+?\bcalls?\s+(\'\w+\'|<\w+>)\s+using\s+(\'\w+\'|<\w+>)\s+on\s+(\'\w+\'|<\w+>)(?:\s+with\s+(\'\w+\'|<\w+>))?

Note that I also changed the first .+ to a lazy .+? followed by a word boundary anchor \b (you could probably just remove the .+? from the pattern unless you also need the match to cover the full line). I also made the outer group around the optional with part non-capturing, since you most probably don't need to capture it.
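In Java, this pattern could be used roughly as follows. It is a sketch under a few assumptions: CallParser and parse are hypothetical names, backslashes are doubled for the string literal, \' is written as plain ', and matches() is used so the whole line has to match (which is why the leading .+? is kept).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class CallParser {
    // The pattern from this answer as a Java string literal.
    static final Pattern CALL = Pattern.compile(
        ".+?\\bcalls?\\s+('\\w+'|<\\w+>)\\s+using\\s+('\\w+'|<\\w+>)"
        + "\\s+on\\s+('\\w+'|<\\w+>)(?:\\s+with\\s+('\\w+'|<\\w+>))?");

    // Returns {function, verb, uri, object}; object is null when the "with" part is absent.
    // Returns null when the line does not match at all.
    static String[] parse(String line) {
        Matcher m = CALL.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "customer calls <function_name> using <verb> on <uri> with <object>",
            "customer calls <function_name> using 'POST' on <uri> with <object>",
            "customer calls 'create' using 'POST' on <uri> with <object>",
            "customer calls 'create' using 'POST' on <uri>"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parse(line)));
        }
        // prints:
        // [<function_name>, <verb>, <uri>, <object>]
        // [<function_name>, 'POST', <uri>, <object>]
        // ['create', 'POST', <uri>, <object>]
        // ['create', 'POST', <uri>, null]
    }
}
```

Because the outer group is non-capturing, there are exactly four groups, and only the last one can be null.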

1

Use a space character class such as [ ]+ (or [ ]* where the space may be absent) in place of .+ wherever only spaces are expected.

Try this:

.+call(?:s)?.+(\'\w+\'|<\w+>)[ ]*using.+(\'\w+\'|<\w+>)[ ]*on[ ]*(\'\w+\'|<\w+>)[ ]*(?:with)?[ ]*(\'\w+\'|<\w+>)?

You will get

 1. <function_name> <verb> <uri> <object>
 2. <function_name> 'POST' <uri> <object>
 3. 'create' 'POST' <uri> <object>
 4. 'create' 'POST' <uri> null

In the 4th row the last group is null because the final token (i.e. <object>) is missing from the input.
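A quick way to check this in Java is sketched below. SpaceClassRegex and parse are names invented for the example, and matches() is assumed. Note one consequence of this pattern: with and the last token are optional independently of each other, so a line that has <object> but no with is accepted as well.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class SpaceClassRegex {
    // The pattern from this answer as a Java string literal (\' written as plain ').
    static final Pattern CALL = Pattern.compile(
        ".+call(?:s)?.+('\\w+'|<\\w+>)[ ]*using.+('\\w+'|<\\w+>)[ ]*on[ ]*('\\w+'|<\\w+>)"
        + "[ ]*(?:with)?[ ]*('\\w+'|<\\w+>)?");

    // Returns the four capture groups for a full-line match, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = CALL.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            parse("customer calls 'create' using 'POST' on <uri> with <object>")));
        // prints ['create', 'POST', <uri>, <object>]
        System.out.println(Arrays.toString(
            parse("customer calls 'create' using 'POST' on <uri>")));
        // prints ['create', 'POST', <uri>, null]

        // "with" is optional on its own, so this line also matches:
        System.out.println(Arrays.toString(
            parse("customer calls 'create' using 'POST' on <uri> <object>")));
        // prints ['create', 'POST', <uri>, <object>]
    }
}
```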
