
Hi, I'm working on a JavaScript application and need help figuring out this regex.

I have a series of strings. They are dynamic but follow a set pattern:

name eq 'abc'
id in 'def'
key | operator | value

Then I have a modifier, 'has':

has name eq 'abc'
!has id
has address eq '123 sesame street'
modifier | key | operator | value

I can extract the modifier and key with no problem using this regex:

new RegExp(/(^(\s*!?has\s+)?([^\s]+)|(^\s*[^\s]+))/i)

But the issue comes in when I have a key that is the same as a modifier:

has eq '123'

The regex above returns 'has eq', where I only need 'has'.

has has eq '123'

The above correctly returns 'has has'.
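To make the failure concrete, here is a small sketch that runs the regex from the question against the example strings and prints the whole match. The helper name leadingMatch is made up for illustration.

```javascript
// The question's regex, unchanged (wrapping a literal in new RegExp is redundant but harmless)
var original = new RegExp(/(^(\s*!?has\s+)?([^\s]+)|(^\s*[^\s]+))/i);

// Return the whole matched prefix, which is what the question reports
function leadingMatch(str) {
  var m = str.match(original);
  return m ? m[0] : null;
}

console.log(leadingMatch("has name eq 'abc'")); // has name
console.log(leadingMatch("has eq '123'"));      // has eq  (the key "has" is read as the modifier)
console.log(leadingMatch("has has eq '123'"));  // has has
```

In the second case the optional has group consumes the key, so the next word (eq) is taken as the key.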

There are a large number of operators to handle, but they form a fixed set.

Any help would be appreciated.

  • You sure a regex is a great pattern for this? Why not a simple parser? Commented Jan 24, 2017 at 22:00
  • @DaveNewton my application is much more complex than seen here. I have not found something that would fit my needs. Please make a suggestion! There is a lot I have not seen out there. Commented Jan 24, 2017 at 22:03
  • Could you just pick out the bit before the eq with \w+(?=\s+eq\b)? Of course, this will suffer similarly if you have eq as an operator or modifier name. Commented Jan 24, 2017 at 22:35
  • @MatthewStrawbridge Unfortunately, I have multiple operators, and sometimes no operator at all with the !has modifier, so the followed-by (lookahead) syntax won't really work this way. Commented Jan 24, 2017 at 22:38
  • Parsers handle (essentially) arbitrarily complex expressions. There are tons of parser generators, or you can roll your own if it's simple enough. It's often a better approach than trying to handle things with regex; that's all I'm saying. Commented Jan 25, 2017 at 2:13
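Along the lines of the hand-rolled parser suggested in these comments, here is a minimal sketch. It assumes the grammar is [modifier] key [operator value] and that values are single-quoted. The MODIFIERS and OPERATORS lists are hypothetical placeholders for the application's real fixed sets, and tokenize, isOperator and parse are illustrative names.

```javascript
// Hypothetical fixed sets; the question only shows has/!has and eq/in
var MODIFIERS = ['has', '!has'];
var OPERATORS = ['eq', 'in'];

// Quoted values stay whole; everything else splits on whitespace
function tokenize(str) {
  return str.match(/'[^']*'|\S+/g) || [];
}

function isOperator(tok) {
  return tok !== undefined && OPERATORS.indexOf(tok.toLowerCase()) !== -1;
}

// Grammar assumed: [modifier] key [operator value]
function parse(str) {
  var t = tokenize(str);
  var i = 0;
  var result = {};

  // Read the first token as a modifier only if the rest still parses:
  // either the key is the last token, or an operator follows the key.
  if (t.length >= 2 && MODIFIERS.indexOf(t[0].toLowerCase()) !== -1 &&
      (t.length === 2 || isOperator(t[2]))) {
    result.mod = t[i++];
  }

  if (i >= t.length) throw new Error('missing key in: ' + str);
  result.key = t[i++];

  if (i < t.length) {
    if (!isOperator(t[i])) throw new Error('unknown operator: ' + t[i]);
    result.op = t[i++];
    if (i >= t.length) throw new Error('missing value in: ' + str);
    result.val = t[i++];
  }

  if (i !== t.length) throw new Error('unexpected token: ' + t[i]);
  return result;
}
```

With this, parse("has eq '123'") yields key 'has' with no modifier, while parse("has has eq '123'") yields modifier 'has' and key 'has'. Unlike a single regex, an unknown operator produces an explicit error.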

2 Answers


You need to be specific and fully specify all the valid syntax:

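var keyval = '';

// Inside a string literal, regex backslashes must be doubled ("\\s", "\\w")
keyval += "^\\s*(\\w+)\\s+eq\\s+'(.*)'$";         // for key eq 'val'
keyval += "|^\\s*has\\s+(\\w+)\\s+eq\\s+'(.*)'$"; // for has key eq 'val'

var pattern = new RegExp(keyval, 'i');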

I'm not sure whether you need the has key and !has key forms without values. If you do, you can add:

keyval += "|^\\s*!?has\\s+(\\w+)$";  // for has key and !has key
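One practical cost of a single combined pattern is that capture groups are numbered across all alternatives, so the caller has to work out which alternative matched. Here is a sketch using the three alternatives above, with backslashes doubled because "\s" inside a JavaScript string literal is just "s". The extractKeyVal helper is a hypothetical name.

```javascript
// The three alternatives, built as a string (backslashes doubled)
var keyval = '';
keyval += "^\\s*(\\w+)\\s+eq\\s+'(.*)'$";         // groups 1-2: key eq 'val'
keyval += "|^\\s*has\\s+(\\w+)\\s+eq\\s+'(.*)'$"; // groups 3-4: has key eq 'val'
keyval += "|^\\s*!?has\\s+(\\w+)$";               // group 5: has key / !has key
var pattern = new RegExp(keyval, 'i');

// Only the groups of the alternative that matched are set; the rest are undefined
function extractKeyVal(str) {
  var m = pattern.exec(str);
  if (m === null) return null;
  return {
    key: m[1] !== undefined ? m[1] : (m[3] !== undefined ? m[3] : m[5]),
    val: m[2] !== undefined ? m[2] : m[4] // undefined for the has / !has key forms
  };
}

var r = extractKeyVal("has eq '123'"); // r.key is 'has', r.val is '123'
```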

Note that the main problem with your regexp is that it fails to recognise eq as an important keyword.
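Because the question says the operators form a fixed set, one way to make the regex treat them as keywords is to generate an alternation from that set and anchor the pattern at both ends. This is a sketch, not the answer's code. The OPERATORS list, escapeRegExp and matchLine are hypothetical, and it assumes values are single-quoted.

```javascript
// Hypothetical operator list; substitute the application's real fixed set
var OPERATORS = ['eq', 'in', 'ne'];

// Escape regex metacharacters so operators such as "<=" would also be safe
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var OP = OPERATORS.map(escapeRegExp).join('|');

// ^ [modifier] key [operator 'value'] $
var pattern = new RegExp(
  "^\\s*(?:(!?has)\\s+)?(\\S+)(?:\\s+(" + OP + ")\\s+'([^']*)')?\\s*$", 'i');

function matchLine(str) {
  var m = pattern.exec(str);
  if (m === null) return null;
  return { mod: m[1], key: m[2], op: m[3], val: m[4] };
}
```

The pattern must reach $, and the operator has to come from the list. When has is tried as the modifier in has eq '123', the remaining '123' cannot be matched, so the regex engine backtracks and reads has as the key instead.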


Additional notes:

Personally, I would not use one regexp for this. Doing so makes the regexp long and complicated and can also make extracting matches difficult. You can use the trick above to break up a long regexp, but in my opinion it is better to use several smaller regexps. I'd write something like the following:

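// Regex literals, so single backslashes are correct here
var key_equal_pattern     = /^\s*(\w+)\s+eq\s+'(.*)'$/i;
var has_key_equal_pattern = /^\s*has\s+(\w+)\s+eq\s+'(.*)'$/i;
var has_pattern           = /^\s*!?has\s+(\w+)$/i;

var m; // match array of whichever pattern succeeded; input is the string being parsed

if ((m = input.match(key_equal_pattern)) !== null) {
    // handle match: m[1] is the key, m[2] the value
}
else if ((m = input.match(has_key_equal_pattern)) !== null) {
    // handle match: m[1] is the key, m[2] the value
}
else if ((m = input.match(has_pattern)) !== null) {
    // handle match: m[1] is the key
}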

This is much more maintainable than a giant regexp. Note that while the common saying is that you can't parse things like HTML with regexp, what people really mean is that you can't do it with a single regexp. Many HTML parsers use regexps in the tokenisation process and then use if statements and loops to process the structure of the data.
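The if/else chain above can also be written as an ordered list of rules, so supporting a new form means adding one entry rather than another branch. This is a minimal sketch. The rules array, its build functions and the shape of the returned object are illustrative assumptions, not part of the original answer.

```javascript
// Each rule pairs one small regex with a function that builds a result from its match
var rules = [
  { pattern: /^\s*(\w+)\s+eq\s+'(.*)'$/i,
    build: function (m) { return { key: m[1], op: 'eq', val: m[2] }; } },
  { pattern: /^\s*(has)\s+(\w+)\s+eq\s+'(.*)'$/i,
    build: function (m) { return { mod: m[1], key: m[2], op: 'eq', val: m[3] }; } },
  { pattern: /^\s*(!?has)\s+(\w+)$/i,
    build: function (m) { return { mod: m[1], key: m[2] }; } }
];

// Try the rules in order; the first match wins, like the if/else chain
function parseLine(input) {
  for (var i = 0; i < rules.length; i++) {
    var m = input.match(rules[i].pattern);
    if (m !== null) return rules[i].build(m);
  }
  return null; // no rule matched
}
```

Order still matters here. has eq '123' is claimed by the first rule, which is what makes has the key rather than a modifier.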


1 Comment

Yes, this really makes me think! It is a good approach. I will apply this idea tonight and update with my findings.

Your input data seems to follow just a few possible patterns:

mod key
    key op val
mod key op val

If this is representative of all your data, and you trust your input data to be well-formed, a simple shortcut is to extract all tokens, and to distinguish the key op val pattern from the others by the number of tokens extracted.

The following demo illustrates the approach, correctly identifying your problem test cases:

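function extract(str){
  // Tokens: a single-quoted value (may contain spaces) or any run of non-whitespace
  var result = str.match(/'[^']*'|\S+/g) || []; // match() returns null when nothing matches
  if(result.length === 3){ // key op val
    return {
      key: result[0],
      op:  result[1],
      val: result[2]
    };
  } else { // mod key OR mod key op val
    return {
      mod: result[0],
      key: result[1],
      op:  result[2],
      val: result[3]
    };
  }
}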

console.log(extract("!has id"));
console.log(extract("has eq '123'"));
console.log(extract("has has eq '123'"));

2 Comments

Would you be able to modify this, assuming that the value is a free-text string that could contain spaces?
If value is in quotes, as your examples are, the demo code already supports free text (any chars except a straight quote). Otherwise, you would need a different approach than this.
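To illustrate that reply, here is a sketch using the same tokenizing regex as the demo. A quoted value with spaces stays one token, while an unquoted one is split apart. The tokens and unquote helpers are hypothetical. unquote strips the surrounding quotes, which extract leaves on the value.

```javascript
// Same pattern as extract(): a single-quoted run, or any run of non-whitespace
function tokens(str) {
  return str.match(/'[^']*'|\S+/g) || [];
}

// Strip one pair of surrounding single quotes, if present
function unquote(tok) {
  return /^'.*'$/.test(tok) ? tok.slice(1, -1) : tok;
}

var quoted = tokens("has address eq '123 sesame street'");
console.log(quoted.length);       // 4
console.log(unquote(quoted[3]));  // 123 sesame street

// Without quotes the value is split into separate tokens
var unquoted = tokens("has address eq 123 sesame street");
console.log(unquoted.length);     // 6
```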
