I have the following Java pattern.

^[ -~&&[^"'<>\\]]*$

Basically, this matches every character from space to ~ in the ASCII table, excluding double quotes, single quotes, angle brackets, and the backslash.

I would like to convert it into a JavaScript pattern. I would appreciate any help.

  • Regexes are supposed to be universal, only their usage is different. How are you calling this pattern in each language? Commented Nov 4, 2014 at 21:54
  • @Builder_K I'm pretty sure that JavaScript doesn't support that conjunction notation for character sets. Commented Nov 4, 2014 at 21:56
  • So does this mean I need to break the pattern up into two tests for JavaScript? Commented Nov 4, 2014 at 21:57
  • @user3017924 no I think it can be done, it just looks different. Commented Nov 4, 2014 at 22:00

1 Answer

The only way I can think of to do that is with negative lookahead:

var pattern = /^(?:(?!["'<>\\])[ -~])*$/;

The negative lookahead (?!["'<>\\]) makes the match fail at any position where the next character is one of the characters you don't want.

If you want to keep the same pattern for both languages, then this one should work in Java too.
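
As a quick check of the JavaScript side, here is a minimal sketch that runs the pattern against a few sample strings. The helper name isAllowed and the sample inputs are made up for illustration; only the pattern itself comes from the answer.

```javascript
// The pattern from the answer: printable ASCII (space through ~),
// minus ", ', <, > and backslash.
const pattern = /^(?:(?!["'<>\\])[ -~])*$/;

// Hypothetical helper name, introduced only for this example.
function isAllowed(text) {
  return pattern.test(text);
}

console.log(isAllowed("Hello, world!")); // true
console.log(isAllowed(""));              // true: * also allows zero characters
console.log(isAllowed('say "hi"'));      // false: contains a double quote
console.log(isAllowed("a < b"));         // false: contains an angle bracket
console.log(isAllowed("C:\\temp"));      // false: contains a backslash
console.log(isAllowed("caf\u00e9"));     // false: é is outside the space-to-~ range
```

Note that the empty string passes, which matches the * quantifier in the original Java pattern.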

edit — breaking it down:

  • The leading ^ and trailing $ mean that the overall pattern has to match the entire test string. (That's the same as the Java version.)
  • The outer (?: ) grouping is called a "non-capturing" group. An ordinary group made with plain parentheses would work too, but I am trying to get in the habit of using non-capturing groups when I don't need the capture. Probably not an issue either way. The point of the group is to hold the following two parts together so that the * operator can apply to both (more below).
  • The (?! ) part is the negative lookahead. It tells the matcher to check whether the pattern inside the lookahead matches at the current position, but without advancing through the test string. It's like a "peek around the corner" tool. Because it's a negative lookahead, if the inner pattern does match, the lookahead fails. This prevents the overall pattern from matching the punctuation characters excluded in the Java version.
  • After the lookahead comes [ -~], the range from the Java version that covers the printable ASCII characters from space through ~, minus the conjunctive subclause (which doesn't work in JavaScript).
  • The negative lookahead and the [ -~] range are grouped together and repeated with *, meaning that the matcher applies both checks to each character in turn, over and over, until it reaches the end of the test string.
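
To make the breakdown concrete, the sketch below writes the same two checks as an explicit loop and compares it with the regex for every single-character string from code 0 to 255. The function name isAllowedByHand is hypothetical, and the loop is only an illustration of what the matcher does per character, not how a regex engine is actually implemented.

```javascript
const pattern = /^(?:(?!["'<>\\])[ -~])*$/;

// Hypothetical helper: the same logic as the pattern, spelled out by hand.
function isAllowedByHand(text) {
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // Step 1, the (?!["'<>\\]) lookahead: peek at the character and
    // give up if it is one of the five excluded ones.
    if (`"'<>\\`.includes(ch)) return false;
    // Step 2, the [ -~] range: the character must lie between space and ~.
    if (ch < " " || ch > "~") return false;
  }
  // Reaching the end of the string corresponds to the trailing $.
  return true;
}

// Compare both versions on every single-character string up to code 255.
let mismatches = 0;
for (let code = 0; code <= 0xff; code++) {
  const s = String.fromCharCode(code);
  if (pattern.test(s) !== isAllowedByHand(s)) mismatches++;
}
console.log(mismatches); // 0
```

The order of the two steps mirrors the pattern: the lookahead only inspects the character, and the range is what actually consumes it.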

2 Comments

Awesome! Thanks a bunch, works like a charm. For a newbie like me on JS regex, could you explain what exactly it is doing? I guess I need to read up on negative lookahead.
Yep, this workaround is the best one. It is also mentioned in this article: rexegg.com/regex-class-operations.html#subtraction_workaround
