I would like to write a simple interpreter in JavaScript/Node. I have hit an obstacle when it comes to generating tokens.

var code = 'if (a > 2 && b<4) c = 10;';

code.match(/\W+/g)
// [" (", " > ", " && ", "<", ") ", " = ", ";"]

code.match(/\w+/g)
// ["if", "a", "2", "b", "4", "c", "10"]

As shown, \W+ lets me get special characters and \w+ lets me get words. I wonder how to get both in one array, something like below:

// ["if", "(", "a", ">", "2", "&&", "b", "<", "4", ")", "c", "=", "10", ";"]
  • \W+ is extremely naive. Consider (!a+-1): /\w+|\W+/ will generate (!, a, +-, 1, ) while the correct tokenization is (, !, a, +, -1, ). Commented Mar 25, 2016 at 11:37
  • You cannot parse JS with regexp. It does not have the necessary parsing power. Commented Mar 25, 2016 at 11:58
  • @torazaburo could you suggest something better? Commented Mar 25, 2016 at 12:08
  • Yes. Write an actual parser using parsing algorithms. Unless you are doing this as an exercise, you would be best advised to start with one of several existing JS parsers, such as esprima. Commented Mar 25, 2016 at 12:14

1 Answer

As shown, \W+ lets me get special characters and \w+ lets me get words. I wonder how to get those in one array, something like below:

Simply try this

code.match(/\w+|\W+/g)

This gives the output:

["if", " (", "a", " > ", "2", " && ", "b", "<", "4", ") ", "c", " = ", "10", ";"]

And this will trim the whitespace from each token as well:

var tokens = code.match(/\w+|\W+/g).map(function(value){return value.trim()});
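If the goal is exactly the array from the question, one possible variation is to list the operators explicitly instead of using \W+, and to let the regex skip whitespace. This is only a rough sketch: the function name tokenize and the operator list are my own assumptions, and it handles just the two-character operators shown (&&, ||, ==, !=, <=, >=), so something like === would still come out as == followed by =.

```javascript
// Hypothetical helper, not part of the answer above.
function tokenize(code) {
  // Alternatives are tried left to right at each position:
  //   \w+          identifiers, keywords, and numbers
  //   &&  ||       two-character logical operators
  //   [<>=!]=      two-character comparison operators (<=, >=, ==, !=)
  //   [^\s\w]      any other single non-space, non-word character
  var pattern = /\w+|&&|\|\||[<>=!]=|[^\s\w]/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return code.match(pattern) || [];
}

console.log(tokenize('if (a > 2 && b<4) c = 10;'));
// ["if", "(", "a", ">", "2", "&&", "b", "<", "4", ")", "c", "=", "10", ";"]
```

Because whitespace never matches any alternative, no trim step is needed with this pattern.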
4 Comments

Yet, if the code is something like if (!a), I am getting (! instead of standalone ( and !.
@DamianCzapiewski can you share the string you tried with?
var code = 'if (!a)'; On the other hand, && comes out as a single token, which is what I'd expect, so that case seems to be no problem; the extra work is needed for other operators.
@DamianCzapiewski checked this input. I don't think that from here on (beyond the stage I have given code for) you can rely on separating tokens based on regex. You now need to iterate over the tokens and separate them based on the JavaScript language grammar.
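
One way to read the last comment is as a second pass over the rough tokens: keep the word tokens as they are, and split each run of symbols into operators, preferring the longest known operator at each position. The sketch below is only an illustration under that assumption. The names TWO_CHAR_OPERATORS, splitSymbols, and tokenizeInTwoPasses are made up for this example, and the operator table is far shorter than what real JavaScript needs.

```javascript
// Hypothetical operator table; a real JavaScript lexer needs a much longer list.
var TWO_CHAR_OPERATORS = ['&&', '||', '==', '!=', '<=', '>='];

// Split a run of symbols such as " (!" into separate operator tokens.
function splitSymbols(run) {
  var out = [];
  var i = 0;
  while (i < run.length) {
    if (/\s/.test(run[i])) {
      i += 1; // whitespace only separates tokens, it is never a token itself
      continue;
    }
    var two = run.substr(i, 2);
    if (TWO_CHAR_OPERATORS.indexOf(two) !== -1) {
      out.push(two); // longest match first: take the two-character operator
      i += 2;
    } else {
      out.push(run[i]); // otherwise every symbol is its own token
      i += 1;
    }
  }
  return out;
}

function tokenizeInTwoPasses(code) {
  // First pass: the rough split from the answer (words vs. runs of non-words).
  var rough = code.match(/\w+|\W+/g) || [];
  var tokens = [];
  // Second pass: refine only the non-word runs.
  rough.forEach(function (piece) {
    if (/^\w+$/.test(piece)) {
      tokens.push(piece);
    } else {
      tokens = tokens.concat(splitSymbols(piece));
    }
  });
  return tokens;
}

console.log(tokenizeInTwoPasses('if (!a)'));
// ["if", "(", "!", "a", ")"]
```

Note that this still treats a leading minus as a separate token, so (!a+-1) yields +, -, 1 rather than + and -1; deciding whether a minus is unary depends on the surrounding grammar, which is work for the parser rather than the tokenizer.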
