regexp looping and logic in javascript

Question

Not certain if this can be done with regexp in JavaScript, but I thought it would be interesting to see if it is possible. I wanted to clean up a piece of HTML by removing most tags, literally just dropping them, e.g. <H1>, <img>, <a href ....>. That would be relatively simple (well, I stole the basis from another post, thanks karim79: Remove HTML Tags in Javascript with Regex).

function stripTags(inString, maxlength, callback) {
    console.log("String is " + inString);
    console.log("Its length is " + inString.length);

    // Match anything that looks like a tag: "<", one or more non-">" characters, ">"
    var regex = /(<([^>]+)>)/ig;
    var outString = inString.replace(regex, "");
    console.log("No HTML string " + outString);
    if (outString.length < maxlength) {
        callback(outString);
    } else {
        console.log("Lets cut first bit");
    }
}

But then I started thinking: is there a way to control the regex execution? Let's say I want to keep certain tags, like b, br and i, and maybe change H1-6 to b. In pseudo code, something like:

for ( var i in inString.regex.hits ) {
   if ( hits[i] == H1 ) {
         hits[i] = b;
   }
}

The issue is that I want the text that's not HTML tags to stay as it is, and I want unknown tags to be cut out by default. One option would of course be to disguise the ones I want to keep: change <b> to [[b]], do the same for every tag of interest, remove all remaining (unknown) tags, and then put the placeholders back to <b>. So like this (only for b, and not certain the code below would work):

function stripTagsKeepB(inString, maxlength, callback) {
    console.log("String is " + inString);
    console.log("Its length is " + inString.length);

    // Hyphens are not valid in JavaScript identifiers, so use camelCase names
    var regexRemHTML = /(<([^>]+)>)/ig;
    var regexHideB = /(<b>)/ig;
    // The brackets must be escaped, otherwise [b] is a character class matching any "b"
    var regexShowB = /(\[b\])/ig;
    var outString = inString.replace(regexHideB, "[b]");
    outString = outString.replace(regexRemHTML, "");
    outString = outString.replace(regexShowB, "<b>");
    console.log("No HTML string " + outString);
    if (outString.length < maxlength) {
        callback(outString);
    } else {
        console.log("Lets cut first bit");
    }
}

But would it be possible to be smarter, writing code that says: here is a piece of HTML tag, run this code against the match?

For any manipulation of HTML other than very simple cases, you might want to consider using a parser, rather than regex.
– Tim Biegeleisen, Commented Jul 29, 2016 at 9:38
I was thinking about that at first, but are there any "configurable" ones? In this case, the security aspect is only half of it. The reason is that the HTML that goes in is from an article, and the code is expected to take the first "n" characters and make them pretty as an introduction to the article.
– vrghost, Commented Jul 29, 2016 at 10:30

Hitmands · Accepted Answer · 2016-07-29 09:48:54Z

2

As Tim Biegeleisen said in his comment, a better solution may be to use a parser instead of a regex.

That said, if you want to control what is going to be changed by the regex, you can pass a callback to String.prototype.replace:

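var input = "<div><h1>CIAO Bello</h1></div>";

// The callback runs once per match; its return value replaces that match.
var output = input.replace(/(<([^>]+)>)/gi, (val) => {
    // Drop any tag whose text contains "div"
    if (val.indexOf("div") > -1) {
        return "";
    }

    // Leave every other tag untouched
    return val;
});

console.log("output", output);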
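Applying the same callback idea to the whitelist described in the question (keep b, br and i, turn h1-h6 into b, drop everything else) might look roughly like the sketch below. The function name filterTags, the KEEP list and the regex are assumptions made for illustration, not part of the original answer. A regex like this does not handle comments, scripts, or ">" inside attribute values, so it should not be treated as a security sanitizer.

```javascript
// Hypothetical whitelist taken from the question: tags to keep as they are.
var KEEP = ["b", "br", "i"];

// Hypothetical helper name; not from the original answer.
function filterTags(html) {
  // Group 1: the optional "/" of a closing tag. Group 2: the tag name.
  var tagPattern = /<(\/?)([a-z][a-z0-9]*)\b[^>]*>/gi;

  return html.replace(tagPattern, function (match, slash, name) {
    var tag = name.toLowerCase();

    if (/^h[1-6]$/.test(tag)) {
      return "<" + slash + "b>";        // headings become bold
    }
    if (KEEP.indexOf(tag) !== -1) {
      return "<" + slash + tag + ">";   // keep the tag, drop its attributes
    }
    return "";                          // every other tag is removed
  });
}

console.log(filterTags('<div><h1 class="t">CIAO <i>Bello</i></h1><img src="x.png"><br></div>'));
// → <b>CIAO <i>Bello</i></b><br>
```

Text outside the tags is never touched, because the callback only sees the matched tags. Unknown tags are dropped by default, which is the behaviour the question asks for.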

3 Comments

vrghost Over a year ago

Looks good. Maybe a stupid question, but what language is that? The if statement with val.indexOf does not look like JavaScript to me, but that might be because I am just not hardcore enough.

Hitmands Over a year ago

Yes, that's javascript: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…

vrghost Over a year ago

Makes sense now. I'm an old ksh scripter myself, so I thought it might be some strange regexp code. But it is clear to me now that you just write code differently from me (and probably better; after all, you answered my question). I would have written War and Peace in else if :)
