Not certain if this can be done in regexp under javascript, but thought it would be interesting to see if it is possible.
So thought I would clean up a piece of html to remove most tags, literally just dropping them, so <H1><img><a href ....>. And that would be relatively simple (well, stole the basis from another post, thanks karim79 Remove HTML Tags in Javascript with Regex).
function(inString, maxlength, callback){
console.log("Sting is " + inString)
console.log("Its " + inString.length)
var regex = /(<([^>]+)>)/ig
var outString = inString.replace(regex, "");
console.log("No HTML sting " + outString);
if ( outString.length < maxlength){
callback(outString)
} else {
console.log("Lets cut first bit")
}
}
But then I started thinking, is there a way where I can control regex execution. So lets say that I want to keep certain tabs, like b,br,i and maybe change H1-6 to b. So in pseudo code, something like:
for ( var i in inString.regex.hits ) {
if ( hits[i] == H1 ) {
hits[i] = b;
}
}
The issue is that I want the text thats not HTML tags to stay as it is, and I want it to just cut out by default. One option would of course be to change the ones I want to keep. Say change <b> to [[b]], once that is done to all the ones of interest. Then put them back to <b> once all unknown have been removed. So like this (only for b, and not certain the code below would work):
function(inString, maxlength, callback){
console.log("Sting is " + inString)
console.log("Its " + inString.length)
var regex-remHTML = /(<([^>]+)>)/ig
var regex-hideB = /(<b>)/ig
var regex-showB = /([b])/ig
var outString = inString.replace(regex-hideB, "[b]");
outString = outString.replace(regex-remHTML, "");
outString = outString.replace(regex-showB, "<b>");
console.log("No HTML sting " + outString);
if ( outString.length < maxlength){
callback(outString)
} else {
console.log("Lets cut first bit")
}
}
But would it be possible to be smarter, writing cod ethat says here is a peice of HTML tag, run this code against the match.