I'm trying to convert a string of HTML to an array of HTML. For example, I might have a string of arbitrary HTML that looks like this:

"<div>This</div><h1>Is</h1> <p>A</p> <a href="#">Test</a>"

(There may or may not be spaces between the tag elements)

I'm trying to convert it into an array that looks like this:

["<div>This</div>", "<h1>Is</h1>", "<p>A</p>", "<a href=\"#\">Test</a>"]

This is just for displaying the tags as text - I'm not going to use them as HTML elements.

I have looked at this example here, and it almost works except it strips the tags from the inner text: https://stackoverflow.com/a/54340630/1282216

I'm looking for a solution that does not involve DOM parsing - if possible.

Any suggestions welcome!

1 Answer

Always use a DOMParser to parse HTML strings.

const string = `<div>This</div><h1>Is</h1> <p>A</p> <a href="#">Test</a>`;

const doc = new DOMParser().parseFromString(string, "text/html");
const HTMLArray = [...doc.body.children].map(el => el.outerHTML);

console.log(HTMLArray)

You should never use RegExp to parse XML/HTML strings. But if you really want to, and you know your string by heart and it looks exactly as you provided it...

const str = `<div>This</div><h1>Is</h1> <p>A</p> <a href="#">Test</a>`;
const m = str.match(/<[^>]+>[^<]*<\/[^>]+>/g); // Use at your own risk

console.log(m); 

Note that the above will not work for nested HTML (it returns only the innermost elements), nor when an attribute value contains a < or > character (which is completely valid and not unusual).
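If the DOM really is off the table and the string may contain nested tags, one possible middle ground is to track tag depth instead of matching whole elements with a single RegExp. The following is only a rough sketch, not a real HTML parser: `splitTopLevelElements`, `tagPattern` and the short `voidTags` list are hypothetical names made up for this example, and it assumes the markup is well formed, contains no comments, `<script>` or `<style>` blocks, and that any `<` or `>` inside an attribute sits inside a quoted value.

```javascript
// Hypothetical helper: split a markup-like string into its top-level
// elements without touching the DOM. Assumes well-formed, properly nested tags.
function splitTopLevelElements(input) {
  // One tag: optional "/", a tag name, then attributes.
  // Quoted attribute values are consumed whole, so they may contain < or >.
  const tagPattern = /<(\/?)([a-zA-Z][^\s\/>]*)(?:"[^"]*"|'[^']*'|[^'">])*>/g;
  // Assumption: a partial list of void elements, extend as needed.
  const voidTags = new Set(["br", "hr", "img", "input", "meta", "link"]);

  const result = [];
  let depth = 0;  // how many elements are currently open
  let start = -1; // index where the current top-level element begins
  let match;

  while ((match = tagPattern.exec(input)) !== null) {
    const [tag, slash, name] = match;
    const end = match.index + tag.length;

    if (slash === "/") {
      if (depth === 0) continue; // stray closing tag: ignore it
      depth -= 1;
      if (depth === 0) result.push(input.slice(start, end));
    } else if (tag.endsWith("/>") || voidTags.has(name.toLowerCase())) {
      // Self-closing or void tag: a complete element on its own.
      if (depth === 0) result.push(tag);
    } else {
      if (depth === 0) start = match.index;
      depth += 1;
    }
  }
  return result;
}

const sample = `<div>This</div><h1>Is</h1> <p>A</p> <a href="#">Test</a>`;
console.log(splitTopLevelElements(sample));
```

Text that sits between top-level elements (such as the spaces in the sample) is dropped, which matches the array asked for in the question. For anything that is real HTML, DOMParser remains the safer choice.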

2 Comments

Thank you very much! I'm actually not parsing real HTML. It's just a string that looks like HTML and is being displayed for presentation purposes (on a WebGL texture).
@d13 if you don't use nested HTML or XML (the string is exactly as you provided it), you could try the above RegExp, but at your own risk.
