Strict HTML parser in JavaScript

In HTML, block elements can't be children of inline elements. Browsers, however, happily accept this HTML:

<i>foo <h4>bar</h4> fizz</i>

and render it as expected; DOMParser doesn't choke on it either.

But it's not valid, and is therefore hard to convert to another schema. Pandoc parses the above as (option 1):

<i>foo </i><h4>bar</h4> fizz

which is at least valid, but not faithful to the original, since "bar" and " fizz" lose their italics. Another approach would be (option 2):

<i>foo </i><h4><i>bar</i></h4><i> fizz</i>

Is there a way to force DOMParser to parse more strictly, so that the result is option 1 or 2? (It doesn't seem possible.)

Alternatively, what would be the best approach to deal with this, that is, given the first string, to get option 1 or 2 as a result? Is there a JS parser that does this (and otherwise strictly enforces the standard)?

Edit: it turns out the HTML parser of at least Chrome (78.0.3904.108) behaves differently when the content is in a p instead of, say, a div. When the HTML above is in a p then it gets parsed as option 2! But it's left as is when inside a div.

So I guess the question is now: how to enforce the behavior of ps onto divs?
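
For reference, option 2 could in principle also be produced as a post-processing pass over an already-parsed tree, independent of the container element. Below is a minimal sketch of that idea, not a definitive implementation. It assumes a hypothetical plain-object tree (text nodes are strings, elements are `{ tag, children }`) instead of real DOM nodes, and the `BLOCK_TAGS` and `INLINE_TAGS` lists are illustrative and incomplete. Attributes and text escaping are ignored.

```javascript
// Hypothetical minimal tree: a text node is a string, an element is { tag, children }.
// Both tag lists are illustrative assumptions, not the full sets from the HTML standard.
const BLOCK_TAGS = new Set(["div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
  "ul", "ol", "li", "blockquote", "pre", "table"]);
const INLINE_TAGS = new Set(["i", "b", "em", "strong", "span", "a", "u", "code"]);

const isElement = (n) => typeof n !== "string";
const isBlock = (n) => isElement(n) && BLOCK_TAGS.has(n.tag);

// Returns the list of nodes that should replace `node` in its parent.
function splitInline(node) {
  if (!isElement(node)) return [node];

  // Normalize the children first, so nested violations are handled bottom-up.
  const children = node.children.flatMap(splitInline);

  // Nothing to do unless this is an inline element that contains a block.
  if (!INLINE_TAGS.has(node.tag) || !children.some(isBlock)) {
    return [{ tag: node.tag, children }];
  }

  const out = [];
  let run = []; // consecutive non-block children seen so far
  const flush = () => {
    if (run.length > 0) out.push({ tag: node.tag, children: run });
    run = [];
  };

  for (const child of children) {
    if (isBlock(child)) {
      flush();
      // Push the inline wrapper down into the block, then normalize again
      // in case the block itself contains further blocks.
      const wrapped = splitInline({ tag: node.tag, children: child.children });
      out.push({ tag: child.tag, children: wrapped });
    } else {
      run.push(child);
    }
  }
  flush();
  return out;
}

// Naive serializer for the sketch: no attributes, no escaping.
function serialize(node) {
  if (!isElement(node)) return node;
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

// The example from the question, placed inside a div.
const tree = {
  tag: "div",
  children: [{ tag: "i", children: ["foo ", { tag: "h4", children: ["bar"] }, " fizz"] }],
};
console.log(splitInline(tree).map(serialize).join(""));
// prints: <div><i>foo </i><h4><i>bar</i></h4><i> fizz</i></div>
```

The same walk could be applied to real DOM nodes after parsing; the plain-object tree is used here only to keep the sketch self-contained.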

edited Dec 13, 2019 at 14:32

asked Dec 10, 2019 at 21:45

Ken


Comments:

Does this answer your question? Strict HTML parsing in JavaScript
– Ouroborus, Dec 11, 2019 at 0:14

Thanks, but no; that other question is about validating parsed HTML after the fact and detecting errors. I'm trying to find a parser that produces valid strict HTML on the first pass.
– Ken, Dec 11, 2019 at 11:12

That distinction isn't relevant here. Both questions want to get strict HTML from non-strict HTML via DOMParser. The answer is the same: no, there isn't a way to get DOMParser to do that; you'll need code or a library to do it.
– Ouroborus, Dec 11, 2019 at 19:52
