In HTML, block elements can't be children of inline elements. Browsers, however, happily accept this HTML:

<i>foo <h4>bar</h4> fizz</i>

and render it as one would intuitively expect; DOMParser doesn't choke on it either.

But it's not valid, which makes it hard to convert to another schema. Pandoc parses the above as (option 1):

<i>foo </i><h4>bar</h4> fizz

which is at least valid, but not faithful to the original. Another approach would be (option 2):

<i>foo </i><h4><i>bar</i></h4><i> fizz</i>

Is there a way to force DOMParser to parse more strictly, so that the result is option 1 or 2? (It doesn't seem possible.)

Alternatively, what would be the best way to deal with this, that is, given the first string, to get option 1 or 2 as a result? Is there a JS parser that does this (and otherwise strictly enforces the standard)?
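To make option 2 concrete, here is a minimal sketch of the transformation I have in mind, written against a simplified tree rather than the real DOM. The node shape (`{ text }` for text, `{ tag, children }` for elements), the `BLOCK_TAGS` list, and the `normalize` and `serialize` helpers are all hypothetical names made up for illustration; attributes and text escaping are ignored.

```javascript
// Assumed, incomplete list of block-level tags; extend as needed.
const BLOCK_TAGS = new Set([
  "div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
  "ul", "ol", "li", "blockquote", "pre", "table", "section",
]);

const isText = (node) => typeof node.text === "string";
const isBlock = (node) => !isText(node) && BLOCK_TAGS.has(node.tag);

// Returns the list of nodes that should replace `node` in its parent.
function normalize(node) {
  if (isText(node)) return [node];

  // Fix descendants first, so nested violations are already resolved.
  const children = node.children.flatMap(normalize);
  if (isBlock(node) || !children.some(isBlock)) {
    return [{ tag: node.tag, children }];
  }

  // `node` is inline but contains at least one block element: split it.
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length > 0) out.push({ tag: node.tag, children: run });
    run = [];
  };
  for (const child of children) {
    if (isBlock(child)) {
      flush();
      // Move the inline wrapper inside the block instead of around it.
      const inner = { tag: node.tag, children: child.children };
      out.push({ tag: child.tag, children: normalize(inner) });
    } else {
      run.push(child);
    }
  }
  flush();
  return out;
}

// Naive serializer: no attributes, no escaping of text content.
function serialize(node) {
  if (isText(node)) return node.text;
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

// The tree for <i>foo <h4>bar</h4> fizz</i>
const input = {
  tag: "i",
  children: [
    { text: "foo " },
    { tag: "h4", children: [{ text: "bar" }] },
    { text: " fizz" },
  ],
};

const output = normalize(input).map(serialize).join("");
console.log(output); // <i>foo </i><h4><i>bar</i></h4><i> fizz</i>
```

In a browser, the same idea would presumably have to be applied by walking the document that DOMParser returns and moving nodes around, which is the part I'd rather not write by hand.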

Edit: it turns out that the HTML parser of at least Chrome (78.0.3904.108) behaves differently when the content is in a p instead of, say, a div. When the HTML above is inside a p, it gets parsed as option 2! But it's left as is when inside a div.

So I guess the question is now: how to enforce the behavior of p elements on div elements?

  • Does this answer your question? "Strict HTML parsing in JavaScript" Commented Dec 11, 2019 at 0:14
  • Thanks but no; that other question is about trying to validate parsed HTML after the fact, and detect errors; I'm trying to find a parser that produces valid strict HTML on the first pass. Commented Dec 11, 2019 at 11:12
  • That distinction isn't relevant here. Both questions want to get strict HTML from non-strict HTML via DOMParser. The answer is the same: no, there isn't a way to get DOMParser to do that; you'll need your own code or a library to do it. Commented Dec 11, 2019 at 19:52
