Refactor: unpack and re-order HTMLParser.mainLoop foreign-content conditionals #522

jayaddison · 2020-12-31T00:45:32Z

This changeset should be functionally a no-op. It may be easiest to review with whitespace changes ignored, since the indentation level of some of the loop's logic has been reduced.
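The idea behind the first commit can be shown in miniature. This is a minimal sketch, not html5lib's code: the dict tokens and the `process` helper are hypothetical stand-ins for the tokenizer output and phase handlers. Replacing an `if/else` inside the inner loop with an early `break` leaves behaviour unchanged and moves the token handling one indentation level shallower.

```python
def process(token):
    """Hypothetical phase handler: may return a follow-up token to reprocess."""
    follow_up = token.get("reprocess_as")
    return {"type": follow_up} if follow_up else None


def main_loop_nested(tokens):
    log = []
    for token in tokens:
        new_token = token
        while new_token is not None:
            if new_token["type"] == "ParseError":
                log.append("error")
                new_token = None          # ends the inner loop
            else:
                # Token handling lives inside the else branch.
                log.append(new_token["type"])
                new_token = process(new_token)
    return log


def main_loop_flat(tokens):
    log = []
    for token in tokens:
        new_token = token
        while new_token is not None:
            if new_token["type"] == "ParseError":
                log.append("error")
                break                     # ends the inner loop early
            # Same handling, one indentation level shallower.
            log.append(new_token["type"])
            new_token = process(new_token)
    return log
```

Both versions produce the same log for a stream that mixes a reprocessed start tag, a parse error and an end tag.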

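The changes appear to result in a minor performance improvement (a mean of 197 ms versus 204 ms on html_parse_etree below), although the difference is smaller than either run's standard deviation. That said, there are likely larger benefits to be found through further simplification and refactoring (perhaps including fairly substantial logical restructuring).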

Before (2c19b98)

.........................................
html_parse_etree: Mean +- std dev: 204 ms +- 10 ms

After (369a412)

.........................................
html_parse_etree: Mean +- std dev: 197 ms +- 9 ms
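Putting the two runs side by side, using only the figures from the output above. Comparing the difference against a single standard deviation is a rough heuristic, not a statistical significance test.

```python
# Mean and standard deviation in milliseconds, copied from the output above.
before_mean, before_sd = 204.0, 10.0   # 2c19b98
after_mean, after_sd = 197.0, 9.0      # 369a412

delta_ms = before_mean - after_mean            # absolute saving per run
relative_pct = 100 * delta_ms / before_mean    # saving relative to the baseline
within_noise = delta_ms < min(before_sd, after_sd)

print(f"saving: {delta_ms:.0f} ms ({relative_pct:.1f}%), "
      f"smaller than one std dev: {within_noise}")
```

This prints a saving of 7 ms (3.4%), which is smaller than one standard deviation of either run.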


jayaddison · 2020-12-31T00:47:39Z

html5lib/html5parser.py

-                         type in (StartTagToken, CharactersToken, SpaceCharactersToken))):
+                    break
+
+                prev_token = new_token


NB: This is technically a behaviour change, since prev_token could previously have referenced a ParseErrorToken. That said, I don't believe the code that inspects prev_token is ever relevant for error tokens.
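A simplified sketch of why the change should be benign. The dict tokens, `process` and `unacknowledged_self_closing` are hypothetical stand-ins, and the sketch assumes (as here) that the post-loop self-closing check tests the token type before it reads prev_token. Under that assumption prev_token is never indexed when the token was a parse error, so it does not matter whether it holds the error token or None.

```python
PARSE_ERROR, START_TAG = "ParseError", "StartTag"


def process(token):
    """Hypothetical phase handler that never asks for reprocessing."""
    return None


def last_token_before(token):
    prev_token, new_token = None, token
    while new_token is not None:
        prev_token = new_token            # also set for parse-error tokens
        if new_token["type"] == PARSE_ERROR:
            new_token = None
        else:
            new_token = process(new_token)
    return prev_token


def last_token_after(token):
    prev_token, new_token = None, token
    while new_token is not None:
        if new_token["type"] == PARSE_ERROR:
            break                         # prev_token is left unchanged
        prev_token = new_token
        new_token = process(new_token)
    return prev_token


def unacknowledged_self_closing(token, prev_token):
    # Simplified check: the type test comes first, so prev_token is
    # only read when the token is a start tag.
    return (token["type"] == START_TAG and prev_token["selfClosing"]
            and not prev_token["selfClosingAcknowledged"])
```

For a parse-error token the two helpers return different prev_token values (the token versus None), yet the self-closing check gives the same answer for both.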

…content phase. This is based on the idea that the code is likely easier to understand (and hopefully less fragile) if there is a single boolean with a readable name, rather than repeated assignments to a variable that is later invoked as a method call.
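The pattern described in that commit message, sketched with a hypothetical `Phase` class and greatly simplified routing conditions (not the full HTML tree-construction rules, and not html5lib's actual code). In the "before" style the variable that is later called is assigned in several places. In the "after" style a named boolean carries the decision and the phase is picked once.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"


class Phase:
    """Hypothetical stand-in for an insertion-mode phase object."""

    def __init__(self, name):
        self.name = name

    def process_characters(self, data):
        return f"{self.name}: {data}"


current_phase = Phase("current insertion mode")
foreign_phase = Phase("in foreign content")


def handle_before(namespace, at_integration_point, data):
    # The variable that is invoked later is reassigned along several paths.
    phase = foreign_phase
    if namespace is None or namespace == HTML_NS:
        phase = current_phase
    elif at_integration_point:
        phase = current_phase
    return phase.process_characters(data)


def handle_after(namespace, at_integration_point, data):
    # One readable boolean carries the decision; the phase is chosen once.
    use_foreign_content = not (
        namespace is None or namespace == HTML_NS or at_integration_point
    )
    phase = foreign_phase if use_foreign_content else current_phase
    return phase.process_characters(data)
```

Both functions agree for every combination of namespace and integration-point flag; only the second makes the routing decision readable at a glance.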

jayaddison · 2022-12-24T01:03:27Z

Cleaning up some old / stale pull requests; please let me know if this changeset is considered worthwhile and I'll reopen if so.

jayaddison added 5 commits December 30, 2020 23:58

Use 'break' statement to early-terminate parser loop; also reduces indentation level of token handling logic

befaf7f

Unpack gnarly conditional check within parser loop

d87ceb9

Re-order foreign content conditional checks

e12523d

Simplify current node presence test

f7d256f

Remove usage of rarely-referenced current node variables

369a412
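The commits above unpack and re-order the compound condition that decides whether a token goes to the current insertion mode or to the foreign-content phase. A minimal sketch of that kind of unpacking, assuming a hypothetical `Node`/`Token` model with precomputed integration-point flags rather than html5lib's own tree and token types. The conditions loosely follow the HTML Standard's tree construction dispatcher, simplified: the spec uses the *adjusted* current node and also routes end-of-file tokens to the insertion mode, both omitted here. The sketch also tests for an empty stack of open elements via `node is None`.

```python
from dataclasses import dataclass
from typing import Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
CHARACTER_TYPES = ("Characters", "SpaceCharacters")


@dataclass
class Node:
    # Hypothetical node model; integration-point status is precomputed.
    namespace: str
    name: str
    mathml_text_integration_point: bool = False
    html_integration_point: bool = False


@dataclass
class Token:
    type: str       # e.g. "StartTag", "EndTag", "Characters", "Comment"
    name: str = ""


def use_insertion_mode_packed(node: Optional[Node], token: Token) -> bool:
    """One compound expression, in the style of the condition being unpacked."""
    return (node is None or
            node.namespace == HTML_NS or
            (node.mathml_text_integration_point and
             ((token.type == "StartTag" and
               token.name not in ("mglyph", "malignmark")) or
              token.type in CHARACTER_TYPES)) or
            (node.namespace == MATHML_NS and node.name == "annotation-xml" and
             token.type == "StartTag" and token.name == "svg") or
            (node.html_integration_point and
             (token.type == "StartTag" or token.type in CHARACTER_TYPES)))


def use_insertion_mode_unpacked(node: Optional[Node], token: Token) -> bool:
    """The same decision written as a sequence of early returns."""
    if node is None:                      # stack of open elements is empty
        return True
    if node.namespace == HTML_NS:
        return True
    is_start_tag = token.type == "StartTag"
    is_characters = token.type in CHARACTER_TYPES
    if node.mathml_text_integration_point and (
            is_characters or
            (is_start_tag and token.name not in ("mglyph", "malignmark"))):
        return True
    if (node.namespace == MATHML_NS and node.name == "annotation-xml"
            and is_start_tag and token.name == "svg"):
        return True
    if node.html_integration_point and (is_start_tag or is_characters):
        return True
    return False                          # route to the foreign-content phase
```

Checking both versions over a small grid of nodes and tokens is a cheap way to gain confidence that such an unpacking is a no-op.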

jayaddison commented Dec 31, 2020


jayaddison added 2 commits January 9, 2021 12:42

Merge branch 'master' into refactor/parser-token-loop

47289c9

jayaddison closed this Dec 24, 2022
