Description
The HTML specs support three kinds of bogus comments:
- <?...>. HTMLParser calls handle_pi() for it.
- <!...>. HTMLParser used to call unknown_decl() for it, but now (after gh-77057: Fix handling of invalid markup declarations in HTMLParser #9295) it calls handle_comment().
- </...> if no ASCII letter follows /. HTMLParser calls handle_comment() for it.
And, of course, handle_comment() is called for normal comments <!--...--> and <!--...--!>. This includes the abnormal cases <!--> and <!--->, which are treated as the empty comment <!---->.
It is now impossible to differentiate <![if !(IE)]> from </[if !(IE)]> and <!--[if !(IE)]-->. This may be important, even if they are the same comment from the point of view of the HTML specs.
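To see which handler fires for each spelling, a small recording subclass can help. This is only a sketch: EventRecorder and events_for are names made up for illustration, and the output for the first two inputs depends on the Python version (before #9295, <![if !(IE)]> is reported through unknown_decl() rather than handle_comment()).

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record which handler HTMLParser invokes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_pi(self, data):
        self.events.append(("pi", data))

    def unknown_decl(self, data):
        self.events.append(("unknown_decl", data))

    def handle_data(self, data):
        self.events.append(("data", data))


def events_for(markup):
    """Parse one snippet and return the recorded (handler, value) pairs."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# Results for the first two inputs vary between Python versions.
for markup in ("<![if !(IE)]>", "</[if !(IE)]>", "<!--[if !(IE)]-->"):
    print(f"{markup!r:24} -> {events_for(markup)}")
```

On a version that includes #9295, all three lines should report the same ("comment", ...) event, which is exactly the loss of information described above.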
It was proposed in #70197 to add a new handler handle_bogus_comment() which calls bogus_comment() by default to differentiate bogus comments from normal comments. Additional information should be passed to it besides the comment value to differentiate the different kinds of bogus comments, for example the character preceding the comment value (?, ! or /). But since handle_pi() is already called for <?...> and unknown_decl() used to be called for <!...>, we can just restore the use of unknown_decl() and add a new handler for </...>.
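The "character preceding the comment value" idea can be sketched without touching HTMLParser at all. The function below is hypothetical (classify_bogus_comment is not an existing or proposed API) and deliberately simplified: it only excludes normal comments, DOCTYPE declarations, real end tags and </>, and it follows the framing above in treating ? as a prefix rather than as part of the comment value.

```python
def classify_bogus_comment(markup):
    """Return (prefix, value) for a bogus comment, or None otherwise.

    Hypothetical helper for illustration only. `prefix` is the character
    after "<" ("?", "!" or "/"); `value` runs up to the first ">".
    """
    if len(markup) < 3 or not markup.startswith("<"):
        return None
    prefix = markup[1]
    end = markup.find(">", 2)
    if end < 0:
        return None  # unterminated; a real parser would wait for more input
    if prefix == "!":
        # Normal comments and DOCTYPE declarations are not bogus comments.
        if markup.startswith("<!--") or markup[2:9].lower() == "doctype":
            return None
    elif prefix == "/":
        first = markup[2]
        if first.isascii() and first.isalpha():
            return None  # a real end tag such as </div>
        if first == ">":
            return None  # "</>" is ignored, not reported as a comment
    elif prefix != "?":
        return None
    return prefix, markup[2:end]


print(classify_bogus_comment("<![if !(IE)]>"))
print(classify_bogus_comment("</[if !(IE)]>"))
print(classify_bogus_comment("<?php echo 1 ?>"))
```

Either design carries the same information: a single handler that receives the prefix as an extra argument, or one handler per prefix (handle_pi(), unknown_decl(), and a new one for </...>).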
The second way will partially revert #9295. The difference is that a bogus comment (unknown declaration) starting with <![ will be terminated by the first > instead of ]> or ]]>, in accordance with the HTML specs.
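The effect of the two termination rules is easiest to see on an input that contains a > before the closing bracket. The sketch below is an assumption-laden illustration, not the parser's code: both function names are invented, and the regular expression is only a stand-in for the "]>" rule.

```python
import re

# Illustrative stand-in for the "]>" terminator; not the library's own regex.
_BRACKET_CLOSE = re.compile(r"\]\s*>")


def split_at_first_gt(markup):
    """Spec-style rule: the bogus comment ends at the first '>'."""
    end = markup.find(">", 2)
    if end < 0:
        return None
    return markup[2:end], markup[end + 1:]


def split_at_bracket_gt(markup):
    """Bracket-style rule: a '<![' declaration ends at ']>'."""
    match = _BRACKET_CLOSE.search(markup, 3)
    if match is None:
        return None
    return markup[3:match.start()], markup[match.end():]


sample = "<![if a > b]>tail"
print(split_at_first_gt(sample))    # (value, remaining input)
print(split_at_bracket_gt(sample))  # (value, remaining input)
```

With the first rule, everything after the first > falls back into the normal text stream, so " b]>tail" would be reported as data rather than as part of the declaration.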
The problem is that unknown_decl() is also called for a valid CDATA section (and the trailing ]] is omitted). According to the HTML specs, its content should be treated as normal text, so we could simply call handle_data() (as for resolved character references), but for flexibility we can call a special method.
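Until such a method exists, a subclass can approximate the "treat CDATA as text" behaviour by filtering in unknown_decl(). This is a minimal sketch under two assumptions: that the value arrives in the form described above ("CDATA[" followed by the content, with the trailing ]] already removed), and that the running Python version actually routes the CDATA section to unknown_decl(). CDataAsText is a made-up name.

```python
from html.parser import HTMLParser


class CDataAsText(HTMLParser):
    """Hypothetical subclass: forward CDATA content to handle_data()."""

    CDATA_PREFIX = "CDATA["

    def __init__(self):
        super().__init__()
        self.text = []          # character data, including CDATA content
        self.declarations = []  # any other unknown declarations

    def handle_data(self, data):
        self.text.append(data)

    def unknown_decl(self, data):
        # Assumes the "CDATA[...content" shape described in the text above.
        if data.startswith(self.CDATA_PREFIX):
            self.handle_data(data[len(self.CDATA_PREFIX):])
        else:
            self.declarations.append(data)


parser = CDataAsText()
# Call the handler directly so the demo does not depend on how a given
# Python version tokenizes "<![CDATA[...]]>".
parser.unknown_decl("CDATA[x < y")
parser.unknown_decl("if !(IE)")
print(parser.text, parser.declarations)
```

A dedicated method would make this filtering unnecessary and would let unknown_decl() mean only "bogus comment starting with <!".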