wp_kses_split2()
Callback for wp_kses_split() fixing malformed HTML tags.
This function does a lot of work. It rejects some very malformed things like <:::>. It returns an empty string if the element isn't allowed (look ma, no strip_tags()!). Otherwise, it splits the tag into an element and an attribute list.
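A minimal sketch of the split step, in plain PHP with no WordPress dependency. It reuses the regular expression from the source code below to separate an optional closing slash, the element name, and the raw attribute list. sketch_split_tag() is a hypothetical helper; it returns null where wp_kses_split2() would return an empty string.

```php
<?php
// Hypothetical helper (not part of WordPress): splits a tag chunk with the
// same pattern wp_kses_split2() uses.
// Returns array( is_closing, element, attribute_list ) or null if malformed.
function sketch_split_tag( string $content ): ?array {
	if ( ! preg_match( '%^<\s*(/\s*)?([a-zA-Z0-9-]+)([^>]*)>?$%', $content, $matches ) ) {
		return null; // Seriously malformed, e.g. "<:::>".
	}
	return array(
		'' !== trim( $matches[1] ), // true for closing tags such as "</ b >"
		$matches[2],                // element name, case preserved
		$matches[3],                // raw attribute list, not yet filtered
	);
}
```

For example, sketch_split_tag( '</ b >' ) returns array( true, 'b', ' ' ), and sketch_split_tag( '<:::>' ) returns null because no element name follows the <.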
After the tag is split into an element and an attribute list, it is passed to another filter, wp_kses_attr(), which removes illegal attributes; the result of that filter is returned.
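The real wp_kses_attr() does considerably more (it also handles other quoting styles and checks URL protocols against $allowed_protocols), so the following is only a simplified sketch of the allow-list idea. sketch_filter_attrs() is hypothetical: it keeps double-quoted attributes whose lowercased names are listed for the element and silently drops everything else.

```php
<?php
// Hypothetical, simplified stand-in for wp_kses_attr(): keeps only
// double-quoted attributes whose names appear in the allow-list.
// No value or URL-protocol checks are performed.
function sketch_filter_attrs( string $elem, string $attrlist, array $allowed_html ): string {
	$allowed_attrs = $allowed_html[ strtolower( $elem ) ] ?? array();
	$kept          = array();

	// Match name="value" pairs; single-quoted and bare values are ignored here.
	preg_match_all( '/([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*"([^"]*)"/', $attrlist, $pairs, PREG_SET_ORDER );
	foreach ( $pairs as $pair ) {
		$name = strtolower( $pair[1] );
		if ( isset( $allowed_attrs[ $name ] ) ) {
			$kept[] = $name . '="' . $pair[2] . '"';
		}
	}

	$attrs = $kept ? ' ' . implode( ' ', $kept ) : '';
	return "<{$elem}{$attrs}>";
}
```

With array( 'a' => array( 'href' => true ) ) as the allow-list, the attribute list ' href="/about" onclick="steal()"' on an a element comes back as <a href="/about">.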
Internal function: this function is intended to be used by WordPress core itself. It is not recommended to use it in your code.
No Hooks.
Returns
String. The fixed HTML element, or an empty string if the tag is seriously malformed or the element is not allowed.
Usage
wp_kses_split2( $content, $allowed_html, $allowed_protocols );
- $content(string) (required)
- Content to filter.
- $allowed_html(array[]|string) (required)
- An array of allowed HTML elements and attributes, or a context name such as 'post'. See wp_kses_allowed_html() for the list of accepted context names.
- $allowed_protocols(string[]) (required)
- Array of allowed URL protocols.
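The argument shapes can be illustrated as follows; the values are examples only. Each allowed element maps to an array of allowed attribute names, and when a context name string is passed instead, the function resolves it through wp_kses_allowed_html(). sketch_is_allowed_element() is a hypothetical helper mirroring the element check in the source code below, which lowercases the element name before the lookup.

```php
<?php
// Illustrative arguments: element => array( attribute => true ).
$allowed_html = array(
	'a'      => array( 'href' => true, 'title' => true ),
	'strong' => array(), // allowed element with no allowed attributes
);
// Passed through to the attribute filter in the real function; unused here.
$allowed_protocols = array( 'http', 'https', 'mailto' );

// Hypothetical helper: case-insensitive element lookup, so <STRONG>
// matches the 'strong' key while <script> is rejected.
function sketch_is_allowed_element( string $elem, array $allowed_html ): bool {
	return isset( $allowed_html[ strtolower( $elem ) ] );
}
```

Note that isset() is true for 'strong' even though its attribute array is empty; only a missing key rejects the element.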
Changelog
| Since 1.0.0 | Introduced. |
| Since 6.6.0 | Recognize additional forms of invalid HTML which convert into comments. |
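The 6.6.0 change appears to correspond to the "bogus comment" branch in the source code below. The following sketch isolates that branch: it applies the same pattern and splits a matching token into its opener (/ or !) and inner text. sketch_split_bogus_comment() is hypothetical, and the repeated wp_kses() pass over the inner text is omitted.

```php
<?php
// Hypothetical helper: detects the bogus-comment token shapes matched by
// wp_kses_split2() and returns array( opener, inner_text ), or null when
// the token is not one of these shapes.
function sketch_split_bogus_comment( string $content ): ?array {
	// "</" followed by a non-letter, or "<!" followed by a lowercase letter.
	if ( 1 !== preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) {
		return null; // Handled by the comment or tag branches instead.
	}
	// Keep the opener so the cleaned token keeps its original type.
	return array( $content[1], substr( $content, 2, -1 ) );
}
```

So </3 oops> splits into / and 3 oops, while an ordinary closing tag such as </div> and a normative comment such as <!-- c --> both return null.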
wp_kses_split2() code WP 7.0
function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) {
	$content = wp_kses_stripslashes( $content );

	/*
	 * The regex pattern used to split HTML into chunks attempts
	 * to split on HTML token boundaries. This function should
	 * thus receive chunks that _either_ start with meaningful
	 * syntax tokens, like a tag `<div>` or a comment `<!-- ... -->`.
	 *
	 * If the first character of the `$content` chunk _isn't_ one
	 * of these syntax elements, which always starts with `<`, then
	 * the match had to be for the final alternation of `>`. In such
	 * case, it's probably standing on its own and could be encoded
	 * with a character reference to remove ambiguity.
	 *
	 * In other words, if this chunk isn't from a match of a syntax
	 * token, it's just a plaintext greater-than (`>`) sign.
	 */
	if ( ! str_starts_with( $content, '<' ) ) {
		return '>';
	}

	/*
	 * When certain invalid syntax constructs appear, the HTML parser
	 * shifts into what's called the "bogus comment state." This is a
	 * plaintext state that consumes everything until the nearest `>`
	 * and then transforms the entire span into an HTML comment.
	 *
	 * Preserve these comments and do not treat them like tags.
	 *
	 * @see https://html.spec.whatwg.org/#bogus-comment-state
	 */
	if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) {
		/**
		 * Since the pattern matches `</…>` and also `<!…>`, this will
		 * preserve the type of the cleaned-up token in the output.
		 */
		$opener = $content[1];
		$content = substr( $content, 2, -1 );

		do {
			$prev = $content;
			$content = wp_kses( $content, $allowed_html, $allowed_protocols );
		} while ( $prev !== $content );

		// Recombine the modified inner content with the original token structure.
		return "<{$opener}{$content}>";
	}

	/*
	 * Normative HTML comments should be handled separately as their
	 * parsing rules differ from those for tags and text nodes.
	 */
	if ( str_starts_with( $content, '<!--' ) ) {
		$content = str_replace( array( '<!--', '-->' ), '', $content );

		while ( ( $newstring = wp_kses( $content, $allowed_html, $allowed_protocols ) ) !== $content ) {
			$content = $newstring;
		}

		if ( '' === $content ) {
			return '';
		}

		// Prevent multiple dashes in comments.
		$content = preg_replace( '/--+/', '-', $content );
		// Prevent three dashes closing a comment.
		$content = preg_replace( '/-$/', '', $content );

		return "<!--{$content}-->";
	}

	// It's seriously malformed.
	if ( ! preg_match( '%^<\s*(/\s*)?([a-zA-Z0-9-]+)([^>]*)>?$%', $content, $matches ) ) {
		return '';
	}

	$slash = trim( $matches[1] );
	$elem = $matches[2];
	$attrlist = $matches[3];

	if ( ! is_array( $allowed_html ) ) {
		$allowed_html = wp_kses_allowed_html( $allowed_html );
	}

	// They are using a not allowed HTML element.
	if ( ! isset( $allowed_html[ strtolower( $elem ) ] ) ) {
		return '';
	}

	// No attributes are allowed for closing elements.
	if ( '' !== $slash ) {
		return "</$elem>";
	}

	return wp_kses_attr( $elem, $attrlist, $allowed_html, $allowed_protocols );
}
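The normative-comment branch above can be tried in isolation. This sketch reproduces its delimiter stripping, empty-comment check, and dash cleanup in plain PHP; sketch_clean_comment() is a hypothetical helper, and it leaves out the repeated wp_kses() pass over the comment text.

```php
<?php
// Hypothetical helper reproducing the comment cleanup in wp_kses_split2(),
// without the wp_kses() loop over the inner text.
function sketch_clean_comment( string $content ): string {
	// Strip the comment delimiters wherever they appear.
	$content = str_replace( array( '<!--', '-->' ), '', $content );
	if ( '' === $content ) {
		return ''; // Empty comments are dropped entirely.
	}
	// Collapse runs of dashes, then drop a trailing dash so the rebuilt
	// comment cannot end in "--->".
	$content = preg_replace( '/--+/', '-', $content );
	$content = preg_replace( '/-$/', '', $content );
	return "<!--{$content}-->";
}
```

For instance, <!-- a -- b --> becomes <!-- a - b -->, and <!----> is removed because nothing remains between the delimiters.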