WP_HTML_Tag_Processor::subdivide_text_appropriately │ public │ WP 6.7.0

Subdivides a matched text node, splitting off a leading NULL byte sequence or a leading run of decoded whitespace as a distinct prefix node.

Only the leading prefix is split off: once anything that's neither a NULL byte nor decoded whitespace is encountered, the remainder of the text node is left intact as generic text.

The HTML Processor uses this to apply distinct rules for different kinds of text.
Inter-element whitespace can be detected and skipped with this method.

Text nodes aren't eagerly subdivided because there's no need to split them unless decisions are being made on NULL byte sequences or whitespace-only text.
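The prefix rule described above can be approximated outside WordPress with a minimal sketch. The function name `sketch_classify_text_prefix()` and the string labels it returns are hypothetical and not part of the HTML API. Unlike the real method, this sketch inspects raw bytes only and does not decode character references such as `&#x20;`.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: classifies the leading
 * prefix of a raw text string following the rule described above.
 * Assumption: only raw bytes are inspected; character references
 * are NOT decoded here, unlike in the real method.
 *
 * @param string $text Raw text node content.
 * @return array Prefix kind (string) and its length in bytes (int).
 */
function sketch_classify_text_prefix( string $text ): array {
	// A run of leading NULL bytes forms its own prefix.
	$nulls = strspn( $text, "\x00" );
	if ( $nulls > 0 ) {
		return array( 'null-sequence', $nulls );
	}

	// Otherwise a run of leading HTML whitespace forms the prefix.
	$whitespace = strspn( $text, " \t\f\r\n" );
	if ( $whitespace > 0 ) {
		return array( 'whitespace', $whitespace );
	}

	// Anything else: nothing to split off, the text stays generic.
	return array( 'generic', 0 );
}

// array( 'null-sequence', 2 ): only the two NULL bytes are split off.
$null_prefix = sketch_classify_text_prefix( "\x00\x00Apples" );

// array( 'generic', 0 ): whitespace after other text is never split off.
$no_prefix = sketch_classify_text_prefix( "More \t" );
```

NULL bytes are checked first, so a text starting with NULL bytes followed by whitespace yields the NULL sequence on the first pass and the whitespace only on a later pass over the remainder.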

Example:

$processor = new WP_HTML_Tag_Processor( "\x00Apples & Oranges" );
true  === $processor->next_token();                   // Text is "Apples & Oranges".
true  === $processor->subdivide_text_appropriately(); // Text is "".
true  === $processor->next_token();                   // Text is "Apples & Oranges".
false === $processor->subdivide_text_appropriately();

$processor = new WP_HTML_Tag_Processor( "&#xA; \r\n\tMore" );
true  === $processor->next_token();                   // Text is "␤ ␤␉More".
true  === $processor->subdivide_text_appropriately(); // Text is "␤ ␤␉".
true  === $processor->next_token();                   // Text is "More".
false === $processor->subdivide_text_appropriately();
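
Skipping inter-element whitespace, as mentioned in the description, might look like the following sketch. The function name `sketch_count_meaningful_text()` is hypothetical, and the code assumes WordPress 6.7 or later is loaded so that `WP_HTML_Tag_Processor` is available.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: counts text tokens that
 * carry real content, skipping NULL-byte and whitespace-only prefixes.
 * Assumes WordPress 6.7+ is loaded so WP_HTML_Tag_Processor exists.
 *
 * @param string $html HTML to scan.
 * @return int Number of text tokens left as generic text.
 */
function sketch_count_meaningful_text( string $html ): int {
	$processor = new WP_HTML_Tag_Processor( $html );
	$count     = 0;

	while ( $processor->next_token() ) {
		if ( '#text' !== $processor->get_token_name() ) {
			continue;
		}

		// True means the current token is now only a NULL-byte or
		// whitespace prefix; any remainder arrives as the next token.
		if ( $processor->subdivide_text_appropriately() ) {
			continue;
		}

		++$count;
	}

	return $count;
}
```

For input such as `"<ul>\n<li>One</li>\n<li>Two</li>\n</ul>"`, the newline-only text nodes between the tags should be skipped and only the two list item texts counted.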

Method of the class: WP_HTML_Tag_Processor{}

No Hooks.

Returns

true|false. Whether the text node was subdivided.

Usage

$WP_HTML_Tag_Processor = new WP_HTML_Tag_Processor( $html );
$result = $WP_HTML_Tag_Processor->subdivide_text_appropriately(); // bool

Changelog

Since 6.7.0

Introduced.

WP_HTML_Tag_Processor::subdivide_text_appropriately() code WP 7.0

wp-includes/html-api/class-wp-html-tag-processor.php

public function subdivide_text_appropriately(): bool {
	if ( self::STATE_TEXT_NODE !== $this->parser_state ) {
		return false;
	}

	$this->text_node_classification = self::TEXT_IS_GENERIC;

	/*
	 * NULL bytes are treated categorically different than numeric character
	 * references whose number is zero. `&#x00;` is not the same as `"\x00"`.
	 */
	$leading_nulls = strspn( $this->html, "\x00", $this->text_starts_at, $this->text_length );
	if ( $leading_nulls > 0 ) {
		$this->token_length             = $leading_nulls;
		$this->text_length              = $leading_nulls;
		$this->bytes_already_parsed     = $this->token_starts_at + $leading_nulls;
		$this->text_node_classification = self::TEXT_IS_NULL_SEQUENCE;
		return true;
	}

	/*
	 * Start a decoding loop to determine the point at which the
	 * text subdivides. This entails raw whitespace bytes and any
	 * character reference that decodes to the same.
	 */
	$at  = $this->text_starts_at;
	$end = $this->text_starts_at + $this->text_length;
	while ( $at < $end ) {
		$skipped = strspn( $this->html, " \t\f\r\n", $at, $end - $at );
		$at     += $skipped;

		if ( $at < $end && '&' === $this->html[ $at ] ) {
			$matched_byte_length = null;
			$replacement         = WP_HTML_Decoder::read_character_reference( 'data', $this->html, $at, $matched_byte_length );
			if ( isset( $replacement ) && 1 === strspn( $replacement, " \t\f\r\n" ) ) {
				$at += $matched_byte_length;
				continue;
			}
		}

		break;
	}

	if ( $at > $this->text_starts_at ) {
		$new_length                     = $at - $this->text_starts_at;
		$this->text_length              = $new_length;
		$this->token_length             = $new_length;
		$this->bytes_already_parsed     = $at;
		$this->text_node_classification = self::TEXT_IS_WHITESPACE;
		return true;
	}

	return false;
}
