WP_HTML_Tag_Processor::set_modifiable_text() public WP 6.7.0

Sets the modifiable text for the matched token, if matched.

Modifiable text is text content that may be read and changed without changing the HTML structure of the document around it. This includes the contents of #text nodes in the HTML as well as the inner contents of HTML comments, Processing Instructions, and others, even though these nodes aren't part of a parsed DOM tree. It also includes the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any other section in an HTML document which cannot contain HTML markup (DATA).

Not all modifiable text may be set by this method, and not all content may be set as modifiable text. When an update is not possible, the method returns false. For instance, if the contents of a SCRIPT element are neither JavaScript nor JSON, it’s not possible to guarantee that escaping strings like </script> won’t break the script; in these cases, updates containing such strings are rejected and it’s up to calling code to perform language-specific escaping or workarounds. Similarly, the method will not set content into a comment if that content would prematurely terminate the comment.

Example:

// Add a preface to all STYLE contents.
while ( $processor->next_tag( 'STYLE' ) ) {
	$style = $processor->get_modifiable_text();
	$processor->set_modifiable_text( "/* Made with love on the World Wide Web */\n{$style}" );
}

// Replace smiley text with Emoji smilies.
while ( $processor->next_token() ) {
	if ( '#text' !== $processor->get_token_name() ) {
		continue;
	}

	$chunk = $processor->get_modifiable_text();
	if ( ! str_contains( $chunk, ':)' ) ) {
		continue;
	}

	$processor->set_modifiable_text( str_replace( ':)', '🙂', $chunk ) );
}
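
The rejection cases described above can be approximated outside WordPress. What follows is a minimal standalone sketch that mirrors the checks visible in the method's source. The function names `comment_text_is_settable()` and `unknown_script_text_is_settable()` are hypothetical and are not part of the HTML API.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reports whether text could be
 * stored in an HTML comment without closing the comment early. Mirrors the
 * `--!?>` check performed by set_modifiable_text().
 */
function comment_text_is_settable( string $text ): bool {
	return 1 !== preg_match( '/--!?>/', $text );
}

/**
 * Hypothetical helper (not part of WordPress): reports whether text could be
 * stored in a SCRIPT element whose content type is neither JavaScript nor JSON.
 */
function unknown_script_text_is_settable( string $text ): bool {
	return false === stripos( $text, '<script' )
		&& false === stripos( $text, '</script' );
}

var_dump( comment_text_is_settable( 'just a note' ) );              // bool(true)
var_dump( comment_text_is_settable( 'oops --> <b>bold</b>' ) );     // bool(false)
var_dump( unknown_script_text_is_settable( 'print "</SCRIPT>";' ) ); // bool(false)
```

Calling code that must store such content would need to escape it in a way the target language understands before passing it to the method.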

This function handles all necessary HTML encoding. Provide normal, unescaped string values. The HTML API will encode the strings appropriately so that the browser will interpret them as the intended value.

Example:

// Renders as “Eggs & Milk” in a browser, encoded as `<p>Eggs &amp; Milk</p>`.
$processor->set_modifiable_text( 'Eggs & Milk' );

// Renders as “Eggs &amp; Milk” in a browser, encoded as `<p>Eggs &amp;amp; Milk</p>`.
$processor->set_modifiable_text( 'Eggs &amp; Milk' );
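
The encoding step for text nodes can be reproduced in isolation. The sketch below copies the `strtr()` map from the method's source into a hypothetical helper named `encode_text_node()` (not part of WordPress). It shows that every `&` is encoded, so text that already looks like a character reference is encoded again rather than passed through.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): encodes plain text the way
 * the text-node branch of set_modifiable_text() does.
 */
function encode_text_node( string $plaintext ): string {
	// strtr() with an array never rescans its own output, so "&" in a
	// replacement such as "&lt;" is not encoded a second time.
	return strtr(
		$plaintext,
		array(
			'<' => '&lt;',
			'>' => '&gt;',
			'&' => '&amp;',
			'"' => '&quot;',
			"'" => '&apos;',
		)
	);
}

echo encode_text_node( 'Eggs & Milk' ), "\n";     // Eggs &amp; Milk
echo encode_text_node( 'Eggs &amp; Milk' ), "\n"; // Eggs &amp;amp; Milk
echo encode_text_node( '1 < 2' ), "\n";           // 1 &lt; 2
```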

Method of the class: WP_HTML_Tag_Processor{}

No Hooks.

Returns

true|false. Whether the text was updated.

Usage

$WP_HTML_Tag_Processor = new WP_HTML_Tag_Processor( $html );
$WP_HTML_Tag_Processor->set_modifiable_text( $plaintext_content ): bool;
$plaintext_content (string) (required)
New text content to represent in the matched token.
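
Because the method reports failure through its return value, calling code may want to check it on every call. The following is a rough sketch of that pattern, assuming WordPress 6.7 or later is loaded. The helper name `replace_comment_text()` and its return shape are hypothetical. Outside WordPress the class is unavailable, so the sketch returns its input unchanged.

```php
<?php
/**
 * Hypothetical helper: tries to replace the text of every HTML comment and
 * counts how many updates were rejected.
 */
function replace_comment_text( string $html, string $new_text ): array {
	if ( ! class_exists( 'WP_HTML_Tag_Processor' ) ) {
		// Outside WordPress there is nothing to run; return the input untouched.
		return array(
			'html'     => $html,
			'rejected' => 0,
		);
	}

	$processor = new WP_HTML_Tag_Processor( $html );
	$rejected  = 0;

	while ( $processor->next_token() ) {
		if ( '#comment' !== $processor->get_token_name() ) {
			continue;
		}

		// A false return means this update was rejected.
		if ( ! $processor->set_modifiable_text( $new_text ) ) {
			++$rejected;
		}
	}

	return array(
		'html'     => $processor->get_updated_html(),
		'rejected' => $rejected,
	);
}
```

Counting rejections instead of stopping at the first one keeps the remaining tokens processable. Whether that is the right policy depends on the calling code.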

Changelog

Since 6.7.0 Introduced.
Since 6.9.0 Escapes all character references instead of trying to avoid double-escaping.

WP_HTML_Tag_Processor::set_modifiable_text() code WP 7.0

public function set_modifiable_text( string $plaintext_content ): bool {
	if ( self::STATE_TEXT_NODE === $this->parser_state ) {
		$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
			$this->text_starts_at,
			$this->text_length,
			strtr(
				$plaintext_content,
				array(
					'<' => '&lt;',
					'>' => '&gt;',
					'&' => '&amp;',
					'"' => '&quot;',
					"'" => '&apos;',
				)
			)
		);

		return true;
	}

	// Comment data is not encoded.
	if (
		self::STATE_COMMENT === $this->parser_state &&
		self::COMMENT_AS_HTML_COMMENT === $this->comment_type
	) {
		// Check if the text could close the comment.
		if ( 1 === preg_match( '/--!?>/', $plaintext_content ) ) {
			return false;
		}

		$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
			$this->text_starts_at,
			$this->text_length,
			$plaintext_content
		);

		return true;
	}

	/*
	 * The rest of this function handles modifiable text for special "atomic" HTML elements.
	 * Only tags in the HTML namespace should be processed.
	 */
	if (
		self::STATE_MATCHED_TAG !== $this->parser_state ||
		'html' !== $this->get_namespace()
	) {
		return false;
	}

	switch ( $this->get_tag() ) {
		case 'SCRIPT':
			$script_content_type = $this->get_script_content_type();

			switch ( $script_content_type ) {
				case 'javascript':
				case 'json':
					$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
						$this->text_starts_at,
						$this->text_length,
						self::escape_javascript_script_contents( $plaintext_content )
					);
					return true;
			}

			/*
			 * If the script’s content type isn’t recognized and understandable then it’s
			 * impossible to guarantee that escaping the content won’t cause runtime breakage.
			 * For instance, if the script content type were PHP code then escaping with
			 * `\u0073` would not be met by unescaping; rather, it could result in corrupted
			 * data or even syntax errors.
			 *
			 * Because of this, content which could potentially modify the SCRIPT tag’s
			 * HTML structure is rejected here. It’s the responsibility of calling code to
			 * perform whatever semantic escaping is necessary to avoid problematic strings.
			 */
			if (
				false !== stripos( $plaintext_content, '<script' ) ||
				false !== stripos( $plaintext_content, '</script' )
			) {
				return false;
			}
			$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
				$this->text_starts_at,
				$this->text_length,
				$plaintext_content
			);
			return true;

		case 'STYLE':
			$plaintext_content = preg_replace_callback(
				'~</(?P<TAG_NAME>style)~i',
				static function ( $tag_match ) {
					return "\\3c\\2f{$tag_match['TAG_NAME']}";
				},
				$plaintext_content
			);

			$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
				$this->text_starts_at,
				$this->text_length,
				$plaintext_content
			);

			return true;

		case 'TEXTAREA':
		case 'TITLE':
			$plaintext_content = preg_replace_callback(
				"~</(?P<TAG_NAME>{$this->get_tag()})~i",
				static function ( $tag_match ) {
					return "&lt;/{$tag_match['TAG_NAME']}";
				},
				$plaintext_content
			);

			/*
			 * HTML ignores a single leading newline in this context. If a leading newline
			 * is intended, preserve it by adding an extra newline.
			 */
			if (
				'TEXTAREA' === $this->get_tag() &&
				1 === strspn( $plaintext_content, "\n\r", 0, 1 )
			) {
				$plaintext_content = "\n{$plaintext_content}";
			}

			/*
			 * These don't _need_ to be escaped, but since they are decoded it's
			 * safe to leave them escaped and this can prevent other code from
			 * naively detecting tags within the contents.
			 *
			 * @todo It would be useful to prefix a multiline replacement text
			 *       with a newline, but not necessary. This is for aesthetics.
			 */
			$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
				$this->text_starts_at,
				$this->text_length,
				$plaintext_content
			);

			return true;
	}

	return false;
}
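
The STYLE and TEXTAREA/TITLE branches in the listing above escape closing-tag lookalikes instead of rejecting them. The standalone sketch below reproduces those two transformations so they can be tried without WordPress. The helper names `escape_style_text()` and `escape_rcdata_text()` are hypothetical, and `$tag_name` is assumed to be the uppercase string 'TEXTAREA' or 'TITLE'.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): rewrites `</style` using CSS
 * escape sequences, as the STYLE branch does.
 */
function escape_style_text( string $css ): string {
	return preg_replace_callback(
		'~</(?P<TAG_NAME>style)~i',
		static function ( $tag_match ) {
			// "\3c" and "\2f" are the CSS escapes for "<" and "/".
			return "\\3c\\2f{$tag_match['TAG_NAME']}";
		},
		$css
	);
}

/**
 * Hypothetical helper (not part of WordPress): mirrors the TEXTAREA/TITLE
 * branch. $tag_name is assumed to be 'TEXTAREA' or 'TITLE'.
 */
function escape_rcdata_text( string $text, string $tag_name ): string {
	$text = preg_replace_callback(
		"~</(?P<TAG_NAME>{$tag_name})~i",
		static function ( $tag_match ) {
			return "&lt;/{$tag_match['TAG_NAME']}";
		},
		$text
	);

	// HTML ignores one leading newline inside TEXTAREA, so an intended
	// leading newline is preserved by adding an extra one in front of it.
	if ( 'TEXTAREA' === $tag_name && 1 === strspn( $text, "\n\r", 0, 1 ) ) {
		$text = "\n{$text}";
	}

	return $text;
}

echo escape_style_text( 'p::after { content: "</style>"; }' ), "\n";
// p::after { content: "\3c\2fstyle>"; }

echo escape_rcdata_text( 'a </TextArea> b', 'TEXTAREA' ), "\n";
// a &lt;/TextArea> b
```

The case of the matched tag name is preserved in both helpers because the replacement reuses the captured text.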