WP_HTML_Decoder::code_point_to_utf8_bytes() public static WP 6.6.0

Encode a code point number into the UTF-8 encoding.

This encoder implements the UTF-8 encoding algorithm for converting a code point into a byte sequence. If it receives an invalid code point (zero or a negative number, a surrogate half in the range U+D800 to U+DFFF, or a value above U+10FFFF), it returns the Unicode Replacement Character U+FFFD.

Example:

'🅰' === WP_HTML_Decoder::code_point_to_utf8_bytes( 0x1f170 );
// Half of a surrogate pair is an invalid code point.
'�' === WP_HTML_Decoder::code_point_to_utf8_bytes( 0xd83c );
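To see which bytes stand behind these results, the following minimal sketch prints the encoded bytes as hex. It assumes WordPress is not loaded, so it repeats the bit arithmetic of the method's source under a hypothetical name, `sketch_utf8_bytes()`. That helper is for illustration only and is not part of WordPress.

```php
<?php
// Hypothetical stand-in for illustration only; it mirrors the arithmetic of
// WP_HTML_Decoder::code_point_to_utf8_bytes() so the snippet runs standalone.
function sketch_utf8_bytes( int $code_point ): string {
	if (
		$code_point <= 0 ||
		( $code_point >= 0xD800 && $code_point <= 0xDFFF ) ||
		$code_point > 0x10FFFF
	) {
		return "\u{FFFD}"; // Replacement character, bytes EF BF BD.
	}

	if ( $code_point <= 0x7F ) {
		return chr( $code_point ); // 0xxxxxxx
	}

	if ( $code_point <= 0x7FF ) {
		// 110xxxxx 10xxxxxx
		return pack( 'CC', ( $code_point >> 6 ) | 0xC0, ( $code_point & 0x3F ) | 0x80 );
	}

	if ( $code_point <= 0xFFFF ) {
		// 1110xxxx 10xxxxxx 10xxxxxx
		return pack(
			'CCC',
			( $code_point >> 12 ) | 0xE0,
			( ( $code_point >> 6 ) & 0x3F ) | 0x80,
			( $code_point & 0x3F ) | 0x80
		);
	}

	// 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
	return pack(
		'CCCC',
		( $code_point >> 18 ) | 0xF0,
		( ( $code_point >> 12 ) & 0x3F ) | 0x80,
		( ( $code_point >> 6 ) & 0x3F ) | 0x80,
		( $code_point & 0x3F ) | 0x80
	);
}

// One sample per byte length, plus one invalid surrogate half.
foreach ( array( 0x41, 0xE9, 0x20AC, 0x1F170, 0xD83C ) as $cp ) {
	printf( "U+%04X => %s\n", $cp, bin2hex( sketch_utf8_bytes( $cp ) ) );
}
// U+0041 => 41
// U+00E9 => c3a9
// U+20AC => e282ac
// U+1F170 => f09f85b0
// U+D83C => efbfbd
```

The last line shows that the surrogate half is not encoded at all: the three bytes EF BF BD are the UTF-8 form of U+FFFD.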

Method of the class: WP_HTML_Decoder{}

No Hooks.

Return

String. Converted code point, or '�' (U+FFFD) if invalid.

Usage

$result = WP_HTML_Decoder::code_point_to_utf8_bytes( $code_point );
$code_point (int) (required)
Which code point to convert.
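Because the parameter is an integer, a code point that arrives as text has to be converted first. The sketch below shows one possible way to do that for a numeric character reference such as `&#x1F170;`. The helper `sketch_reference_to_code_point()` is hypothetical and only accepts a deliberately narrow syntax; it is not how WordPress itself parses character references. The call to the real method is guarded so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper for illustration: extracts the integer code point from
 * a single numeric character reference, or returns null if the text does not
 * look like one.
 */
function sketch_reference_to_code_point( string $reference ): ?int {
	$pattern = '/^&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));$/';

	if ( 1 !== preg_match( $pattern, $reference, $matches ) ) {
		return null;
	}

	// Group 1 holds hexadecimal digits, group 2 holds decimal digits.
	return '' !== $matches[1] ? (int) hexdec( $matches[1] ) : (int) $matches[2];
}

$code_point = sketch_reference_to_code_point( '&#x1F170;' );

// Only call the WordPress method when the class is actually loaded.
if ( null !== $code_point && class_exists( 'WP_HTML_Decoder' ) ) {
	echo WP_HTML_Decoder::code_point_to_utf8_bytes( $code_point );
}
```

Out-of-range values need no extra check on the caller's side, since the method itself returns the replacement character for them.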

Changelog

Since 6.6.0 Introduced.

WP_HTML_Decoder::code_point_to_utf8_bytes() code WP 6.6.2

public static function code_point_to_utf8_bytes( $code_point ) {
	// Pre-check to ensure a valid code point.
	if (
		$code_point <= 0 ||
		( $code_point >= 0xD800 && $code_point <= 0xDFFF ) ||
		$code_point > 0x10FFFF
	) {
		return '�';
	}

	if ( $code_point <= 0x7F ) {
		return chr( $code_point );
	}

	if ( $code_point <= 0x7FF ) {
		$byte1 = ( $code_point >> 6 ) | 0xC0;
		$byte2 = $code_point & 0x3F | 0x80;

		return pack( 'CC', $byte1, $byte2 );
	}

	if ( $code_point <= 0xFFFF ) {
		$byte1 = ( $code_point >> 12 ) | 0xE0;
		$byte2 = ( $code_point >> 6 ) & 0x3F | 0x80;
		$byte3 = $code_point & 0x3F | 0x80;

		return pack( 'CCC', $byte1, $byte2, $byte3 );
	}

	// Any values above U+10FFFF are eliminated above in the pre-check.
	$byte1 = ( $code_point >> 18 ) | 0xF0;
	$byte2 = ( $code_point >> 12 ) & 0x3F | 0x80;
	$byte3 = ( $code_point >> 6 ) & 0x3F | 0x80;
	$byte4 = $code_point & 0x3F | 0x80;

	return pack( 'CCCC', $byte1, $byte2, $byte3, $byte4 );
}
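The shifts and masks in the listing are reversible, which can help when checking the bit layout by hand. The following sketch goes the other way and rebuilds the code point from the bytes of a single encoded character. The function name `sketch_utf8_bytes_to_code_point()` is hypothetical and not part of WordPress. As a simplification, it checks only the byte prefixes and does not reject overlong forms or encoded surrogates.

```php
<?php
/**
 * Hypothetical inverse for illustration: reads the bytes of one UTF-8
 * encoded character and returns its code point, or null when the byte
 * prefixes do not match a 1 to 4 byte sequence.
 */
function sketch_utf8_bytes_to_code_point( string $bytes ): ?int {
	$length = strlen( $bytes );
	if ( 0 === $length ) {
		return null;
	}

	// unpack() returns a 1-based array; re-index it from zero.
	$values = array_values( unpack( 'C*', $bytes ) );

	// Every byte after the first must look like 10xxxxxx.
	for ( $i = 1; $i < $length; $i++ ) {
		if ( 0x80 !== ( $values[ $i ] & 0xC0 ) ) {
			return null;
		}
	}

	$lead = $values[0];

	if ( 1 === $length && $lead <= 0x7F ) {
		return $lead;
	}

	if ( 2 === $length && 0xC0 === ( $lead & 0xE0 ) ) {
		return ( ( $lead & 0x1F ) << 6 ) | ( $values[1] & 0x3F );
	}

	if ( 3 === $length && 0xE0 === ( $lead & 0xF0 ) ) {
		return ( ( $lead & 0x0F ) << 12 )
			| ( ( $values[1] & 0x3F ) << 6 )
			| ( $values[2] & 0x3F );
	}

	if ( 4 === $length && 0xF0 === ( $lead & 0xF8 ) ) {
		return ( ( $lead & 0x07 ) << 18 )
			| ( ( $values[1] & 0x3F ) << 12 )
			| ( ( $values[2] & 0x3F ) << 6 )
			| ( $values[3] & 0x3F );
	}

	return null;
}

printf( "U+%04X\n", sketch_utf8_bytes_to_code_point( "\u{1F170}" ) ); // U+1F170
```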