_wp_utf8_codepoint_count() WP 6.9.0

Returns how many code points are found in the given UTF-8 string.

Each invalid span of bytes counts as a single code point, according to the maximal subpart rule. This function is a fallback for calling mb_strlen( $text, 'UTF-8' ).
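As a rough illustration of the fallback relationship described above, the sketch below prefers mb_strlen() and only reaches for _wp_utf8_codepoint_count() when the mbstring extension is missing. The name sketch_codepoint_length() is hypothetical, the order of the checks is an assumption, and the last branch (a PCRE-based count that accepts valid UTF-8 only) exists solely so the sketch runs outside WordPress. It is not how WordPress itself wires these functions together.

```php
<?php
/**
 * Hypothetical wrapper (not part of WordPress): returns the number of
 * code points in $text, choosing whichever counter is available.
 */
function sketch_codepoint_length( string $text ): int {
	// Preferred path: the native mbstring implementation.
	if ( extension_loaded( 'mbstring' ) ) {
		return mb_strlen( $text, 'UTF-8' );
	}

	// Fallback path inside WordPress 6.9.0 or later.
	if ( function_exists( '_wp_utf8_codepoint_count' ) ) {
		return _wp_utf8_codepoint_count( $text );
	}

	// Assumption for standalone use: PCRE in UTF-8 mode, valid input only.
	$matched = preg_match_all( '/./su', $text );
	if ( false === $matched ) {
		throw new InvalidArgumentException( 'Expected valid UTF-8.' );
	}

	return $matched;
}

// "héllo" written with explicit bytes: 6 bytes, 5 code points.
var_dump( sketch_codepoint_length( "h\xC3\xA9llo" ) ); // int(5)
```

The branches are only guaranteed to agree on well-formed UTF-8; on ill-formed input they may report different counts.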

When a negative value is provided for the byte offset or the byte length, this function always reports zero code points.

Example:

4  === _wp_utf8_codepoint_count( 'text' );
// Groups are 'test', "\x90" as '�', 'wp', "\xE2\x80" as '�', "\xC0" as '�', and 'test'.
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
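
The grouping in the example above can be reproduced without WordPress. The following is a minimal sketch of counting under the maximal subpart rule, written from the well-formed UTF-8 byte sequence table in the Unicode Standard; sketch_utf8_codepoint_count() is a hypothetical name, and this is not the core implementation, which delegates the scanning to _wp_scan_utf8().

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): counts code points in a
 * UTF-8 string, treating each maximal invalid subpart as one code point.
 */
function sketch_utf8_codepoint_count( string $text ): int {
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );

	while ( $at < $end ) {
		$lead = ord( $text[ $at ] );

		// Number of continuation bytes needed, plus the allowed range
		// of the first continuation byte for this lead byte.
		if ( $lead >= 0xC2 && $lead <= 0xDF ) {
			[ $need, $low, $high ] = [ 1, 0x80, 0xBF ];
		} elseif ( 0xE0 === $lead ) {
			[ $need, $low, $high ] = [ 2, 0xA0, 0xBF ];
		} elseif ( 0xED === $lead ) {
			[ $need, $low, $high ] = [ 2, 0x80, 0x9F ];
		} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
			[ $need, $low, $high ] = [ 2, 0x80, 0xBF ];
		} elseif ( 0xF0 === $lead ) {
			[ $need, $low, $high ] = [ 3, 0x90, 0xBF ];
		} elseif ( 0xF4 === $lead ) {
			[ $need, $low, $high ] = [ 3, 0x80, 0x8F ];
		} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
			[ $need, $low, $high ] = [ 3, 0x80, 0xBF ];
		} else {
			// ASCII, a stray continuation byte, or a byte that can
			// never start a sequence: one byte, one code point.
			[ $need, $low, $high ] = [ 0, 0x80, 0xBF ];
		}

		// Consume the lead byte, then as many valid continuation bytes
		// as possible. Stopping early leaves a maximal invalid subpart.
		++$at;
		for ( $i = 0; $i < $need && $at < $end; $i++ ) {
			$byte = ord( $text[ $at ] );
			if ( $byte < $low || $byte > $high ) {
				break;
			}
			++$at;
			// Only the first continuation byte has a restricted range.
			$low  = 0x80;
			$high = 0xBF;
		}

		// A complete sequence or a maximal invalid subpart: either way, one.
		++$count;
	}

	return $count;
}

var_dump( sketch_utf8_codepoint_count( 'text' ) );                       // int(4)
var_dump( sketch_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" ) ); // int(13)
```

Because a truncated sequence such as "\xE2\x80" is consumed as a unit, it contributes one code point rather than two.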

Internal function. This function is intended for use by WordPress core itself. Using it in your own code is not recommended.

No Hooks.

Returns

Int. How many code points were found.

Usage

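_wp_utf8_codepoint_count( $text, $byte_offset, $max_byte_length ): int;
$text(string) (required)
Count code points in this string.
$byte_offset(int|null)
Byte offset in $text at which counting starts. A negative offset yields a count of zero.
Default: 0
$max_byte_length(int|null)
Maximum number of bytes to scan, starting from $byte_offset. Scanning never continues past the end of the string.
Default: PHP_INT_MAX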
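
To make the byte-based parameters concrete, here is a small standalone sketch of counting code points inside a byte window. sketch_count_in_byte_window() is a hypothetical helper, not a WordPress function. It assumes valid UTF-8 and a window that starts and ends on character boundaries, and it uses substr() with PCRE in place of the core scanner.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): counts the code points
 * found within a byte window of a valid UTF-8 string.
 */
function sketch_count_in_byte_window( string $text, int $byte_offset = 0, int $max_byte_length = PHP_INT_MAX ): int {
	// Mirror the documented behavior: negative values report zero.
	if ( $byte_offset < 0 || $max_byte_length < 0 ) {
		return 0;
	}

	// substr() works on bytes; the cast covers PHP 7 returning false.
	$window = (string) substr( $text, $byte_offset, $max_byte_length );
	if ( '' === $window ) {
		return 0;
	}

	// Each match of "." in UTF-8 mode is one code point.
	$matched = preg_match_all( '/./su', $window );
	if ( false === $matched ) {
		throw new InvalidArgumentException( 'Window is not valid UTF-8.' );
	}

	return $matched;
}

// "héllo wörld": 13 bytes, 11 code points; "wörld" starts at byte 7.
$text = "h\xC3\xA9llo w\xC3\xB6rld";

var_dump( sketch_count_in_byte_window( $text ) );        // int(11)
var_dump( sketch_count_in_byte_window( $text, 7 ) );     // int(5)
var_dump( sketch_count_in_byte_window( $text, 7, 3 ) );  // int(2)
var_dump( sketch_count_in_byte_window( $text, -1 ) );    // int(0)
```

Both the offset and the length are measured in bytes, not characters, so a window of 3 bytes starting at "w" covers only "wö".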

Changelog

Since 6.9.0 Introduced.

_wp_utf8_codepoint_count() code WP 7.0

function _wp_utf8_codepoint_count( string $text, ?int $byte_offset = 0, ?int $max_byte_length = PHP_INT_MAX ): int {
	// A negative offset always reports zero code points.
	if ( $byte_offset < 0 ) {
		return 0;
	}

	$count           = 0;
	$at              = $byte_offset;
	$end             = strlen( $text );
	$invalid_length  = 0;
	// Never scan past the end of the string.
	$max_byte_length = min( $end - $at, $max_byte_length );

	while ( $at < $end && ( $at - $byte_offset ) < $max_byte_length ) {
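		// Count the code points in the next valid span, within the remaining byte budget.
		$count += _wp_scan_utf8( $text, $at, $invalid_length, $max_byte_length - ( $at - $byte_offset ) );
		// An invalid span, if one was found, counts as a single code point and is skipped below.
		$count += $invalid_length > 0 ? 1 : 0;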
		$at    += $invalid_length;
	}

	return $count;
}