_mb_substr() WP 3.2.0

Internal compat function to mimic mb_substr().

Only supports UTF-8 and non-shifting single-byte encodings. For any other encoding, expect the substrings to be misaligned. When the given encoding (or the blog_charset option, if none is provided) isn't UTF-8, the function returns the output of substr(), which works on bytes rather than characters.
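To see why the byte-level substr() fallback is only safe for single-byte encodings, the standalone snippet below (no WordPress required; the sample string is made up for illustration) applies plain substr() to a UTF-8 string and checks the result with a PCRE /u match. It only illustrates the problem and is not part of _mb_substr() itself.

```php
<?php
// "é" is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9.
$word = "caf\u{00E9}s"; // "cafés": 5 characters, 6 bytes

// Byte-based substr() counts bytes, so asking for the first 4 units
// stops right after 0xC3 and leaves a dangling lead byte.
$bytes = substr( $word, 0, 4 );

// preg_match() with the /u modifier returns false for invalid UTF-8,
// which makes it a convenient validity check without mbstring.
var_dump( strlen( $bytes ) );                  // int(4)
var_dump( preg_match( '//u', $bytes ) === 1 ); // bool(false)
```

A character-aware function has to translate character offsets into byte offsets first, which is what the UTF-8 branch of _mb_substr() does.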

Internal function. This function is designed to be used by WordPress core itself; it is not recommended for use in your own code.

No Hooks.

Returns

String. The extracted substring, or an empty string if $str is null.

Usage

_mb_substr( $str, $start, $length, $encoding );
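$str (string) (required)
The string to extract the substring from.
$start (int) (required)
Character offset at which to start the substring extraction. A negative value counts back from the end of the string.
$length (int|null)
Maximum number of characters to extract from $str. A negative value omits that many characters from the end of the string; null extracts everything up to the end.
Default: null
$encoding (string|null)
Character encoding to use. When null, the blog_charset option is used.
Default: null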
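The $start and $length semantics can be illustrated with a minimal standalone sketch. The function name sketch_utf8_substr() is hypothetical and not part of WordPress, and it swaps in a different technique: it splits the string into code points with preg_split() and slices the array with array_slice(), whose handling of negative offsets and lengths matches the behavior described above. It assumes valid UTF-8 input and ignores $encoding.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: mirrors the $start/$length
 * semantics of _mb_substr() for valid UTF-8 input by splitting the string
 * into code points and slicing the resulting array.
 */
function sketch_utf8_substr( string $str, int $start, ?int $length = null ): string {
	// One array element per code point; preg_split() returns false on invalid UTF-8.
	$chars = preg_split( '//u', $str, -1, PREG_SPLIT_NO_EMPTY );
	if ( false === $chars ) {
		return '';
	}

	// array_slice() counts negative offsets and lengths back from the end,
	// and a null length means "to the end".
	return implode( '', array_slice( $chars, $start, $length ) );
}

$str = "na\u{00EF}ve caf\u{00E9}"; // "naïve café", 10 characters

echo sketch_utf8_substr( $str, 0, 5 ), "\n";  // naïve
echo sketch_utf8_substr( $str, -4 ), "\n";    // café
echo sketch_utf8_substr( $str, 2, -5 ), "\n"; // ïve
```

Building an array of code points is easy to read but allocates one string per character; the core function instead computes byte offsets and calls substr() once.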

Changelog

Since 3.2.0 Introduced.

_mb_substr() code WP 7.0

function _mb_substr( $str, $start, $length = null, $encoding = null ) {
	if ( null === $str ) {
		return '';
	}

	// The solution below works only for UTF-8; treat all other encodings as byte streams.
	if ( ! _is_utf8_charset( $encoding ?? get_option( 'blog_charset' ) ) ) {
		return is_null( $length ) ? substr( $str, $start ) : substr( $str, $start, $length );
	}

	$total_length = ( $start < 0 || $length < 0 )
		? _wp_utf8_codepoint_count( $str )
		: 0;

	$normalized_start = $start < 0
		? max( 0, $total_length + $start )
		: $start;

	/*
	 * The starting offset is given in characters, so this needs to find
	 * how many bytes that many characters occupy at the start of the string.
	 */
	$starting_byte_offset = _wp_utf8_codepoint_span( $str, 0, $normalized_start );

	$normalized_length = $length < 0
		? max( 0, $total_length - $normalized_start + $length )
		: $length;

	/*
	 * This is the main step. It finds how many bytes the requested number of
	 * code points occupy in the input, starting at the byte offset calculated above.
	 */
	$byte_length = isset( $normalized_length )
		? _wp_utf8_codepoint_span( $str, $starting_byte_offset, $normalized_length )
		: ( strlen( $str ) - $starting_byte_offset );

	// The result is a normal byte-level substring using the computed ranges.
	return substr( $str, $starting_byte_offset, $byte_length );
}
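The UTF-8 branch above can be reproduced outside WordPress as a rough sketch. The implementations of _wp_utf8_codepoint_count() and _wp_utf8_codepoint_span() are not shown on this page, so sketch_codepoint_count() and sketch_codepoint_span() below are hypothetical stand-ins that assume valid UTF-8 and read each lead byte to decide how long its sequence is; the real helpers may treat invalid bytes differently. The null check and the encoding fallback are left out.

```php
<?php
/** Hypothetical stand-in: counts code points as all bytes that are not continuation bytes (10xxxxxx). */
function sketch_codepoint_count( string $str ): int {
	return strlen( $str ) - preg_match_all( '/[\x80-\xBF]/', $str );
}

/**
 * Hypothetical stand-in for _wp_utf8_codepoint_span(). Assumes valid UTF-8:
 * returns how many bytes the next $max_code_points code points occupy,
 * starting at $byte_offset and stopping early at the end of the string.
 */
function sketch_codepoint_span( string $str, int $byte_offset, int $max_code_points ): int {
	$end   = strlen( $str );
	$at    = $byte_offset;
	$count = 0;

	while ( $at < $end && $count < $max_code_points ) {
		$lead = ord( $str[ $at ] );
		// Lead byte gives the sequence length: 0xxxxxxx = 1 byte,
		// 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
		if ( $lead < 0x80 ) {
			$at += 1;
		} elseif ( $lead < 0xE0 ) {
			$at += 2;
		} elseif ( $lead < 0xF0 ) {
			$at += 3;
		} else {
			$at += 4;
		}
		++$count;
	}

	return min( $at, $end ) - $byte_offset;
}

/** Mirrors the normalization and byte-range steps of _mb_substr() shown above. */
function sketch_mb_substr( string $str, int $start, ?int $length = null ): string {
	// The full character count is only needed when an argument is negative.
	$total_length = ( $start < 0 || ( null !== $length && $length < 0 ) )
		? sketch_codepoint_count( $str )
		: 0;

	$normalized_start     = $start < 0 ? max( 0, $total_length + $start ) : $start;
	$starting_byte_offset = sketch_codepoint_span( $str, 0, $normalized_start );

	$normalized_length = ( null !== $length && $length < 0 )
		? max( 0, $total_length - $normalized_start + $length )
		: $length;

	$byte_length = null !== $normalized_length
		? sketch_codepoint_span( $str, $starting_byte_offset, $normalized_length )
		: strlen( $str ) - $starting_byte_offset;

	return substr( $str, $starting_byte_offset, $byte_length );
}

$s = "\u{00FC}ber \u{1F600}!"; // "über 😀!": 7 characters, 11 bytes

echo sketch_mb_substr( $s, 0, 4 ), "\n";  // über
echo sketch_mb_substr( $s, -2 ), "\n";    // 😀!
echo sketch_mb_substr( $s, 1, -3 ), "\n"; // ber
```

As a worked example of the normalization for the last call: the total length is 7, the start stays at character 1 (byte 2, because "ü" takes two bytes), and the length becomes max(0, 7 - 1 - 3) = 3 characters, which span bytes 2 to 4.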