_wp_scrub_utf8_fallback() WP 6.9.0

Fallback mechanism for replacing invalid spans of UTF-8 bytes.

Example:

'Pi�a' === _wp_scrub_utf8_fallback( "Pi\xF1a" ); // “ñ” is 0xF1 in Windows-1252.

Internal function: this function is intended for use by WordPress core itself. Using it in your own code is not recommended.

No Hooks.

Returns

String. The input string with each span of invalid bytes replaced by the Unicode replacement character (U+FFFD).

Usage

_wp_scrub_utf8_fallback( $bytes ): string;
$bytes(string) (required)
UTF-8 encoded string which might contain spans of invalid bytes.

Notes

  • See: wp_scrub_utf8()
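
Because this function is internal, plugin or theme code would normally call the public wp_scrub_utf8() instead. The following is a minimal sketch of that idea, not an official pattern: example_scrub_text() is a hypothetical helper name, and the mb_scrub() branch is an assumption added only so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): scrub invalid UTF-8
 * through the public API rather than the internal fallback.
 */
function example_scrub_text( string $text ): string {
	// Inside WordPress 6.9+, prefer the public function.
	if ( function_exists( 'wp_scrub_utf8' ) ) {
		return wp_scrub_utf8( $text );
	}

	// Assumption for standalone use: rely on the mbstring extension.
	if ( function_exists( 'mb_scrub' ) ) {
		// Temporarily substitute U+FFFD instead of the default "?".
		$previous = mb_substitute_character();
		mb_substitute_character( 0xFFFD );
		$scrubbed = mb_scrub( $text, 'UTF-8' );
		mb_substitute_character( $previous );

		return $scrubbed;
	}

	throw new RuntimeException( 'No UTF-8 scrubbing function is available.' );
}

// Same input as the example above: 0xF1 is not valid UTF-8 on its own.
$clean = example_scrub_text( "Pi\xF1a" );
```

Well-formed input such as "Pi\xC3\xB1a" passes through unchanged.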

Changelog

Since 6.9.0 Introduced.

_wp_scrub_utf8_fallback() code WP 7.0

function _wp_scrub_utf8_fallback( string $bytes ): string {
	$bytes_length   = strlen( $bytes );
	$next_byte_at   = 0;
	$was_at         = 0;
	$invalid_length = 0;
	$scrubbed       = '';

	while ( $next_byte_at <= $bytes_length ) {
		_wp_scan_utf8( $bytes, $next_byte_at, $invalid_length );

		if ( $next_byte_at >= $bytes_length ) {
			if ( 0 === $was_at ) {
				return $bytes;
			}

			return $scrubbed . substr( $bytes, $was_at, $next_byte_at - $was_at - $invalid_length );
		}

		$scrubbed .= substr( $bytes, $was_at, $next_byte_at - $was_at );
		$scrubbed .= "\u{FFFD}";

		$next_byte_at += $invalid_length;
		$was_at        = $next_byte_at;
	}

	return $scrubbed;
}
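
The loop above follows a copy-and-substitute pattern: copy the run of valid bytes, append U+FFFD for the invalid span, then resume scanning after that span. Below is a minimal standalone sketch of the same pattern, under stated assumptions. example_valid_utf8_run_length() is a hypothetical regex-based stand-in, not WordPress's _wp_scan_utf8(), and the sketch replaces invalid bytes one at a time, so for multi-byte invalid spans it may emit a different number of replacement characters than the real function.

```php
<?php
/**
 * Hypothetical stand-in scanner (not _wp_scan_utf8()).
 * Returns the byte length of the run of well-formed UTF-8 starting at $at.
 */
function example_valid_utf8_run_length( string $bytes, int $at ): int {
	// Well-formed UTF-8 byte sequences: no overlong forms, no surrogates.
	$pattern = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
		. '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
		. '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
		. '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})+/';

	// \G anchors the match at the offset, so only a run starting at $at counts.
	return 1 === preg_match( $pattern, $bytes, $match, 0, $at ) ? strlen( $match[0] ) : 0;
}

/**
 * Illustrative scrubber: copy valid runs, substitute U+FFFD for invalid bytes.
 */
function example_scrub_utf8( string $bytes ): string {
	$length   = strlen( $bytes );
	$at       = 0;
	$scrubbed = '';

	while ( $at < $length ) {
		// Copy the valid run (possibly empty) that starts at the cursor.
		$valid_length = example_valid_utf8_run_length( $bytes, $at );
		$scrubbed    .= substr( $bytes, $at, $valid_length );
		$at          += $valid_length;

		if ( $at < $length ) {
			// Simplification: one replacement character per invalid byte.
			$scrubbed .= "\u{FFFD}";
			++$at;
		}
	}

	return $scrubbed;
}

$result = example_scrub_utf8( "Pi\xF1a" );
```

The regex scanner keeps the sketch short, but it is only suitable for small inputs. A byte-by-byte scanner avoids PCRE backtracking limits on long strings.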