wp_extract_urls() WP 3.7.0

Uses a regular expression to extract URLs from arbitrary content.

Used By: do_enclose()
1 time — 0.000114 sec (fast) | 50000 times — 0.19 sec (very fast) | PHP 7.0.8, WP 4.6.1

No Hooks.

Return

String[]. Array of URLs found in the passed string.

Usage

wp_extract_urls( $content );
$content (string) (required)
Content to extract URLs from.

Examples


#1 Example of retrieving URLs from content

$content = 'Beginning of text with a link: http://wp-kama.com/
Continued, the link will now be in HTML: <a href="http://wp-site.com/foo">link</a>.
And another option, where the URL points to a picture:
<img alt="" src="http://sitename.com/image.jpg">. That is enough for now.';

$urls = wp_extract_urls( $content );

/* $urls will contain the following array:
Array
(
	[0] => http://wp-kama.com/
	[1] => http://wp-site.com/foo
	[2] => http://sitename.com/image.jpg
)
*/

#2 Doesn't work for localhost URLs without a TLD

$content = '
<a href="http://localhost.com:8889/?p=9">hi</a> 
<a href="http://localhost:8889/?p=9">hi</a>     
';

$urls = wp_extract_urls( $content );

/* $urls will contain the following array:
Array
(
	[0] => http://localhost.com:8889/?p=9
)
*/

See this ticket.
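
If dotless hosts such as localhost need to be picked up, one possible workaround is to read URLs from href and src attributes directly. The sketch below is only an illustration: my_extract_attribute_urls() is a hypothetical helper, not part of WordPress, and unlike wp_extract_urls() it ignores bare URLs in plain text and handles only http:// and https:// links.

```php
<?php
// Hypothetical helper (not a WordPress function): collects http(s) URLs
// from quoted href/src attributes, whether or not the host contains a dot.
function my_extract_attribute_urls( string $html ): array {
	preg_match_all(
		'#\b(?:href|src)\s*=\s*(["\'])(https?://[^"\'\s<>]+)\1#i',
		$html,
		$matches
	);

	// Decode entities such as &amp; in the captured URLs.
	$urls = array_map( 'html_entity_decode', $matches[2] );

	// Drop duplicates and re-index the result from 0.
	return array_values( array_unique( $urls ) );
}

$content = '
<a href="http://localhost.com:8889/?p=9">hi</a>
<a href="http://localhost:8889/?p=9">hi</a>
';

$urls = my_extract_attribute_urls( $content );
```

With the sample content from example #2, both links should be returned, including the one on plain localhost.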

Changelog

Since 3.7.0 Introduced.
Since 6.0.0 Fixes support for HTML entities (Trac 30580).

wp_extract_urls() code WP 6.4.3

function wp_extract_urls( $content ) {
	preg_match_all(
		"#([\"']?)("
			. '(?:([\w-]+:)?//?)'
			. '[^\s()<>]+'
			. '[.]'
			. '(?:'
				. '\([\w\d]+\)|'
				. '(?:'
					. "[^`!()\[\]{}:'\".,<>«»“”‘’\s]|"
					. '(?:[:]\d+)?/?'
				. ')+'
			. ')'
		. ")\\1#",
		$content,
		$post_links
	);

	$post_links = array_unique(
		array_map(
			static function ( $link ) {
				// Decode to replace valid entities, like &amp;.
				$link = html_entity_decode( $link );
				// Maintain backward compatibility by removing extraneous semi-colons (`;`).
				return str_replace( ';', '', $link );
			},
			$post_links[2]
		)
	);

	return array_values( $post_links );
}
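
The post-processing step of the function above (entity decoding, semicolon removal, de-duplication) can be tried in isolation. This is a minimal sketch: the $matched array is made-up sample input standing in for the regex results, not output captured from WordPress.

```php
<?php
// Illustrative values standing in for $post_links[2] from the regex step.
$matched = array(
	'http://example.com/?a=1&amp;b=2',
	'http://example.com/?a=1&b=2',
	'http://example.com/page;',
);

$links = array_unique(
	array_map(
		static function ( $link ) {
			// Decode valid entities, e.g. &amp; becomes &.
			$link = html_entity_decode( $link );
			// Remove any remaining semicolons.
			return str_replace( ';', '', $link );
		},
		$matched
	)
);

// Re-index, since array_unique() preserves the original keys.
$links = array_values( $links );
```

Here the first two sample links become identical after decoding, so only one of them is kept, and the trailing semicolon is stripped from the third.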